From svenar at simula.no Mon Dec 1 00:43:02 2008
From: svenar at simula.no (Sven-Arne Reinemo)
Date: Mon, 01 Dec 2008 09:43:02 +0100
Subject: [ofa-general] opensm support for toroidal meshes
In-Reply-To: <000501c9437d$ffa7cd90$fef768b0$@com>
References: <000501c9437d$ffa7cd90$fef768b0$@com>
Message-ID: <1228120982.1854.54.camel@zaltys.simula.no>

Hi all,

I just thought I should share some simulation results with you.

I ran some simulations to test Bob's suggested changes, and I see that
the number of VLs required for both 2d and 3d tori is either _reduced_ or
_equal_ to that of the existing implementation. Moreover, the port
reordering seems to work very well on tori that are not cabled regularly
with regard to port numbering. As stated by Bob, it makes LASH route
them as if they were regularly cabled, which is optimal for LASH.

Below are some numbers for the VLs required for various 2d and 3d tori.
Be aware that the number of VLs required would be different if the torus
has a different size along each dimension.

Tori      Current   Patch
4x4       2         2
5x5       3         3
6x6       4         3
7x7       3         3
8x8       6         4
9x9       4         4
10x10     9         6
11x11     5         5
12x12     9         4
13x13     7         7
14x14     12        8
15x15     10        10
4x4x4     5         5
5x5x5     5         4
6x6x6     10        6
7x7x7     10        10
8x8x8     12        9
9x9x9     14        14

Regards,
Sven-Arne

On ma., 2008-11-10 at 15:47 -0600, Robert Pearson wrote:
> We have been involved in a project to deliver a large system based on a
> toroidal mesh fabric. One of the requirements for this system is to be able
> to guarantee a deadlock free routing of the fabric. The lash routing engine
> in opensm did not work in this case because required number of VLs for the
> machine as configured was 12 which exceeded the number of VLs supported by
> Mellanox switch ASICs. It turns out that if one has the freedom to reorder
> the order of the port assignments used by lash optimally that lash can
> successfully route the fabric but that is impractical in the hardware.
> The attached note describes an algorithm for automatically recognizing when a
> Cartesian mesh fabric is a torus, determining its size and optimally
> reordering the ports in opensm so that lash can generate a route with the
> smallest number of VLs.
>
> We have implemented a set of changes to opensm that implement this algorithm
> and will submit the changes as patches. This note will help to understand
> the code.
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From vlad at lists.openfabrics.org Mon Dec 1 03:20:34 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Mon, 1 Dec 2008 03:20:34 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081201-0200 daily build status
Message-ID: <20081201112034.43748E60CF5@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with
linux-2.6.21.1
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From jackm at dev.mellanox.co.il Mon Dec 1 04:32:38 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 1 Dec 2008 14:32:38 +0200
Subject: [ofa-general] [PATCH] mlx4_ib: Fix MTT leakage in resize_cq
Message-ID: <200812011432.38939.jackm@dev.mellanox.co.il>

mlx4_ib: Fix MTT leakage in resize_cq.

MTTs associated with the old CQE buffer were not deallocated
(i.e., returned to the free pool). As a result, the MTT free pool
was eventually completely emptied (for users of resize_cq).

Once the resize_cq command returns successfully from FW, FW no longer
accesses the old CQE buffer, so it is safe to deallocate the MTT entries
used by the old CQE buffer.

Finally, if the resize_cq command fails, the MTTs allocated for the
new CQEs buffer also need to be de-allocated.

Signed-off-by: Jack Morgenstein
---
Roland,
Resize_cq is already in kernel 2.6.27, so I think this fix should go into
a "latest stable" kernel version (say, 2.6.27.8). For heavy resize_cq users,
such as MPI, this fix is significant.
There is no need to call mlx4_mtt_cleanup separately for userspace and
kernel-space paths -- this way the call is made in only one place.

Actually, we cannot call mlx4_mtt_cleanup inside a spinlock anyway, because
it may sleep (it indirectly invokes mlx4_UNMAP_ICM). There is also no need
to protect the mlx4_mtt_cleanup call against parallel access (spinlock),
since the cleanup only is relevant to FW access to the buffer (MTT entries);
the buffer itself is still available (or not!) to the driver.

Finally, I save the mtt for cleanup because the switchover to the new buffer
may have already occurred (in mlx4_ib_poll_one) when the resize_cq call
returns (in which case cq->buf.mtt has already been overwritten with the
new CQE buffer info).

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d0866a3..20d6833 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -343,6 +343,7 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx4_ib_cq *cq = to_mcq(ibcq);
+	struct mlx4_mtt mtt;
 	int outst_cqe;
 	int err;
 
@@ -376,10 +377,12 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		goto out;
 	}
 
+	mtt = cq->buf.mtt;
 	err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt);
 	if (err)
 		goto err_buf;
 
+	mlx4_mtt_cleanup(dev->dev, &mtt);
 	if (ibcq->uobject) {
 		cq->buf = cq->resize_buf->buf;
 		cq->ibcq.cqe = cq->resize_buf->cqe;
@@ -406,6 +409,7 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	goto out;
 
 err_buf:
+	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
 	if (!ibcq->uobject)
 		mlx4_ib_free_cq_buf(dev, &cq->resize_buf->buf,
 				    cq->resize_buf->cqe);
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c

From alekseys at voltaire.com Mon Dec
1 05:16:44 2008
From: alekseys at voltaire.com (Aleksey Senin)
Date: Mon, 01 Dec 2008 15:16:44 +0200
Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv5 1/2]IPv6 IB addr resolution
Message-ID: <1228137404.3852.23.camel@alst60.voltaire.com>

A set of two patches adding IPv6 support to the RDMA CM.
Changes:
Use two patches instead of six.

This patch adds IPv6 support for IB address resolution.

>From da89c2a9ce0b2309362208f167fa2582f2f0d929 Mon Sep 17 00:00:00 2001
From: Aleksey Senin
Date: Mon, 1 Dec 2008 13:55:01 +0200
Subject: [PATCH] IB addr IPv6 support

Support for network discovery in addr_send_arp function.
Local IPv6 address resolution.
Added remote IPv6 address resolution for RDMA CM.
Function addr_resolve_remote used as wrapper for two other functions:
addr4_resolve_remote ( original addr_resolve_remote )
addr6_resolve_remote ( new function )

Signed-off-by: Aleksey Senin

:100644 100644 f95d21f... d9170c6... M	drivers/infiniband/core/addr.c

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index f95d21f..d9170c6 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("IB Address Translation");
@@ -172,27 +173,42 @@ static void queue_req(struct addr_req *req)
 	mutex_unlock(&lock);
 }
 
-static void addr_send_arp(struct sockaddr_in *dst_in)
+static void addr_send_arp(struct sockaddr *dst_in)
 {
 	struct rtable *rt;
 	struct flowi fl;
-	__be32 dst_ip = dst_in->sin_addr.s_addr;
+	struct dst_entry *dst;
 
 	memset(&fl, 0, sizeof fl);
-	fl.nl_u.ip4_u.daddr = dst_ip;
-	if (ip_route_output_key(&init_net, &rt, &fl))
-		return;
+	if (dst_in->sa_family == AF_INET) {
+		fl.nl_u.ip4_u.daddr =
+			((struct sockaddr_in *)dst_in)->sin_addr.s_addr;
 
-	neigh_event_send(rt->u.dst.neighbour, NULL);
-	ip_rt_put(rt);
+		if (ip_route_output_key(&init_net, &rt, &fl))
+			return;
+
+		neigh_event_send(rt->u.dst.neighbour, NULL);
+		ip_rt_put(rt);
+
+	}
else {
+		fl.nl_u.ip6_u.daddr =
+			((struct sockaddr_in6 *)dst_in)->sin6_addr;
+
+		dst = ip6_route_output(&init_net, NULL, &fl);
+		if (!dst)
+			return;
+
+		neigh_event_send(dst->neighbour, NULL);
+		dst_release(dst);
+	}
 }
 
-static int addr_resolve_remote(struct sockaddr *src_in,
-			       struct sockaddr *dst_in,
+static int addr4_resolve_remote(struct sockaddr_in *src_in,
+				struct sockaddr_in *dst_in,
 				struct rdma_dev_addr *addr)
 {
-	__be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr;
-	__be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr;
+	__be32 src_ip = src_in->sin_addr.s_addr;
+	__be32 dst_ip = dst_in->sin_addr.s_addr;
 	struct flowi fl;
 	struct rtable *rt;
 	struct neighbour *neigh;
@@ -223,8 +239,8 @@ static int addr_resolve_remote(struct sockaddr *src_in,
 	}
 
 	if (!src_ip) {
-		src_in->sa_family = dst_in->sa_family;
-		((struct sockaddr_in *)src_in)->sin_addr.s_addr = rt->rt_src;
+		src_in->sin_family = dst_in->sin_family;
+		src_in->sin_addr.s_addr = rt->rt_src;
 	}
 
 	ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
@@ -236,6 +252,47 @@ out:
 	return ret;
 }
 
+static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
+				struct sockaddr_in6 *dst_in,
+				struct rdma_dev_addr *addr)
+{
+	struct flowi fl;
+	struct neighbour *neigh;
+	struct dst_entry *dst;
+	int ret = -ENODATA;
+
+	memset(&fl, 0, sizeof fl);
+	fl.nl_u.ip6_u.daddr = dst_in->sin6_addr;
+	fl.nl_u.ip6_u.saddr = src_in->sin6_addr;
+
+	dst = ip6_route_output(&init_net, NULL, &fl);
+	if (!dst)
+		return ret;
+
+	if (dst->dev->flags & IFF_NOARP) {
+		ret = rdma_copy_addr(addr, dst->dev, NULL);
+	} else {
+		neigh = dst->neighbour;
+		if (neigh && (neigh->nud_state & NUD_VALID))
+			ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
+	}
+
+	dst_release(dst);
+	return ret;
+}
+
+static int addr_resolve_remote(struct sockaddr *src_in,
+				struct sockaddr *dst_in,
+				struct rdma_dev_addr *addr)
+{
+	if (src_in->sa_family == AF_INET) {
+		return addr4_resolve_remote((struct sockaddr_in *)src_in,
+			(struct sockaddr_in
*)dst_in, addr);
+	} else
+		return addr6_resolve_remote((struct sockaddr_in6 *)src_in,
+			(struct sockaddr_in6 *)dst_in, addr);
+}
+
 static void process_req(struct work_struct *work)
 {
 	struct addr_req *req, *temp_req;
@@ -279,29 +336,58 @@ static int addr_resolve_local(struct sockaddr *src_in,
 				struct rdma_dev_addr *addr)
 {
 	struct net_device *dev;
-	__be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr;
-	__be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr;
-	int ret;
+	int ret = -EADDRNOTAVAIL;
 
-	dev = ip_dev_find(&init_net, dst_ip);
-	if (!dev)
-		return -EADDRNOTAVAIL;
+	if (dst_in->sa_family == AF_INET) {
+		__be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr;
+		__be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr;
+
+		dev = ip_dev_find(&init_net, dst_ip);
+		if (!dev)
+			return -EADDRNOTAVAIL;
 
-	if (ipv4_is_zeronet(src_ip)) {
-		src_in->sa_family = dst_in->sa_family;
-		((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip;
-		ret = rdma_copy_addr(addr, dev, dev->dev_addr);
-	} else if (ipv4_is_loopback(src_ip)) {
-		ret = rdma_translate_ip(dst_in, addr);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		if (ipv4_is_zeronet(src_ip)) {
+			src_in->sa_family = dst_in->sa_family;
+			((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip;
+			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
+		} else if (ipv4_is_loopback(src_ip)) {
+			ret = rdma_translate_ip(dst_in, addr);
+			if (!ret)
+				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		} else {
+			ret = rdma_translate_ip(src_in, addr);
+			if (!ret)
+				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		}
+		dev_put(dev);
 	} else {
-		ret = rdma_translate_ip(src_in, addr);
-		if (!ret)
-			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		struct in6_addr *a = &((struct sockaddr_in6 *)dst_in)->sin6_addr;
+
+		for_each_netdev(&init_net, dev)
+			if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1))
+				break;
+
+		if (!dev)
+
			return -EADDRNOTAVAIL;
+
+		a = &((struct sockaddr_in6 *)src_in)->sin6_addr;
+
+		if (ipv6_addr_any(a)) {
+			src_in->sa_family = dst_in->sa_family;
+			((struct sockaddr_in6 *)src_in)->sin6_addr =
+				((struct sockaddr_in6 *)dst_in)->sin6_addr;
+			ret = rdma_copy_addr(addr, dev, dev->dev_addr);
+		} else if (ipv6_addr_loopback(a)) {
+			ret = rdma_translate_ip(dst_in, addr);
+			if (!ret)
+				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		} else {
+			ret = rdma_translate_ip(src_in, addr);
+			if (!ret)
+				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
+		}
 	}
 
-	dev_put(dev);
 	return ret;
 }
 
@@ -344,7 +430,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client,
 	case -ENODATA:
 		req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;
 		queue_req(req);
-		addr_send_arp((struct sockaddr_in *)dst_in);
+		addr_send_arp(dst_in);
 		break;
 	default:
 		ret = req->status;
-- 
1.5.6

From alekseys at voltaire.com Mon Dec 1 05:23:26 2008
From: alekseys at voltaire.com (Aleksey Senin)
Date: Mon, 01 Dec 2008 15:23:26 +0200
Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv5 2/2] RDMA CM support
In-Reply-To: <1228137404.3852.23.camel@alst60.voltaire.com>
References: <1228137404.3852.23.camel@alst60.voltaire.com>
Message-ID: <1228137806.17485.1.camel@alst60.voltaire.com>

IPv6 RDMA CM support

>From 1a020899aa5d0d8454c172a0a908903c8dbe055e Mon Sep 17 00:00:00 2001
From: Aleksey Senin
Date: Mon, 1 Dec 2008 14:01:10 +0200
Subject: [PATCH] RDMA CM IPv6 support

AF_INET6 case in rdma_bind_addr added
AF_INET6 support in cma_format_hdr function
Use sockaddr_storage structure in cma_bind_any function

Signed-off-by: Aleksey Senin

:100644 100644 d951896... df22c5c...
M	drivers/infiniband/core/cma.c

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d951896..df22c5c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1467,10 +1467,10 @@ static void cma_listen_on_all(struct rdma_id_private *id_priv)
 
 static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af)
 {
-	struct sockaddr_in addr_in;
+	struct sockaddr_storage addr_in;
 
 	memset(&addr_in, 0, sizeof addr_in);
-	addr_in.sin_family = af;
+	addr_in.ss_family = af;
 	return rdma_bind_addr(id, (struct sockaddr *) &addr_in);
 }
 
@@ -2073,7 +2073,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	struct rdma_id_private *id_priv;
 	int ret;
 
-	if (addr->sa_family != AF_INET)
+	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6)
 		return -EAFNOSUPPORT;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
@@ -2113,32 +2113,61 @@ EXPORT_SYMBOL(rdma_bind_addr);
 static int cma_format_hdr(void *hdr, enum rdma_port_space ps,
 			  struct rdma_route *route)
 {
-	struct sockaddr_in *src4, *dst4;
 	struct cma_hdr *cma_hdr;
 	struct sdp_hh *sdp_hdr;
 
-	src4 = (struct sockaddr_in *) &route->addr.src_addr;
-	dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
-
-	switch (ps) {
-	case RDMA_PS_SDP:
-		sdp_hdr = hdr;
-		if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
-			return -EINVAL;
-		sdp_set_ip_ver(sdp_hdr, 4);
-		sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
-		sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
-		sdp_hdr->port = src4->sin_port;
-		break;
-	default:
-		cma_hdr = hdr;
-		cma_hdr->cma_version = CMA_VERSION;
-		cma_set_ip_ver(cma_hdr, 4);
-		cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
-		cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
-		cma_hdr->port = src4->sin_port;
-		break;
+	if (route->addr.src_addr.ss_family == AF_INET) {
+		struct sockaddr_in *src4, *dst4;
+
+		src4 = (struct sockaddr_in *) &route->addr.src_addr;
+		dst4 = (struct sockaddr_in *) &route->addr.dst_addr;
+
+		switch (ps) {
+
		case RDMA_PS_SDP:
+			sdp_hdr = hdr;
+			if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
+				return -EINVAL;
+			sdp_set_ip_ver(sdp_hdr, 4);
+			sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+			sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+			sdp_hdr->port = src4->sin_port;
+			break;
+		default:
+			cma_hdr = hdr;
+			cma_hdr->cma_version = CMA_VERSION;
+			cma_set_ip_ver(cma_hdr, 4);
+			cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr;
+			cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr;
+			cma_hdr->port = src4->sin_port;
+			break;
+		}
+	} else {
+		struct sockaddr_in6 *src6, *dst6;
+
+		src6 = (struct sockaddr_in6 *) &route->addr.src_addr;
+		dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr;
+
+		switch (ps) {
+		case RDMA_PS_SDP:
+			sdp_hdr = hdr;
+			if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION)
+				return -EINVAL;
+			sdp_set_ip_ver(sdp_hdr, 6);
+			sdp_hdr->src_addr.ip6 = src6->sin6_addr;
+			sdp_hdr->dst_addr.ip6 = dst6->sin6_addr;
+			sdp_hdr->port = src6->sin6_port;
+			break;
+		default:
+			cma_hdr = hdr;
+			cma_hdr->cma_version = CMA_VERSION;
+			cma_set_ip_ver(cma_hdr, 6);
+			cma_hdr->src_addr.ip6 = src6->sin6_addr;
+			cma_hdr->dst_addr.ip6 = dst6->sin6_addr;
+			cma_hdr->port = src6->sin6_port;
+			break;
+		}
 	}
+
 	return 0;
 }
-- 
1.5.6

From john.russo at qlogic.com Mon Dec 1 07:29:03 2008
From: john.russo at qlogic.com (John Russo)
Date: Mon, 1 Dec 2008 09:29:03 -0600
Subject: [ofa-general] net.ipv4.tcp_timestamps
Message-ID: <99863D2ED484D449811D97A4C44C9CBDA6D3F3@EPEXCH2.qlogic.org>

Does anyone know why this value is being altered by OFED?

"net.ipv4.tcp_timestamps" is being set to 0 during OFED installation.
The default value of this parameter is 1 on standard RHEL/SLES distros,
and the OFED installation script changes it to 0. Also, when OFED is
uninstalled, it does not reset these sysctl parameters to their original
values.

This parameter is specifically recommended to be turned ON for
high-performance networks.
This is a TCP option that can be used to measure round-trip time more
accurately than the retransmission timeout method can. An accurate
retransmission timeout value is needed to avoid unnecessary
retransmissions and hence to improve TCP performance. RFC 1323 describes
this TCP extension for high performance.

When net.ipv4.tcp_timestamps=1, an extra 12 bytes are added to the TCP
header, increasing its size. This has the obvious effect of decreasing
bandwidth, since some extra data is flowing. Is this the reason why OFED
turns it OFF (net.ipv4.tcp_timestamps=0)?

__________________________
John F. Russo
Manager, Engineering
QLogic Corporation
780 Fifth Avenue, Suite 140
King of Prussia, PA 19406
Direct: 610-233-4866
Main: 610-233-4800
Fax: 610-233-4777
Cell: 610-246-9903
Email: John.Russo at qlogic.com
www.qlogic.com

True success is the undeniable truth that we have proved ourselves.
-Joe Luppino-Esposito

From john.russo at qlogic.com Mon Dec 1 07:33:27 2008
From: john.russo at qlogic.com (John Russo)
Date: Mon, 1 Dec 2008 09:33:27 -0600
Subject: [ofa-general] RESEND: net.ipv4.tcp_timestamps
Message-ID: <99863D2ED484D449811D97A4C44C9CBDA6D3F5@EPEXCH2.qlogic.org>

Does anyone know why this value is being altered by OFED?

"net.ipv4.tcp_timestamps" is being set to 0 during OFED installation.
The default value of this parameter is 1 on standard RHEL/SLES distros,
and the OFED installation script changes it to 0. Also, when OFED is
uninstalled, it does not reset these sysctl parameters to their original
values.

This parameter is specifically recommended to be turned ON for
high-performance networks.
This is a TCP option that can be used to measure round-trip time more
accurately than the retransmission timeout method can. An accurate
retransmission timeout value is needed to avoid unnecessary
retransmissions and hence to improve TCP performance. RFC 1323 describes
this TCP extension for high performance.

When net.ipv4.tcp_timestamps=1, an extra 12 bytes are added to the TCP
header, increasing its size. This has the obvious effect of decreasing
bandwidth, since some extra data is flowing. Is this the reason why OFED
turns it OFF (net.ipv4.tcp_timestamps=0)?

From tziporet at mellanox.co.il Mon Dec 1 08:21:17 2008
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 1 Dec 2008 18:21:17 +0200
Subject: [ofa-general] Agenda for OFED meeting today (Dec 1) on OFED 1.4
Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD011A8DD1@mtlexch01.mtl.com>

This is the Agenda for the OFED meeting today:

1. Bug list review:

1421  blo  raisch at de.ibm.com      ipoib/ehca crash when bouncing network
1413  cri  jackm at mellanox.co.il   rmmod mlx4_ib deadlock - on work
1384  maj  eli at mellanox.co.il     netperf latency small messages increase 5% - UD mode issue, seems caused by LRO
1386  maj  eli at mellanox.co.il     ofed 1.4 - iperf tcp connected mode BW large messages dec... - RC mode - not reproduced for now
1395  maj  vu at mellanox.com        kernel panic during SRP HA test - on work
1419  maj  vlad at mellanox.co.il    Iperf-2.0.4 fails: page allocation failure. order:5
1423  maj  jackm at mellanox.co.il   RDMA_Write RC small message bw degradation jitter of 6% - doing more measurements

2. UNH Logo testing update

3. Issue with new Intel MPI Benchmark (IMB, known as Pallas).
In OFED 1.3 we used IMB 3.0, and in OFED 1.4 we upgraded IMB to version 3.1.
During our performance verification we found a 5% latency degradation with
the IMB 3.1 multi PingPong benchmark (compared with the IMB 3.0 results).
Do we wish to go back to IMB 3.0?

4.
Open discussion

Tziporet

From rdreier at cisco.com Mon Dec 1 10:16:10 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 01 Dec 2008 10:16:10 -0800
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID:

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get two fixes to the ehca driver for problems in patches
added in the 2.6.28 cycle, and two fixes for mlx4, one for a
regression introduced in the 2.6.28 cycle, and one for a resource leak
that is serious enough and has a simple enough fix to target -stable.

Jack Morgenstein (2):
      mlx4_core: Save/restore default port IB capability mask
      IB/mlx4: Fix MTT leakage in resize CQ

Joachim Fenkes (1):
      IB/ehca: Change misleading error message on memory hotplug

Roland Dreier (1):
      Merge branches 'ehca' and 'mlx4' into for-linus

Stefan Roscher (1):
      IB/ehca: Fix problem with generated flush work completions

 drivers/infiniband/hw/ehca/ehca_classes.h |    4 ++-
 drivers/infiniband/hw/ehca/ehca_main.c    |    3 +-
 drivers/infiniband/hw/ehca/ehca_qp.c      |   26 +++++++++++---
 drivers/infiniband/hw/ehca/ehca_reqs.c    |   51 +++++++++++++++++------------
 drivers/infiniband/hw/mlx4/cq.c           |    5 +++
 drivers/net/mlx4/main.c                   |    8 ++++
 drivers/net/mlx4/mlx4.h                   |    1 +
 drivers/net/mlx4/port.c                   |   39 +++++++++++++++++++++-
 include/linux/mlx4/device.h               |    1 +
 9 files changed, 107 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 4df887a..7fc35cf 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -163,7 +163,8 @@ struct ehca_mod_qp_parm {
 /* struct for tracking if cqes have been reported to the application */
 struct ehca_qmap_entry {
 	u16 app_wr_id;
-	u16 reported;
+	u8 reported;
+	u8 cqe_req;
 };
 
 struct ehca_queue_map {
@@
-171,6 +172,7 @@ struct ehca_queue_map {
 	unsigned int entries;
 	unsigned int tail;
 	unsigned int left_to_poll;
+	unsigned int next_wqe_idx; /* Idx to first wqe to be flushed */
 };
 
 struct ehca_qp {
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index bb02a86..bec7e02 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -994,8 +994,7 @@ static int ehca_mem_notifier(struct notifier_block *nb,
 			if (printk_timed_ratelimit(&ehca_dmem_warn_time,
 						   30 * 1000))
 				ehca_gen_err("DMEM operations are not allowed"
-					     "as long as an ehca adapter is"
-					     "attached to the LPAR");
+					     "in conjunction with eHCA");
 			return NOTIFY_BAD;
 		}
 	}
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 9e05ee2..cadbf0c 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -435,9 +435,13 @@ static void reset_queue_map(struct ehca_queue_map *qmap)
 {
 	int i;
 
-	qmap->tail = 0;
-	for (i = 0; i < qmap->entries; i++)
+	qmap->tail = qmap->entries - 1;
+	qmap->left_to_poll = 0;
+	qmap->next_wqe_idx = 0;
+	for (i = 0; i < qmap->entries; i++) {
 		qmap->map[i].reported = 1;
+		qmap->map[i].cqe_req = 0;
+	}
 }
 
 /*
@@ -1121,6 +1125,7 @@ static int calc_left_cqes(u64 wqe_p, struct ipz_queue *ipz_queue,
 	void *wqe_v;
 	u64 q_ofs;
 	u32 wqe_idx;
+	unsigned int tail_idx;
 
 	/* convert real to abs address */
 	wqe_p = wqe_p & (~(1UL << 63));
@@ -1133,12 +1138,17 @@ static int calc_left_cqes(u64 wqe_p, struct ipz_queue *ipz_queue,
 		return -EFAULT;
 	}
 
+	tail_idx = (qmap->tail + 1) % qmap->entries;
 	wqe_idx = q_ofs / ipz_queue->qe_size;
-	if (wqe_idx < qmap->tail)
-		qmap->left_to_poll = (qmap->entries - qmap->tail) + wqe_idx;
-	else
-		qmap->left_to_poll = wqe_idx - qmap->tail;
 
+	/* check all processed wqes, whether a cqe is requested or not */
+	while (tail_idx != wqe_idx) {
+		if (qmap->map[tail_idx].cqe_req)
+			qmap->left_to_poll++;
+		tail_idx = (tail_idx + 1) %
qmap->entries;
+	}
+	/* save index in queue, where we have to start flushing */
+	qmap->next_wqe_idx = wqe_idx;
 	return 0;
 }
 
@@ -1185,10 +1195,14 @@ static int check_for_left_cqes(struct ehca_qp *my_qp, struct ehca_shca *shca)
 	} else {
 		spin_lock_irqsave(&my_qp->send_cq->spinlock, flags);
 		my_qp->sq_map.left_to_poll = 0;
+		my_qp->sq_map.next_wqe_idx = (my_qp->sq_map.tail + 1) %
+						my_qp->sq_map.entries;
 		spin_unlock_irqrestore(&my_qp->send_cq->spinlock, flags);
 
 		spin_lock_irqsave(&my_qp->recv_cq->spinlock, flags);
 		my_qp->rq_map.left_to_poll = 0;
+		my_qp->rq_map.next_wqe_idx = (my_qp->rq_map.tail + 1) %
+						my_qp->rq_map.entries;
 		spin_unlock_irqrestore(&my_qp->recv_cq->spinlock, flags);
 	}
 
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index 6492807..00a648f 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -179,6 +179,7 @@ static inline int ehca_write_swqe(struct ehca_qp *qp,
 
 	qmap_entry->app_wr_id = get_app_wr_id(send_wr->wr_id);
 	qmap_entry->reported = 0;
+	qmap_entry->cqe_req = 0;
 
 	switch (send_wr->opcode) {
 	case IB_WR_SEND:
@@ -203,8 +204,10 @@ static inline int ehca_write_swqe(struct ehca_qp *qp,
 
 	if ((send_wr->send_flags & IB_SEND_SIGNALED ||
 	    qp->init_attr.sq_sig_type == IB_SIGNAL_ALL_WR)
-	    && !hidden)
+	    && !hidden) {
 		wqe_p->wr_flag |= WQE_WRFLAG_REQ_SIGNAL_COM;
+		qmap_entry->cqe_req = 1;
+	}
 
 	if (send_wr->opcode == IB_WR_SEND_WITH_IMM ||
 	    send_wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) {
@@ -569,6 +572,7 @@ static int internal_post_recv(struct ehca_qp *my_qp,
 		qmap_entry = &my_qp->rq_map.map[rq_map_idx];
 		qmap_entry->app_wr_id = get_app_wr_id(cur_recv_wr->wr_id);
 		qmap_entry->reported = 0;
+		qmap_entry->cqe_req = 1;
 
 		wqe_cnt++;
 	} /* eof for cur_recv_wr */
@@ -706,27 +710,34 @@ repoll:
 		goto repoll;
 	wc->qp = &my_qp->ib_qp;
 
+	qmap_tail_idx = get_app_wr_id(cqe->work_request_id);
+	if (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT))
+		/* We got a send completion.
*/
+		qmap = &my_qp->sq_map;
+	else
+		/* We got a receive completion. */
+		qmap = &my_qp->rq_map;
+
+	/* advance the tail pointer */
+	qmap->tail = qmap_tail_idx;
+
 	if (is_error) {
 		/*
 		 * set left_to_poll to 0 because in error state, we will not
 		 * get any additional CQEs
 		 */
-		ehca_add_to_err_list(my_qp, 1);
+		my_qp->sq_map.next_wqe_idx = (my_qp->sq_map.tail + 1) %
+						my_qp->sq_map.entries;
 		my_qp->sq_map.left_to_poll = 0;
+		ehca_add_to_err_list(my_qp, 1);
 
+		my_qp->rq_map.next_wqe_idx = (my_qp->rq_map.tail + 1) %
+						my_qp->rq_map.entries;
+		my_qp->rq_map.left_to_poll = 0;
 		if (HAS_RQ(my_qp))
 			ehca_add_to_err_list(my_qp, 0);
-		my_qp->rq_map.left_to_poll = 0;
 	}
 
-	qmap_tail_idx = get_app_wr_id(cqe->work_request_id);
-	if (!(cqe->w_completion_flags & WC_SEND_RECEIVE_BIT))
-		/* We got a send completion. */
-		qmap = &my_qp->sq_map;
-	else
-		/* We got a receive completion. */
-		qmap = &my_qp->rq_map;
-
 	qmap_entry = &qmap->map[qmap_tail_idx];
 	if (qmap_entry->reported) {
 		ehca_warn(cq->device, "Double cqe on qp_num=%#x",
@@ -738,10 +749,6 @@ repoll:
 	wc->wr_id = replace_wr_id(cqe->work_request_id, qmap_entry->app_wr_id);
 	qmap_entry->reported = 1;
 
-	/* this is a proper completion, we need to advance the tail pointer */
-	if (++qmap->tail == qmap->entries)
-		qmap->tail = 0;
-
 	/* if left_to_poll is decremented to 0, add the QP to the error list */
 	if (qmap->left_to_poll > 0) {
 		qmap->left_to_poll--;
@@ -805,13 +812,14 @@ static int generate_flush_cqes(struct ehca_qp *my_qp, struct ib_cq *cq,
 	else
 		qmap = &my_qp->rq_map;
 
-	qmap_entry = &qmap->map[qmap->tail];
+	qmap_entry = &qmap->map[qmap->next_wqe_idx];
 
 	while ((nr < num_entries) && (qmap_entry->reported == 0)) {
 		/* generate flush CQE */
+
 		memset(wc, 0, sizeof(*wc));
 
-		offset = qmap->tail * ipz_queue->qe_size;
+		offset = qmap->next_wqe_idx * ipz_queue->qe_size;
 		wqe = (struct ehca_wqe *)ipz_qeit_calc(ipz_queue, offset);
 		if (!wqe) {
 			ehca_err(cq->device, "Invalid wqe offset=%#lx on "
@@ -850,11 +858,12 @@ static int generate_flush_cqes(struct
ehca_qp *my_qp, struct ib_cq *cq,
 		wc->qp = &my_qp->ib_qp;
 
-		/* mark as reported and advance tail pointer */
+		/* mark as reported and advance next_wqe pointer */
 		qmap_entry->reported = 1;
-		if (++qmap->tail == qmap->entries)
-			qmap->tail = 0;
-		qmap_entry = &qmap->map[qmap->tail];
+		qmap->next_wqe_idx++;
+		if (qmap->next_wqe_idx == qmap->entries)
+			qmap->next_wqe_idx = 0;
+		qmap_entry = &qmap->map[qmap->next_wqe_idx];
 
 		wc++;
 		nr++;
 	}
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index d0866a3..1830849 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -343,6 +343,7 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 {
 	struct mlx4_ib_dev *dev = to_mdev(ibcq->device);
 	struct mlx4_ib_cq *cq = to_mcq(ibcq);
+	struct mlx4_mtt mtt;
 	int outst_cqe;
 	int err;
 
@@ -376,10 +377,13 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 		goto out;
 	}
 
+	mtt = cq->buf.mtt;
+
 	err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt);
 	if (err)
 		goto err_buf;
 
+	mlx4_mtt_cleanup(dev->dev, &mtt);
 	if (ibcq->uobject) {
 		cq->buf = cq->resize_buf->buf;
 		cq->ibcq.cqe = cq->resize_buf->cqe;
@@ -406,6 +410,7 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
 	goto out;
 
 err_buf:
+	mlx4_mtt_cleanup(dev->dev, &cq->resize_buf->buf.mtt);
 	if (!ibcq->uobject)
 		mlx4_ib_free_cq_buf(dev, &cq->resize_buf->buf,
 				    cq->resize_buf->cqe);
diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c
index 468921b..90a0281 100644
--- a/drivers/net/mlx4/main.c
+++ b/drivers/net/mlx4/main.c
@@ -753,6 +753,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	int err;
 	int port;
+	__be32 ib_port_default_caps;
 
 	err = mlx4_init_uar_table(dev);
 	if (err) {
@@ -852,6 +853,13 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	}
 
 	for (port = 1; port <= dev->caps.num_ports; port++) {
+		ib_port_default_caps = 0;
+		err =
mlx4_get_port_ib_caps(dev, port, &ib_port_default_caps); + if (err) + mlx4_warn(dev, "failed to get port %d default " + "ib capabilities (%d). Continuing with " + "caps = 0\n", port, err); + dev->caps.ib_port_def_cap[port] = ib_port_default_caps; err = mlx4_SET_PORT(dev, port); if (err) { mlx4_err(dev, "Failed to set port %d, aborting\n", diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 56a2e21..34c909d 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -385,5 +385,6 @@ void mlx4_init_mac_table(struct mlx4_dev *dev, struct mlx4_mac_table *table); void mlx4_init_vlan_table(struct mlx4_dev *dev, struct mlx4_vlan_table *table); int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); +int mlx4_get_port_ib_caps(struct mlx4_dev *dev, u8 port, __be32 *caps); #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c index e2fdab4..0a057e5 100644 --- a/drivers/net/mlx4/port.c +++ b/drivers/net/mlx4/port.c @@ -258,6 +258,42 @@ out: } EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); +int mlx4_get_port_ib_caps(struct mlx4_dev *dev, u8 port, __be32 *caps) +{ + struct mlx4_cmd_mailbox *inmailbox, *outmailbox; + u8 *inbuf, *outbuf; + int err; + + inmailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(inmailbox)) + return PTR_ERR(inmailbox); + + outmailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(outmailbox)) { + mlx4_free_cmd_mailbox(dev, inmailbox); + return PTR_ERR(outmailbox); + } + + inbuf = inmailbox->buf; + outbuf = outmailbox->buf; + memset(inbuf, 0, 256); + memset(outbuf, 0, 256); + inbuf[0] = 1; + inbuf[1] = 1; + inbuf[2] = 1; + inbuf[3] = 1; + *(__be16 *) (&inbuf[16]) = cpu_to_be16(0x0015); + *(__be32 *) (&inbuf[20]) = cpu_to_be32(port); + + err = mlx4_cmd_box(dev, inmailbox->dma, outmailbox->dma, port, 3, + MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C); + if (!err) + *caps = *(__be32 *) (outbuf + 84); + mlx4_free_cmd_mailbox(dev, inmailbox); + mlx4_free_cmd_mailbox(dev, outmailbox); + return err; +} + int 
mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) { struct mlx4_cmd_mailbox *mailbox; @@ -273,7 +309,8 @@ int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) ((u8 *) mailbox->buf)[3] = 6; ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); - } + } else + ((__be32 *) mailbox->buf)[1] = dev->caps.ib_port_def_cap[port]; err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, MLX4_CMD_TIME_CLASS_B); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index bd9977b..371086f 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -179,6 +179,7 @@ struct mlx4_caps { int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + __be32 ib_port_def_cap[MLX4_MAX_PORTS + 1]; u64 def_mac[MLX4_MAX_PORTS + 1]; int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; From Jeffrey.C.Becker at nasa.gov Mon Dec 1 10:21:46 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Mon, 01 Dec 2008 10:21:46 -0800 Subject: [ofa-general] GitWeb really slow In-Reply-To: <49338E23.9000600@ext.bull.net> References: <49338E23.9000600@ext.bull.net> Message-ID: <49342B3A.4060100@nasa.gov> Hi Nicolas Nicolas Morey Chaisemartin wrote: > Hi, > > This is not necessary the best place to post it but I was wondering > why is ofed's gitweb so slow on the main page? > It takes only a few seconds to display all the repository on > kernel.org (and there's a lot more) but it takes nearly a minute to > display the OFED git main page... I have noticed this as well. OFA will be moving to a new server after OFED 1.4 releases (soon). We will make sure the problem does not occur there. Please bear with us for now. Thanks. Jeff Becker OFA server admin > > I know it's probably not the most critical issue you have to work on > but I connect quite often on this page and it starts to be really > bugging me. 
And I'm probably not the only one ;)
>
> Thanks in advance
>
>
> Nicolas Morey-Chaisemartin
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From halr at obsidianresearch.com Mon Dec 1 10:43:47 2008
From: halr at obsidianresearch.com (Hal Rosenstock)
Date: Mon, 01 Dec 2008 11:43:47 -0700
Subject: [ofa-general] [PATCH] opensm/osm_prefix_route.h: prefix and guid are in network rather than host endian order
Message-ID: <49343063.5030200@obsidianresearch.com>

Sasha,

Minor patch to declare prefix and guid in network rather than host endian
order.

-- Hal
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: patch-osm-prefix2
URL:

From sean.hefty at intel.com Mon Dec 1 11:38:29 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 1 Dec 2008 11:38:29 -0800
Subject: [ofa-general] RE: [RMDA CM IPv6 support. PATCHv5 1/2]IPv6 IB addr resolution
In-Reply-To: <1228137404.3852.23.camel@alst60.voltaire.com>
References: <1228137404.3852.23.camel@alst60.voltaire.com>
Message-ID: <000001c953ec$633e8a00$65cc180a@amr.corp.intel.com>

>-----Original Message-----
>From: Aleksey Senin [mailto:alekseys at voltaire.com]
>Sent: Monday, December 01, 2008 5:17 AM
>To: general at lists.openfabrics.org
>Cc: Hefty, Sean; Olga Shern
>Subject: [RMDA CM IPv6 support. PATCHv5 1/2]IPv6 IB addr resolution
>
>Set from two patches for support IPv6 protocol in RMDA CM
>
>
>Changes:
>Use two patches instead of six.
>
>This patch adds IPv6 support for IB address resolution.
>
>
>>From da89c2a9ce0b2309362208f167fa2582f2f0d929 Mon Sep 17 00:00:00 2001
>From: Aleksey Senin
>Date: Mon, 1 Dec 2008 13:55:01 +0200
>Subject: [PATCH] IB addr IPv6 support
>
>Support for network discovery in addr_send_arp function.
>Local IPv6 address resolution.
>Added remote IPv6 address resolusion for RDMA CM. >Function addr_resolve_remote used as wrapper for two other functions: > addr4_resolve_remote ( original addr_resolve_remote ) > addr6_resolve_remote ( new function ) > >Signed-off-by: Aleksey Senin Thanks for adding this. Acked-by: Sean Hefty with one nit... > static void process_req(struct work_struct *work) > { > struct addr_req *req, *temp_req; >@@ -279,29 +336,58 @@ static int addr_resolve_local(struct sockaddr *src_in, > struct rdma_dev_addr *addr) > { > struct net_device *dev; >- __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; >- __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; >- int ret; >+ int ret = -EADDRNOTAVAIL; Is this assignment needed/used? > >- dev = ip_dev_find(&init_net, dst_ip); >- if (!dev) >- return -EADDRNOTAVAIL; >+ if (dst_in->sa_family == AF_INET) { >+ __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; >+ __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; >+ >+ dev = ip_dev_find(&init_net, dst_ip); >+ if (!dev) >+ return -EADDRNOTAVAIL; > >- if (ipv4_is_zeronet(src_ip)) { >- src_in->sa_family = dst_in->sa_family; >- ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; >- ret = rdma_copy_addr(addr, dev, dev->dev_addr); >- } else if (ipv4_is_loopback(src_ip)) { >- ret = rdma_translate_ip(dst_in, addr); >- if (!ret) >- memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); >+ if (ipv4_is_zeronet(src_ip)) { >+ src_in->sa_family = dst_in->sa_family; >+ ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; >+ ret = rdma_copy_addr(addr, dev, dev->dev_addr); >+ } else if (ipv4_is_loopback(src_ip)) { >+ ret = rdma_translate_ip(dst_in, addr); >+ if (!ret) >+ memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >+ } else { >+ ret = rdma_translate_ip(src_in, addr); >+ if (!ret) >+ memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >+ } >+ dev_put(dev); > } else { >- ret = rdma_translate_ip(src_in, addr); >- if (!ret) 
>- memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); >+ struct in6_addr *a = &((struct sockaddr_in6 *)dst_in)->sin6_addr; >+ >+ for_each_netdev(&init_net, dev) >+ if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)- >>sin6_addr, dev, 1)) >+ break; >+ >+ if (!dev) >+ return -EADDRNOTAVAIL; >+ >+ a = &((struct sockaddr_in6 *)src_in)->sin6_addr; >+ >+ if (ipv6_addr_any(a)) { >+ src_in->sa_family = dst_in->sa_family; >+ ((struct sockaddr_in6 *)src_in)->sin6_addr = >+ ((struct sockaddr_in6 *)dst_in)->sin6_addr; >+ ret = rdma_copy_addr(addr, dev, dev->dev_addr); >+ } else if (ipv6_addr_loopback(a)) { >+ ret = rdma_translate_ip(dst_in, addr); >+ if (!ret) >+ memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >+ } else { >+ ret = rdma_translate_ip(src_in, addr); >+ if (!ret) >+ memcpy(addr->dst_dev_addr, dev->dev_addr, >MAX_ADDR_LEN); >+ } > } > >- dev_put(dev); > return ret; > } From sean.hefty at intel.com Mon Dec 1 11:52:57 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 1 Dec 2008 11:52:57 -0800 Subject: [ofa-general] [RMDA CM IPv6 support. PATCHv5 2/2] RDMA CM support In-Reply-To: <1228137806.17485.1.camel@alst60.voltaire.com> References: <1228137404.3852.23.camel@alst60.voltaire.com> <1228137806.17485.1.camel@alst60.voltaire.com> Message-ID: <000101c953ee$68a3fff0$65cc180a@amr.corp.intel.com> >@@ -2073,7 +2073,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr >*addr) > struct rdma_id_private *id_priv; > int ret; > >- if (addr->sa_family != AF_INET) >+ if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) > return -EAFNOSUPPORT; > > id_priv = container_of(id, struct rdma_id_private, id); I missed this before, but I think we also need to change cma_loopback_addr() (called from rdma_bind_addr() through cma_any_addr()) to handle IPv6. 
- Sean

From sashak at voltaire.com Mon Dec 1 12:51:48 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 1 Dec 2008 22:51:48 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_prefix_route.h: prefix and guid are in network rather than host endian order
In-Reply-To: <49343063.5030200@obsidianresearch.com>
References: <49343063.5030200@obsidianresearch.com>
Message-ID: <20081201205148.GE6183@sashak.voltaire.com>

On 11:43 Mon 01 Dec , Hal Rosenstock wrote:
> Sasha,
>
> Minor patch to declare prefix and guid in network rather than host endian
> order.
>
> -- Hal
> osm_prefix_route.h: prefix and guid are in network rather than host endian order
>
> Signed-off-by: Hal Rosenstock

Applied. Thanks.

Sasha

From twbowman at gmail.com Mon Dec 1 15:28:43 2008
From: twbowman at gmail.com (Todd Bowman)
Date: Mon, 1 Dec 2008 16:28:43 -0700
Subject: [ofa-general] ***SPAM*** git http url for ofed_1_3/management.git
Message-ID:

I am trying to download the ofed_1_3/mangement.git using http. It seems
that I do not have the right url

I have used the following

git clone http://git.openfabrics.org/ofed_1_3/mangement.git
git clone "http://www.openfabrics.org/git/?p=ofed_1_3/management.git;a=tree"

neither have been succesful. Can someone point me to the right url.

I need to use http due to firewall, and I have set proxy with the env
variable http_proxy.

Thanks,
Todd
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hal.rosenstock at gmail.com Mon Dec 1 16:05:56 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Mon, 1 Dec 2008 19:05:56 -0500
Subject: [ofa-general] ***SPAM*** git http url for ofed_1_3/management.git
In-Reply-To:
References:
Message-ID:

On Mon, Dec 1, 2008 at 6:28 PM, Todd Bowman wrote:
> I am trying to download the ofed_1_3/mangement.git using http.
It seems
> that I do not have the right url
>
> I have used the following
>
> git clone http://git.openfabrics.org/ofed_1_3/mangement.git

git://git.openfabrics.org/ofed_1_3/management.git

> git clone "http://www.openfabrics.org/git/?p=ofed_1_3/management.git;a=tree"
>
> neither have been succesful. Can someone point me to the right url.
>
> I need to use http due to firewall, and I have set proxy with the env
> variable http_proxy.
>
> Thanks,
> Todd
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From vlad at lists.openfabrics.org Tue Dec 2 03:25:37 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Tue, 2 Dec 2008 03:25:37 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081202-0200 daily build status
Message-ID: <20081202112537.C1BBFE60C68@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From nicolas.morey-chaisemartin at ext.bull.net Tue Dec 2 03:37:04 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Tue, 02 Dec 2008 12:37:04 +0100
Subject: [ofa-general] [PATCH] SDP: Fix to limit max buffer size in sdp_resize_buffers on IA64
Message-ID: <49351DE0.9050201@ext.bull.net>

Fix for bug 1311
https://bugs.openfabrics.org/show_bug.cgi?id=1311

Signed-off-by: Nicolas Morey-Chaisemartin
---
 drivers/infiniband/ulp/sdp/sdp_bcopy.c | 8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 202423e4ee7bf44fafdb5e0122097448d4217e71.diff Type: text/x-patch Size: 947 bytes Desc: not available URL: From alekseys at voltaire.com Tue Dec 2 04:44:51 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Tue, 02 Dec 2008 14:44:51 +0200 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 1/2] IB address changes Message-ID: <1228221891.14862.3.camel@alst60.voltaire.com> >From 9a2a1420a9644508fffb744d4fc9436476049d69 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Mon, 1 Dec 2008 13:55:01 +0200 Subject: [PATCH] IB addr IPv6 support Support for network discovery in addr_send_arp function. Local IPv6 address resolution. Added remote IPv6 address resolusion for RDMA CM. Function addr_resolve_remote used as wrapper for two other functions: addr4_resolve_remote ( original addr_resolve_remote ) addr6_resolve_remote ( new function ) Signed-off-by: Aleksey Senin :100644 100644 f95d21f... d9170c6... M drivers/infiniband/core/addr.c :100644 100644 f95d21f... bc1a97b... M drivers/infiniband/core/addr.c diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index f95d21f..bc1a97b 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -43,6 +43,7 @@ #include #include #include +#include MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("IB Address Translation"); @@ -172,27 +173,42 @@ static void queue_req(struct addr_req *req) mutex_unlock(&lock); } -static void addr_send_arp(struct sockaddr_in *dst_in) +static void addr_send_arp(struct sockaddr *dst_in) { struct rtable *rt; struct flowi fl; - __be32 dst_ip = dst_in->sin_addr.s_addr; + struct dst_entry *dst; memset(&fl, 0, sizeof fl); - fl.nl_u.ip4_u.daddr = dst_ip; - if (ip_route_output_key(&init_net, &rt, &fl)) - return; + if (dst_in->sa_family == AF_INET) { + fl.nl_u.ip4_u.daddr = + ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - neigh_event_send(rt->u.dst.neighbour, NULL); - ip_rt_put(rt); + if (ip_route_output_key(&init_net, &rt, &fl)) + return; + + 
neigh_event_send(rt->u.dst.neighbour, NULL); + ip_rt_put(rt); + + } else { + fl.nl_u.ip6_u.daddr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return; + + neigh_event_send(dst->neighbour, NULL); + dst_release(dst); + } } -static int addr_resolve_remote(struct sockaddr *src_in, - struct sockaddr *dst_in, +static int addr4_resolve_remote(struct sockaddr_in *src_in, + struct sockaddr_in *dst_in, struct rdma_dev_addr *addr) { - __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; - __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; + __be32 src_ip = src_in->sin_addr.s_addr; + __be32 dst_ip = dst_in->sin_addr.s_addr; struct flowi fl; struct rtable *rt; struct neighbour *neigh; @@ -223,8 +239,8 @@ static int addr_resolve_remote(struct sockaddr *src_in, } if (!src_ip) { - src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in *)src_in)->sin_addr.s_addr = rt->rt_src; + src_in->sin_family = dst_in->sin_family; + src_in->sin_addr.s_addr = rt->rt_src; } ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); @@ -236,6 +252,47 @@ out: return ret; } +static int addr6_resolve_remote(struct sockaddr_in6 *src_in, + struct sockaddr_in6 *dst_in, + struct rdma_dev_addr *addr) +{ + struct flowi fl; + struct neighbour *neigh; + struct dst_entry *dst; + int ret = -ENODATA; + + memset(&fl, 0, sizeof fl); + fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; + fl.nl_u.ip6_u.saddr = src_in->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return ret; + + if (dst->dev->flags & IFF_NOARP) { + ret = rdma_copy_addr(addr, dst->dev, NULL); + } else { + neigh = dst->neighbour; + if (neigh && (neigh->nud_state & NUD_VALID)) + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); + } + + dst_release(dst); + return ret; +} + +static int addr_resolve_remote(struct sockaddr *src_in, + struct sockaddr *dst_in, + struct rdma_dev_addr *addr) +{ + if (src_in->sa_family == AF_INET) { + return 
addr4_resolve_remote((struct sockaddr_in *)src_in, + (struct sockaddr_in *)dst_in, addr); + } else + return addr6_resolve_remote((struct sockaddr_in6 *)src_in, + (struct sockaddr_in6 *)dst_in, addr); +} + static void process_req(struct work_struct *work) { struct addr_req *req, *temp_req; @@ -279,29 +336,58 @@ static int addr_resolve_local(struct sockaddr *src_in, struct rdma_dev_addr *addr) { struct net_device *dev; - __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; - __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; int ret; - dev = ip_dev_find(&init_net, dst_ip); - if (!dev) - return -EADDRNOTAVAIL; + if (dst_in->sa_family == AF_INET) { + __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; + __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - if (ipv4_is_zeronet(src_ip)) { - src_in->sa_family = dst_in->sa_family; - ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; - ret = rdma_copy_addr(addr, dev, dev->dev_addr); - } else if (ipv4_is_loopback(src_ip)) { - ret = rdma_translate_ip(dst_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + dev = ip_dev_find(&init_net, dst_ip); + if (!dev) + return -EADDRNOTAVAIL; + + if (ipv4_is_zeronet(src_ip)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv4_is_loopback(src_ip)) { + ret = rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } + dev_put(dev); } else { - ret = rdma_translate_ip(src_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + struct in6_addr *a = &((struct sockaddr_in6 *)dst_in)->sin6_addr; + + for_each_netdev(&init_net, dev) + if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) 
addr)->sin6_addr, dev, 1)) + break; + + if (!dev) + return -EADDRNOTAVAIL; + + a = &((struct sockaddr_in6 *)src_in)->sin6_addr; + + if (ipv6_addr_any(a)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in6 *)src_in)->sin6_addr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv6_addr_loopback(a)) { + ret = rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } } - dev_put(dev); return ret; } @@ -344,7 +430,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client, case -ENODATA: req->timeout = msecs_to_jiffies(timeout_ms) + jiffies; queue_req(req); - addr_send_arp((struct sockaddr_in *)dst_in); + addr_send_arp(dst_in); break; default: ret = req->status; -- 1.5.6 From alekseys at voltaire.com Tue Dec 2 04:49:23 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Tue, 02 Dec 2008 14:49:23 +0200 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 2/2] RDMA CM In-Reply-To: <1228221891.14862.3.camel@alst60.voltaire.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> Message-ID: <1228222163.14862.8.camel@alst60.voltaire.com> >From 50e55a911bed9cae1e0130b5e5a279f27248b75c Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Mon, 1 Dec 2008 14:01:10 +0200 Subject: [PATCH] RDMA CM IPv6 support AF_INET6 case in rdma_bind_addr added AF_INET6 support in cma_format_hdr function AF_INET6 support when checking loopback address Use sockaddr_storage structure in cma_bind_any function Signed-off-by: Aleksey Senin :100644 100644 d951896... df22c5c... M drivers/infiniband/core/cma.c :100644 100644 d951896... 35775cf... 
M drivers/infiniband/core/cma.c diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index d951896..35775cf 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -636,7 +636,12 @@ static inline int cma_zero_addr(struct sockaddr *addr) static inline int cma_loopback_addr(struct sockaddr *addr) { - return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr); + if (addr->sa_family == AF_INET) + return ipv4_is_loopback( + ((struct sockaddr_in *) addr)->sin_addr.s_addr); + else + return ipv6_addr_loopback( + &((struct sockaddr_in6 *) addr)->sin6_addr); } static inline int cma_any_addr(struct sockaddr *addr) @@ -1467,10 +1472,10 @@ static void cma_listen_on_all(struct rdma_id_private *id_priv) static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) { - struct sockaddr_in addr_in; + struct sockaddr_storage addr_in; memset(&addr_in, 0, sizeof addr_in); - addr_in.sin_family = af; + addr_in.ss_family = af; return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } @@ -2073,7 +2078,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) struct rdma_id_private *id_priv; int ret; - if (addr->sa_family != AF_INET) + if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); @@ -2113,32 +2118,61 @@ EXPORT_SYMBOL(rdma_bind_addr); static int cma_format_hdr(void *hdr, enum rdma_port_space ps, struct rdma_route *route) { - struct sockaddr_in *src4, *dst4; struct cma_hdr *cma_hdr; struct sdp_hh *sdp_hdr; - src4 = (struct sockaddr_in *) &route->addr.src_addr; - dst4 = (struct sockaddr_in *) &route->addr.dst_addr; - - switch (ps) { - case RDMA_PS_SDP: - sdp_hdr = hdr; - if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) - return -EINVAL; - sdp_set_ip_ver(sdp_hdr, 4); - sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - sdp_hdr->port = src4->sin_port; - break; - 
default: - cma_hdr = hdr; - cma_hdr->cma_version = CMA_VERSION; - cma_set_ip_ver(cma_hdr, 4); - cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - cma_hdr->port = src4->sin_port; - break; + if (route->addr.src_addr.ss_family == AF_INET) { + struct sockaddr_in *src4, *dst4; + + src4 = (struct sockaddr_in *) &route->addr.src_addr; + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 4); + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + sdp_hdr->port = src4->sin_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 4); + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + cma_hdr->port = src4->sin_port; + break; + } + } else { + struct sockaddr_in6 *src6, *dst6; + + src6 = (struct sockaddr_in6 *) &route->addr.src_addr; + dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 6); + sdp_hdr->src_addr.ip6 = src6->sin6_addr; + sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; + sdp_hdr->port = src6->sin6_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 6); + cma_hdr->src_addr.ip6 = src6->sin6_addr; + cma_hdr->dst_addr.ip6 = dst6->sin6_addr; + cma_hdr->port = src6->sin6_port; + break; + } } + return 0; } -- 1.5.6 From alekseys at voltaire.com Tue Dec 2 04:58:00 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Tue, 02 Dec 2008 14:58:00 +0200 Subject: [ofa-general] [PATCH] cma_zero_addr Message-ID: <1228222680.14862.13.camel@alst60.voltaire.com> This is not really bug fix, but more 
cosmetic change. There is ipv6_addr_any function in the kernel, so I propose to use it instead of the same code duplicated in cma_zero_addr. >From affd57c591799101f89b552973624b39e433e217 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Tue, 2 Dec 2008 14:52:27 +0200 Subject: [PATCH] cma_zero_addr optimized Using builtin kernel function to check zero address Signed-off-by: Aleksey Senin :100644 100644 f69dda4... 17864d6... M drivers/infiniband/core/cma.c diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index f69dda4..17864d6 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -623,16 +623,12 @@ EXPORT_SYMBOL(rdma_init_qp_attr); static inline int cma_zero_addr(struct sockaddr *addr) { - struct in6_addr *ip6; - if (addr->sa_family == AF_INET) return ipv4_is_zeronet( ((struct sockaddr_in *)addr)->sin_addr.s_addr); - else { - ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; - return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | - ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; - } + else + return ipv6_addr_any( + &((struct sockaddr_in6 *)addr)->sin6_addr); } static inline int cma_loopback_addr(struct sockaddr *addr) -- 1.5.6 From twbowman at gmail.com Tue Dec 2 06:43:21 2008 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 2 Dec 2008 07:43:21 -0700 Subject: [ofa-general] ***SPAM*** git http url for ofed_1_3/management.git In-Reply-To: References: Message-ID: Thanks for the reply. Our firewall blocks the git protocol so I can't use the git url. I need to use the http url for git. Todd On Tue, Dec 2, 2008 at 7:33 AM, Hal Rosenstock wrote: > Todd, > > On Tue, Dec 2, 2008 at 9:13 AM, Todd Bowman wrote: > > Hal, > > > > You sent a reply but I don't see any new text. 
> > The url is git://git.openfabrics.org/ofed_1_3/management.git > so something like > git clone git://git.openfabrics.org/ofed_1_3/management.git management > > I think that there are some additional fixes which may not be there > but are in Sasha's tree (ofed 1.3 branch) if that matters. > > -- Hal > > > > > Todd > > > > On Mon, Dec 1, 2008 at 5:05 PM, Hal Rosenstock > > > wrote: > >> > >> On Mon, Dec 1, 2008 at 6:28 PM, Todd Bowman wrote: > >> > I am trying to download the ofed_1_3/mangement.git using http. It > seems > >> > that I do not have the right url > >> > > >> > I have used the following > >> > > >> > git clone http://git.openfabrics.org/ofed_1_3/mangement.git > >> git://git.openfabrics.org/ofed_1_3/management.git > >> > >> > git clone > >> > "http://www.openfabrics.org/git/?p=ofed_1_3/management.git;a=tree" > >> > > >> > neither have been succesful. Can someone point me to the right url. > >> > > >> > I need to use http due to firewall, and I have set proxy with the env > >> > variable http_proxy. > >> > > >> > Thanks, > >> > Todd > >> > _______________________________________________ > >> > general mailing list > >> > general at lists.openfabrics.org > >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > > >> > To unsubscribe, please visit > >> > http://openib.org/mailman/listinfo/openib-general > >> > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hal.rosenstock at gmail.com Tue Dec 2 07:15:14 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 2 Dec 2008 10:15:14 -0500 Subject: [ofa-general] ***SPAM*** git http url for ofed_1_3/management.git In-Reply-To: References: Message-ID: On Tue, Dec 2, 2008 at 9:43 AM, Todd Bowman wrote: > Thanks for the reply. Our firewall blocks the git protocol so I can't use > the git url. I need to use the http url for git. I'm not sure about whether http/https is supported for this on the OFA server. 
Another alternative would be to ssh (if you have an account on there). Not sure how others whose firewall blocks git deal with/get around this. -- Hal > Todd > On Tue, Dec 2, 2008 at 7:33 AM, Hal Rosenstock > wrote: >> >> Todd, >> >> On Tue, Dec 2, 2008 at 9:13 AM, Todd Bowman wrote: >> > Hal, >> > >> > You sent a reply but I don't see any new text. >> >> The url is git://git.openfabrics.org/ofed_1_3/management.git >> so something like >> git clone git://git.openfabrics.org/ofed_1_3/management.git management >> >> I think that there are some additional fixes which may not be there >> but are in Sasha's tree (ofed 1.3 branch) if that matters. >> >> -- Hal >> >> > >> > Todd >> > >> > On Mon, Dec 1, 2008 at 5:05 PM, Hal Rosenstock >> > >> > wrote: >> >> >> >> On Mon, Dec 1, 2008 at 6:28 PM, Todd Bowman wrote: >> >> > I am trying to download the ofed_1_3/mangement.git using http. It >> >> > seems >> >> > that I do not have the right url >> >> > >> >> > I have used the following >> >> > >> >> > git clone http://git.openfabrics.org/ofed_1_3/mangement.git >> >> git://git.openfabrics.org/ofed_1_3/management.git >> >> >> >> > git clone >> >> > "http://www.openfabrics.org/git/?p=ofed_1_3/management.git;a=tree" >> >> > >> >> > neither have been succesful. Can someone point me to the right url. >> >> > >> >> > I need to use http due to firewall, and I have set proxy with the env >> >> > variable http_proxy. 
>> >> > >> >> > Thanks, >> >> > Todd >> >> > _______________________________________________ >> >> > general mailing list >> >> > general at lists.openfabrics.org >> >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> > >> >> > To unsubscribe, please visit >> >> > http://openib.org/mailman/listinfo/openib-general >> >> > >> > >> > > > From tziporet at mellanox.co.il Tue Dec 2 07:29:49 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 2 Dec 2008 17:29:49 +0200 Subject: [ofa-general] OFED Dec 1, 2008 meeting minutes on OFED 1.4 release status Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD011A9617@mtlexch01.mtl.com> OFED Dec 1, 2008 meeting minutes on OFED 1.4 release status =============================================== Meeting minutes on the web: http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/ Meeting Summary: ============== - OFED 1.4 release: GA on Dec 9 (delayed in one day to be after the OFED release meeting) - UNH Logo testing: Rupert will ask them to start with the SW testing, so we can get a report by middle of this week Details: ======= 1. Bug list review: 1421 blo raisch at de.ibm.com ipoib/ehca crash when bouncing network - fixed 1413 cri jackm at mellanox.co.il rmmod mlx4_ib deadlock - very rare case - reduced to normal 1384 maj eli at mellanox.co.il netperf latency small messages increase 5% - UD mode issue - under investigation 1386 maj eli at mellanox.co.il ofed 1.4 - iperf tcp connected mode BW large messages dec... - under investigation 1395 maj vu at mellanox.com kernel panic during SRP HA test - on work 1419 maj vlad at mellanox.co.il Iperf-2.0.4 fails: page allocation failure. order:5 1423 maj jackm at mellanox.co.il RDMA_Write RC small message bw degradation jitter of 6% - doing more measurements 2. 
UNH Logo testing update - Have not run a lot of tests on RC5 - Should do it this week with RC6 - Need to send the results of the SW part in the middle of this week so we will know we are OK for the GA release next week 3. Issue with new Intel MPI Benchmark (IMB, known as Pallas). In OFED 1.3 we used IMB 3.0 and in OFED 1.4 we upgraded the IMB to 3.1 version. During our performance verification we found 5% latency degradation with IMB 3.1 multi PingPong benchmark (we compare the result to IMB 3.0 version). Do we wish to go back to IMB 3.0? Will wait for decision according to reply of Intel IMB owner Tziporet From alekseys at voltaire.com Tue Dec 2 07:29:04 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Tue, 02 Dec 2008 17:29:04 +0200 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 2/2] RDMA CM In-Reply-To: <1228222163.14862.8.camel@alst60.voltaire.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> Message-ID: <1228231744.14862.19.camel@alst60.voltaire.com> It seems that check for IPv6 loopback address on IB interface is unwarranted, because ::1 address "may never be assigned to physical interface". RFC3513, Section 2.5.3. So, previous patch can be applied safely From jackm at dev.mellanox.co.il Tue Dec 2 09:43:44 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 2 Dec 2008 19:43:44 +0200 Subject: [ofa-general] [PATCH] uverbs: return ENOSYS for unimplemented commands (not EINVAL) Message-ID: <200812021943.44732.jackm@dev.mellanox.co.il> uverbs: return ENOSYS for unimplemented commands (not EINVAL) In the original commit (883a99c7024c5763d6d4f22d9239c133893e8d74) (Add a mask of device methods allowed for userspace), the driver returned EINVAL for unimplemented commands. This creates a problem that there is no way to differentiate between an unimplemented command and an implemented one which is incorrectly invoked (which also returns EINVAL). 
The fix is to have unimplemented commands return ENOSYS. Signed-off-by: Jack Morgenstein --- Roland, We've got a bit of a salad here (d--ned if we do, d--ned if we don't). In userspace, we have low-level libraries put NULL in the virtual function table for unimplemented verbs. libibverbs then returns NULL for those unimplemented verbs which expect a pointer return (e.g., ibv_create_srq) (also a problem since this does not differentiate between a missing verb and an incorrectly-invoked one), and ENOSYS for verbs which expect an int returned (e.g., resize_cq). This is not consistent with what was done in the kernel for unimplemented verbs (where EINVAL is returned). Additionally, what was done in the kernel (returning EINVAL) is problematic in that it makes it impossible to differentiate an unimplemented verb from one which simply had errors in the calling parameters. IMHO, the correct fix is to have unimplemented kernel verbs return ENOSYS. **(MPI already checks for ENOSYS when deciding whether or not to use resize-cq)**. I'm not sure, though, how to handle returns from older kernels -- should all user apps check for either ENOSYS or EINVAL? What about cases where the verb actually is implemented, but incorrectly called -- in older kernels this is already a problem. What about apps which checked for ENOSYS (consistent with userspace usage), and then run with a new low-level library over an older kernel, and suddenly start getting EINVAL returns (for resize_cq, there is no "activation" bit anywhere which gets passed from kernel to the user-level)? Ouch! Any ideas? 
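The distinction argued for above can be sketched in user space. This is only a toy model of the check order used in the patch, not the kernel code: cmd_table, handle_noop and dispatch are made-up names, and the table contents are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type; stands in for the uverbs command handlers. */
typedef int (*cmd_handler)(void);

static int handle_noop(void)
{
    return 0;
}

/* Toy command table: slot 1 is deliberately empty (no handler at all). */
static cmd_handler cmd_table[] = { handle_noop, NULL, handle_noop };

#define TABLE_SIZE (sizeof(cmd_table) / sizeof(cmd_table[0]))

/*
 * Same ordering as the proposed change:
 *  - out-of-range command or empty table slot -> -EINVAL
 *  - known command that the device mask does not enable -> -ENOSYS
 */
static int dispatch(unsigned int command, uint64_t cmd_mask)
{
    if (command >= TABLE_SIZE || !cmd_table[command])
        return -EINVAL;

    if (!(cmd_mask & (1ull << command)))
        return -ENOSYS;

    return cmd_table[command]();
}
```

With a mask of 0x1, dispatch(0, 0x1) runs the handler, while dispatch(2, 0x1) reports -ENOSYS, so a caller can tell "not implemented by this device" apart from a malformed request such as dispatch(5, 0x1), which still gets -EINVAL.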
Index: infiniband/drivers/infiniband/core/uverbs_main.c =================================================================== --- infiniband.orig/drivers/infiniband/core/uverbs_main.c +++ infiniband/drivers/infiniband/core/uverbs_main.c @@ -584,10 +584,12 @@ static ssize_t ib_uverbs_write(struct fi if (hdr.command < 0 || hdr.command >= ARRAY_SIZE(uverbs_cmd_table) || - !uverbs_cmd_table[hdr.command] || - !(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command))) + !uverbs_cmd_table[hdr.command]) return -EINVAL; + if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << hdr.command))) + return -ENOSYS; + if (!file->ucontext && hdr.command != IB_USER_VERBS_CMD_GET_CONTEXT) return -EINVAL; From chu11 at llnl.gov Tue Dec 2 10:07:19 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 02 Dec 2008 10:07:19 -0800 Subject: [ofa-general] [ipoib][patch] support default_pkey module option Message-ID: <1228241240.17375.14.camel@cardanus.llnl.gov> Hi all, As far as I can tell, the only way to create an ipoib interface w/ a non-default pkey is to: 1) goto /sys/class/net/ibX/ 2) echo $MY_NEW_PKEY > create_child 3) then bring up ibX.MY_NEW_PKEY interface 3a) assuming I don't want the original ib0, bring it down (although leaving it up, I guess may not harm anything) It seems somewhat cumbersome for an administrator to script this all up if they only want 1 ipoib interface w/ a non-default pkey. The attached patch creates a module option called "default_pkey" to allow ipoib to default to a different pkey. If nothing is input, it still uses the pkey at index 0. Al -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 0001-support-ipoib-default_pkey-module-option.patch Type: text/x-patch Size: 1796 bytes Desc: not available URL: From sean.hefty at intel.com Tue Dec 2 11:21:40 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 Dec 2008 11:21:40 -0800 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 2/2] RDMA CM In-Reply-To: <1228231744.14862.19.camel@alst60.voltaire.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> <1228231744.14862.19.camel@alst60.voltaire.com> Message-ID: <000101c954b3$345b2f00$d998070a@amr.corp.intel.com> >It seems that check for IPv6 loopback address on IB interface is >unwarranted, because ::1 address "may never be assigned to physical >interface". > >RFC3513, Section 2.5.3. > > >So, previous patch can be applied safely I don't understand what you're leading to. The rdma_cm basically treats loopback addresses as 'any', and maps the address to the first RDMA port in its list, so that loopback addresses are unusable. - Sean From michael.heinz at qlogic.com Tue Dec 2 11:43:19 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 2 Dec 2008 13:43:19 -0600 Subject: [ofa-general] Querying paths on fabrics that don't have Ipoib In-Reply-To: <1228222163.14862.8.camel@alst60.voltaire.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> Message-ID: I'm trying to develop a user-land test app to exercise the SM on large fabrics. I don't actually have 4000 machines laying around, so I'm using a simulator to create a virtual fabric. Because of this, there are no real ULPs running on the fabric, just a bunch of virtual "HCAs" and "Switches" that the SM can see. Unfortunately, I'm running into a wall, because the rdma_cm requires IPOIB addresses to resolve paths, but the simulator cannot accommodate IPOIB. Is there an API for either the ibcm or the rdmacm that lets me resolve to a path without IPOIB? 
I know I can accomplish this by doing sa queries by hand, but part of this effort is to evaluate the effect of the caching features in ib_sa, and those seem to work only with the cm modules and ipoib. Thanks in advance... -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From sean.hefty at intel.com Tue Dec 2 12:40:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 Dec 2008 12:40:36 -0800 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 2/2] RDMA CM References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> <1228231744.14862.19.camel@alst60.voltaire.com> Message-ID: <000401c954be$3af7af40$d998070a@amr.corp.intel.com> >I don't understand what you're leading to. The rdma_cm basically treats >loopback addresses as 'any', and maps the address to the first RDMA port in its >list, so that loopback addresses are unusable. uhm... this should have been 'are usable' From sean.hefty at intel.com Tue Dec 2 11:07:41 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 Dec 2008 11:07:41 -0800 Subject: [ofa-general] RE: [PATCH] cma_zero_addr In-Reply-To: <1228222680.14862.13.camel@alst60.voltaire.com> References: <1228222680.14862.13.camel@alst60.voltaire.com> Message-ID: <000001c954b1$402dfcb0$d998070a@amr.corp.intel.com> >This is not really bug fix, but more cosmetic change. There is >ipv6_addr_any function in the kernel, so I propose to use it instead of >the same code duplicated in cma_zero_addr. > > >>From affd57c591799101f89b552973624b39e433e217 Mon Sep 17 00:00:00 2001 >From: Aleksey Senin >Date: Tue, 2 Dec 2008 14:52:27 +0200 >Subject: [PATCH] cma_zero_addr optimized > >Using builtin kernel function to check zero address > >Signed-off-by: Aleksey Senin Acked-by: Sean Hefty > >:100644 100644 f69dda4... 17864d6... 
M drivers/infiniband/core/cma.c > >diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >index f69dda4..17864d6 100644 >--- a/drivers/infiniband/core/cma.c >+++ b/drivers/infiniband/core/cma.c >@@ -623,16 +623,12 @@ EXPORT_SYMBOL(rdma_init_qp_attr); > > static inline int cma_zero_addr(struct sockaddr *addr) > { >- struct in6_addr *ip6; >- > if (addr->sa_family == AF_INET) > return ipv4_is_zeronet( > ((struct sockaddr_in *)addr)->sin_addr.s_addr); >- else { >- ip6 = &((struct sockaddr_in6 *) addr)->sin6_addr; >- return (ip6->s6_addr32[0] | ip6->s6_addr32[1] | >- ip6->s6_addr32[2] | ip6->s6_addr32[3]) == 0; >- } >+ else >+ return ipv6_addr_any( >+ &((struct sockaddr_in6 *)addr)->sin6_addr); > } > > static inline int cma_loopback_addr(struct sockaddr *addr) >-- >1.5.6 > From sean.hefty at intel.com Tue Dec 2 11:22:39 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 Dec 2008 11:22:39 -0800 Subject: [ofa-general] [PATCHv6 RDMA CM IPv6 2/2] RDMA CM In-Reply-To: <1228222163.14862.8.camel@alst60.voltaire.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> Message-ID: <000201c954b3$5761b4b0$d998070a@amr.corp.intel.com> >From: Aleksey Senin >Date: Mon, 1 Dec 2008 14:01:10 +0200 >Subject: [PATCH] RDMA CM IPv6 support > >AF_INET6 case in rdma_bind_addr added >AF_INET6 support in cma_format_hdr function >AF_INET6 support when checking loopback address >Use sockaddr_storage structure in cma_bind_any function > >Signed-off-by: Aleksey Senin Acked-by: Sean Hefty > >:100644 100644 d951896... df22c5c... M drivers/infiniband/core/cma.c > >:100644 100644 d951896... 35775cf... 
M drivers/infiniband/core/cma.c > >diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c >index d951896..35775cf 100644 >--- a/drivers/infiniband/core/cma.c >+++ b/drivers/infiniband/core/cma.c >@@ -636,7 +636,12 @@ static inline int cma_zero_addr(struct sockaddr *addr) > > static inline int cma_loopback_addr(struct sockaddr *addr) > { >- return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr); >+ if (addr->sa_family == AF_INET) >+ return ipv4_is_loopback( >+ ((struct sockaddr_in *) addr)->sin_addr.s_addr); >+ else >+ return ipv6_addr_loopback( >+ &((struct sockaddr_in6 *) addr)->sin6_addr); > } > > static inline int cma_any_addr(struct sockaddr *addr) >@@ -1467,10 +1472,10 @@ static void cma_listen_on_all(struct rdma_id_private >*id_priv) > > static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) > { >- struct sockaddr_in addr_in; >+ struct sockaddr_storage addr_in; > > memset(&addr_in, 0, sizeof addr_in); >- addr_in.sin_family = af; >+ addr_in.ss_family = af; > return rdma_bind_addr(id, (struct sockaddr *) &addr_in); > } > >@@ -2073,7 +2078,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr >*addr) > struct rdma_id_private *id_priv; > int ret; > >- if (addr->sa_family != AF_INET) >+ if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) > return -EAFNOSUPPORT; > > id_priv = container_of(id, struct rdma_id_private, id); >@@ -2113,32 +2118,61 @@ EXPORT_SYMBOL(rdma_bind_addr); > static int cma_format_hdr(void *hdr, enum rdma_port_space ps, > struct rdma_route *route) > { >- struct sockaddr_in *src4, *dst4; > struct cma_hdr *cma_hdr; > struct sdp_hh *sdp_hdr; > >- src4 = (struct sockaddr_in *) &route->addr.src_addr; >- dst4 = (struct sockaddr_in *) &route->addr.dst_addr; >- >- switch (ps) { >- case RDMA_PS_SDP: >- sdp_hdr = hdr; >- if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) >- return -EINVAL; >- sdp_set_ip_ver(sdp_hdr, 4); >- sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; >- 
sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; >- sdp_hdr->port = src4->sin_port; >- break; >- default: >- cma_hdr = hdr; >- cma_hdr->cma_version = CMA_VERSION; >- cma_set_ip_ver(cma_hdr, 4); >- cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; >- cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; >- cma_hdr->port = src4->sin_port; >- break; >+ if (route->addr.src_addr.ss_family == AF_INET) { >+ struct sockaddr_in *src4, *dst4; >+ >+ src4 = (struct sockaddr_in *) &route->addr.src_addr; >+ dst4 = (struct sockaddr_in *) &route->addr.dst_addr; >+ >+ switch (ps) { >+ case RDMA_PS_SDP: >+ sdp_hdr = hdr; >+ if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) >+ return -EINVAL; >+ sdp_set_ip_ver(sdp_hdr, 4); >+ sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; >+ sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; >+ sdp_hdr->port = src4->sin_port; >+ break; >+ default: >+ cma_hdr = hdr; >+ cma_hdr->cma_version = CMA_VERSION; >+ cma_set_ip_ver(cma_hdr, 4); >+ cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; >+ cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; >+ cma_hdr->port = src4->sin_port; >+ break; >+ } >+ } else { >+ struct sockaddr_in6 *src6, *dst6; >+ >+ src6 = (struct sockaddr_in6 *) &route->addr.src_addr; >+ dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; >+ >+ switch (ps) { >+ case RDMA_PS_SDP: >+ sdp_hdr = hdr; >+ if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) >+ return -EINVAL; >+ sdp_set_ip_ver(sdp_hdr, 6); >+ sdp_hdr->src_addr.ip6 = src6->sin6_addr; >+ sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; >+ sdp_hdr->port = src6->sin6_port; >+ break; >+ default: >+ cma_hdr = hdr; >+ cma_hdr->cma_version = CMA_VERSION; >+ cma_set_ip_ver(cma_hdr, 6); >+ cma_hdr->src_addr.ip6 = src6->sin6_addr; >+ cma_hdr->dst_addr.ip6 = dst6->sin6_addr; >+ cma_hdr->port = src6->sin6_port; >+ break; >+ } > } >+ > return 0; > } > >-- >1.5.6 > From sean.hefty at intel.com Tue Dec 2 12:38:44 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 2 
Dec 2008 12:38:44 -0800 Subject: [ofa-general] RE: Querying paths on fabrics that don't have Ipoib In-Reply-To: References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> Message-ID: <000301c954bd$f7dfc800$d998070a@amr.corp.intel.com> >Is there an API for either the ibcm or the rdmacm that lets me resolve >to a path without IPOIB? I know I can accomplish this by doing sa >queries by hand, but part of this effort is to evaluate the effect of >the caching features in ib_sa, and those seem to work only with the cm >modules and ipoib. There is not a supported API for what you are trying to do. There is a very old libibsa library and supporting kernel module that I created for the labs a couple of years ago that may help, but I do not know how much effort it would take to get it running again. Plus, it would need to be modified to support PR queries. If interested, see my rdma-dev.git (user_sa branch) and libibsa.git trees. You could also try modifying the rdma_cm code, so that rdma_resolve_addr() takes a DGID directly to avoid IP to GID mappings. The GID could be passed as an IPv6 address, with some sort of indication that it really is a GID (different address family, for example). - Sean From michael.heinz at qlogic.com Tue Dec 2 13:25:36 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 2 Dec 2008 15:25:36 -0600 Subject: [ofa-general] RE: Querying paths on fabrics that don't have Ipoib In-Reply-To: <000301c954bd$f7dfc800$d998070a@amr.corp.intel.com> References: <1228221891.14862.3.camel@alst60.voltaire.com> <1228222163.14862.8.camel@alst60.voltaire.com> <000301c954bd$f7dfc800$d998070a@amr.corp.intel.com> Message-ID: Thanks, Sean. 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Tuesday, December 02, 2008 3:39 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: Querying paths on fabrics that don't have Ipoib >Is there an API for either the ibcm or the rdmacm that lets me resolve >to a path without IPOIB? I know I can accomplish this by doing sa >queries by hand, but part of this effort is to evaluate the effect of >the caching features in ib_sa, and those seem to work only with the cm >modules and ipoib. There is not a supported API for what you are trying to do. There is a very old libibsa library and supporting kernel module that I created for the labs a couple of years ago that may help, but I do not know how much effort it would take to get it running again. Plus, it would need to be modified to support PR queries. If interested, see my rdma-dev.git (user_sa branch) and libibsa.git trees. You could also try modifying the rdma_cm code, so that rdma_resolve_addr() takes a DGID directly to avoid IP to GID mappings. The GID could be passed as an IPv6 address, with some sort of indication that it really is a GID (different address family, for example). - Sean From anuj01 at gmail.com Tue Dec 2 23:00:07 2008 From: anuj01 at gmail.com (=?UTF-8?B?4KSF4KSo4KWB4KSc?=) Date: Wed, 3 Dec 2008 12:30:07 +0530 Subject: [ofa-general] ***SPAM*** MPIR_Init_thread(310).......: Initialization failed Message-ID: Hi I have compiled mvapich2-1.2p1 for gen2. I tried to run IMB ( Intel MPI Benchmark) over it. 
But I'm getting the following error:

Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(310).......: Initialization failed
MPID_Init(113)..............: channel initialization failed
MPIDI_CH3_Init(168).........:
MPIDI_CH3I_RDMA_init(138)...:
rdma_setup_startup_ring(334): cannot create cq
MPI process terminated unexpectedly
Exit code -5 signaled from pnetib2    # here pnetib2 is the host name assigned to the ipoib interface
cleanupKilling remote processes...Signal 15 received. DONE

Please tell me where the problem is, or how I can debug this.

Thanks a lot.

Regards,
--
Anuj Aggarwal

 .''`.
: :Ⓐ :   # apt-get install hakuna-matata
`. `'`
  `-
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From ogerlitz at voltaire.com Tue Dec 2 23:34:41 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 03 Dec 2008 09:34:41 +0200
Subject: [ofa-general] Re: [ewg] rhel 5.2 iSER support?
In-Reply-To: <4935B02B.9020408@sun.com>
References: <4935B02B.9020408@sun.com>
Message-ID: <49363691.50609@voltaire.com>

Sameer Mehta wrote:
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_connect:connecting to: 192.168.0.5, port 0xbc0c
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 0 conn ffff81015de00bc0 id ffff81017fc8e200
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 2 conn ffff81015de00bc0 id ffff81017fc8e200
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_create_ib_conn_res:setting conn ffff81015de00bc0 cma_id ffff81017fc8e200: fmr_pool ffff810140c9aec0 qp ffff810168974e00
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 8 conn ffff81015de00bc0 id ffff81017fc8e200
> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event: 8, error: 8
>
> Am I missing something here? is iSER transport available in v1.4?

You are getting a REJECTED (8) event with the reject reason being INVALID_SERVICE_ID (8); see include/rdma/ib_cm.h.
This means there's no one listening on the Service-ID you are attempting to connect to, e.g. your target didn't issue a listen call on the SID (service ID) you are trying to connect to, or there's some mismatch in the SID as constructed by the initiator, etc. A related inter-op issue was brought up by Jesse Butler from Sun a couple of months ago, http://lists.openfabrics.org/pipermail/general/2008-October/054487.html but I am not sure where it stands.

The code that builds the SID from the TCP port is cma_get_service_id (drivers/infiniband/core/cma.c, below), where in this case the resulting SID is 0x0000000001060cbc, that is, port space 0x0106 shifted left by 16 bits plus port 0x0cbc.

Or.

> static __be64 cma_get_service_id(enum rdma_port_space ps, struct sockaddr *addr)
> {
> 	return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr)));
> }

From Zhen.Liang at Sun.COM Wed Dec 3 00:31:47 2008
From: Zhen.Liang at Sun.COM (Liang Zhen)
Date: Wed, 03 Dec 2008 16:31:47 +0800
Subject: [ofa-general] must include kernel_addons/backport for software depending on OFED?
Message-ID: <493643F3.20301@sun.com>

Hi there,

I saw openib/kernel_addons/backport after installing kernel-ib-devel. It looks like we have to put "-I$BACKPORT_INCLUDES" in $LINUXINCLUDE to compile any kernel code depending on OFED (I failed to build without BACKPORT_INCLUDES), but kernel_addons/backport has a lot of macros and inlines, which conflict with the config.h we already have. (We don't want to use these backports everywhere in our system because we have only one module depending on OFED.)

Any suggestions for this?

Thanks!
Liang

From nicolas.morey-chaisemartin at ext.bull.net Wed Dec 3 00:35:12 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Wed, 03 Dec 2008 09:35:12 +0100
Subject: [ofa-general] ***SPAM*** git http url for ofed_1_3/management.git
In-Reply-To:
References:
Message-ID: <493644C0.9010601@ext.bull.net>

Don't you use an HTTP proxy? If you search on Google, there are some howtos that explain how to get git to work through an HTTP proxy.
Nicolas Todd Bowman wrote: > Thanks for the reply. Our firewall blocks the git protocol so I can't > use the git url. I need to use the http url for git. > > > Todd > On Tue, Dec 2, 2008 at 7:33 AM, Hal Rosenstock > > wrote: > > Todd, > > On Tue, Dec 2, 2008 at 9:13 AM, Todd Bowman > wrote: > > Hal, > > > > You sent a reply but I don't see any new text. > > The url is git://git.openfabrics.org/ofed_1_3/management.git > > so something like > git clone git://git.openfabrics.org/ofed_1_3/management.git > management > > I think that there are some additional fixes which may not be there > but are in Sasha's tree (ofed 1.3 branch) if that matters. > > -- Hal > > > > > Todd > > > > On Mon, Dec 1, 2008 at 5:05 PM, Hal Rosenstock > > > > wrote: > >> > >> On Mon, Dec 1, 2008 at 6:28 PM, Todd Bowman > wrote: > >> > I am trying to download the ofed_1_3/mangement.git using > http. It seems > >> > that I do not have the right url > >> > > >> > I have used the following > >> > > >> > git clone http://git.openfabrics.org/ofed_1_3/mangement.git > >> > git://git.openfabrics.org/ofed_1_3/management.git > > >> > >> > git clone > >> > > "http://www.openfabrics.org/git/?p=ofed_1_3/management.git;a=tree" > >> > > >> > neither have been succesful. Can someone point me to the > right url. > >> > > >> > I need to use http due to firewall, and I have set proxy with > the env > >> > variable http_proxy. 
> >> > > >> > Thanks, > >> > Todd > >> > _______________________________________________ > >> > general mailing list > >> > general at lists.openfabrics.org > > >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > > >> > To unsubscribe, please visit > >> > http://openib.org/mailman/listinfo/openib-general > >> > > > > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From nicolas.morey-chaisemartin at ext.bull.net Wed Dec 3 01:28:08 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Wed, 03 Dec 2008 10:28:08 +0100 Subject: [ofa-general] [PATCH] UMAD: Correct unalign access bug on IA64 Message-ID: <49365128.20009@ext.bull.net> Signed-off-by: Nicolas Morey-Chaisemartin --- drivers/infiniband/core/user_mad.c | 21 +++++++++++++++++++-- 1 files changed, 19 insertions(+), 2 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... Name: c95143dd9b47b1ec2ac0e3f52c961cbbf52e4a67.diff Type: text/x-patch Size: 1593 bytes Desc: not available URL: From nicolas.morey-chaisemartin at ext.bull.net Wed Dec 3 01:43:19 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Wed, 03 Dec 2008 10:43:19 +0100 Subject: [ofa-general] [PATCH] OpenIBd: Try to open a kernel specific config file before the default one Message-ID: <493654B7.7060104@ext.bull.net> This patch is part of our packaging policy. As we have usually two kernel installed on a single server, we need to install two instances of the ofa_kernel. 
The install used to be quite a mess, as the second installation (the install for the second kernel) conflicted with the first one over the configuration files and the various scripts. Therefore, we split ofa_kernel into two packages in the spec file (a bit like Red Hat does): ofa_kernel contains the modules, and ofa_kernel_scripts contains the udev files, the modprobe.conf configuration, and so on.

A problem remained with /etc/infiniband/openib.conf: it depends on the installed ofa_kernel (on which modules it was compiled with), but it could not be put directly in the same package, as it would conflict when installing for another kernel.

Thus, we made this patch so that openibd first tries to load /etc/infiniband/openib_`uname -r`.conf and falls back to the regular /etc/infiniband/openib.conf if the former does not exist. By modifying the spec file accordingly, we are able to install the openibd config file as part of ofa_kernel (and not ofa_kernel_scripts) without any conflicts, while keeping openibd compatible with the regular ofa_kernel packaging.

This is definitely a patch made for our needs, but the basic idea of splitting the kernel modules and the scripts is quite good, so I'm sharing it!

Nicolas

Signed-off-by: Nicolas Morey-Chaisemartin
---
 ofed_scripts/openibd | 13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e278e67581940ac42a260a5416bdf9c405e76ed6.diff Type: text/x-patch Size: 699 bytes Desc: not available URL: From vlad at lists.openfabrics.org Wed Dec 3 03:32:00 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 3 Dec 2008 03:32:00 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081203-0200 daily build status Message-ID: <20081203113200.D1F4EE60C8E@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with 
linux-2.6.21.1 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From aostvold at platform.com Wed Dec 3 04:58:15 2008 From: aostvold at platform.com (Asmund Ostvold) Date: Wed, 03 Dec 2008 13:58:15 +0100 Subject: [ofa-general] ibv_post_send fails when using malloc in a special way Message-ID: <49368267.40007@platform.com> Hi again, We understood that the last issue report was a bit minimalistic. We hope that report will be of more interest. If you have ANY question please send me email and I will answer! We have run into a strange problem: * RDMA(ibv_post_send) fails when using malloc in a special way * We have an issue which we have reduced down to the enclosed program. it is not neat; but is able to demonstrate what we think is a problem with either ibverbs, libc, or the kernel * The issue is related to the sending process * Receiver sees incorrect data * The receiver will use "private" receive buffers for all RDMAs, i.e., each RDMA "put" will place memory into a distinct receive memory area * The sender will re-use the memory area * To trig the problem, we need a malloc() attempting to allocate huge amount of memory, but which fails. Without this failing malloc(), everything is OK. Please note that malloc changes allocation policy after this failing malloc (see below), and this behavior is what we observed in a pthreads program where we first discovered the issue. It must also be noted that if we allocate buffers with malloc instead of valloc it works fine... * Someone reviewing this would probably say: "The problem comes from potential munmap()+mmap() or an mremap()". 
We acknowledge that the failing program is vulnerable in this context, but strace does not reveal any such change in the virtual-to-physical mapping. (And be aware, this is a stripped-down example of a much more complicated scenario.)
* We are concerned that the call to ibv_reg_mr() does not imply a call to madvise()
* We have tested with rhel4.6, kernel: 2.6.9-67.ELsmp, x86_64 libibverbs-1.1.1-1.ofed1.3.1, Mellanox Technologies MT23108 InfiniHost (rev a1) and with: rhel5.2, kernel: 2.6.18-92.el5, x86_64 libibverbs-1.1.2-1.ofed1.4.rc6, Mellanox Technologies MT25418 (rev a0)

Here is the special malloc behaviour:

[ 3afb6c40bc] mmap(NULL, 39999000576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[ 3afb6bfb0a] brk(0x95073b000) = 0x526000
[ 3afb6c40bc] mmap(NULL, 39999135744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[ 3afb6c40bc] mmap(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x46e8632000
[ 3afb6c40e9] munmap(0x46e8632000, 843776) = 0
[ 3afb6c40e9] munmap(0x46e8800000, 204800) = 0
[ 3afb6c4119] mprotect(0x46e8700000, 135168, PROT_READ|PROT_WRITE) = 0
[ 3afb6c40bc] mmap(NULL, 39999000576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

Please note the mmap() returning 0x46e8632000. It uses PROT_NONE and MAP_NORESERVE. Not sure if it is related. The RDMA send buffer that fails comes from this mmap.

Here is a short-form description of the way we allocate/free buffers:
* Buffers are allocated using valloc
* Buffers are registered using ibv_reg_mr if they are not already registered
* Buffers are initialized with unique data
* Data is copied to the receiver with ibv_post_send
* We wait with ibv_poll_cq
* Buffers are freed using free
* When valloc starts returning the same buffer addresses and we therefore skip registration, the data becomes wrong at the receiver side: we get partial data from the previous buffer.
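The buffer handling in the list above can be modeled, under assumptions, as a registration cache keyed by buffer address. The sketch below is only an illustration of that reading: bug.c was scrubbed from the archive, so none of this is the reporters' code, no verbs calls are made, and every name (reg_cache, cache_get, cache_invalidate) is hypothetical. It shows why a lookup keyed on the address alone reports "already registered" when the allocator hands the same address back after free, and how dropping the entry on free forces a fresh registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an address-keyed registration cache.
 * "Registering" only hands out a new generation number; a real
 * program would call ibv_reg_mr() at that point. */
#define CACHE_SLOTS 8

struct reg_entry {
    uintptr_t addr;       /* start address of the buffer */
    size_t len;           /* length of the buffer */
    unsigned generation;  /* which registration produced this entry */
    int valid;            /* non-zero while the entry is usable */
};

struct reg_cache {
    struct reg_entry slot[CACHE_SLOTS];
    unsigned next_generation;
};

static void cache_init(struct reg_cache *c)
{
    memset(c, 0, sizeof(*c));
    c->next_generation = 1;
}

/* Return the generation of the registration covering (addr, len),
 * "registering" the buffer first if no cached entry matches. */
static unsigned cache_get(struct reg_cache *c, uintptr_t addr, size_t len)
{
    int free_slot = -1;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];

        if (e->valid && e->addr == addr && e->len == len)
            return e->generation;  /* hit: registration is skipped */
        if (!e->valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        free_slot = 0;             /* trivial eviction, enough for a sketch */

    c->slot[free_slot] =
        (struct reg_entry){ addr, len, c->next_generation++, 1 };
    return c->slot[free_slot].generation;
}

/* Drop the cached entry for a buffer that is about to be freed. */
static void cache_invalidate(struct reg_cache *c, uintptr_t addr)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].valid && c->slot[i].addr == addr)
            c->slot[i].valid = 0;
}
```

Calling cache_get() twice with the same address returns the same generation, which corresponds to the "don't register" case in the last bullet. Calling cache_invalidate() before the buffer is freed makes the next cache_get() for that address produce a new generation. Whether re-registration actually avoids the corruption reported here is not established by this thread.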
strace/ltrace are available if anyone is interested.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bug.c
URL:

From monis at Voltaire.COM Wed Dec 3 06:27:12 2008
From: monis at Voltaire.COM (Moni Shoua)
Date: Wed, 03 Dec 2008 16:27:12 +0200
Subject: [ofa-general] [BUG REPORT] mlx4: Incorrect event is generated when SM is changing
Message-ID: <49369740.7060603@Voltaire.COM>

I have a small fabric with ConnectX HCAs, and I ran some tests with 2 OpenSM instances that take over from each other (either by discovering that the MASTER is not responding or by coming up with a higher priority).

I noticed that in 2.6.28-rc6 IPoIB gets a LID_CHANGE event (while I expected CLIENT_REREGISTER) even though the LID was not changed. I looked at where the events are generated, in drivers/infiniband/hw/mlx4/mad.c:smp_snoop()

	if (pinfo->clientrereg_resv_subnetto & 0x80)
		event.event = IB_EVENT_CLIENT_REREGISTER;
	else
		event.event = IB_EVENT_LID_CHANGE;

and see that the clientrereg bit is off.

I checked this simultaneously with a host running kernel 2.6.27 and see that there the bit is on and CLIENT_REREGISTER is generated.

Things get worse if this patch (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html) is applied, since no event would be dispatched at all. (BTW Jack, I think you forgot to send the second half for mthca.)

I'd appreciate a clue as to where the reregister bit gets turned off.

thanks
MoniS

From nicolas.morey-chaisemartin at ext.bull.net Wed Dec 3 06:47:15 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Wed, 03 Dec 2008 15:47:15 +0100
Subject: [ofa-general] [PATCH] OpenSM: Fixed GUID check against cn_guid_file usinf Ftree.
Message-ID: <49369BF3.5040902@ext.bull.net>

Port GUID was not converted using cl_ntoh64 before being searched in the CN cl_qmap. Therefore, the cn_guid_file needed to be reversed (bigendian<->littleendian conversion) so it would recognize the nodes.
This patch makes it consistent with the root_guid_file and the log messages. Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/opensm/osm_ucast_ftree.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... Name: f52d93cf205f3d6b886b62616d46a82f48231472.diff Type: text/x-patch Size: 614 bytes Desc: not available URL: From jsquyres at cisco.com Wed Dec 3 07:08:03 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 3 Dec 2008 10:08:03 -0500 Subject: [ofa-general] Re: [PATCH] uverbs: return ENOSYS for unimplemented commands (not EINVAL) In-Reply-To: <200812021943.44732.jackm@dev.mellanox.co.il> References: <200812021943.44732.jackm@dev.mellanox.co.il> Message-ID: +1 for this patch (returning ENOSYS instead of EINVAL): it makes life a bunch easier for Open MPI because we can tell the difference between "it isn't implemented" and a bug in certain versions of ConnectX firmware. Roland -- do you think you'll apply this patch? On Dec 2, 2008, at 12:43 PM, Jack Morgenstein wrote: > uverbs: return ENOSYS for unimplemented commands (not EINVAL) > > In the original commit (883a99c7024c5763d6d4f22d9239c133893e8d74) > (Add a mask of device methods allowed for userspace), > the driver returned EINVAL for unimplemented commands. > > This creates a problem that there is no way to differentiate between > an unimplemented command and an implemented one which is incorrectly > invoked (which also returns EINVAL). > > The fix is to have unimplemented commands return ENOSYS. > > Signed-off-by: Jack Morgenstein > > --- > > Roland, > We've got a bit of a salad here (d--ned if we do, d--ned if we don't). > > In userspace, we have low-level libraries put NULL in the virtual > function > table for unimplemented verbs. 
libibverbs then returns NULL for those > unimplemented verbs which expect a pointer return (e.g., > ibv_create_srq) > (also a problem since this does not differentiate between a missing > verb and > an incorrectly-invoked one), > and ENOSYS for verbs which expect an int returned (e.g., resize_cq). > > This is not consistent with what was done in the kernel for > unimplemented verbs > (where EINVAL is returned). > > Additionally, what was done in the kernel (returning EINVAL) is > problematic > in that it makes it impossible to differentiate an unimplemented > verb from one > which simply had errors in the calling parameters. > > IMHO, the correct fix is to have unimplemented kernel verbs return > ENOSYS. > **(MPI already checks for ENOSYS when deciding whether or not to use > resize-cq)**. > > I'm not sure, though, how to handle returns from older kernels -- > should all user > apps check for either ENOSYS or EINVAL? What about cases where the > verb actually > is implemented, but incorrectly called -- in older kernels this is > already a problem. > What about apps which checked for ENOSYS (consistent with userspace > usage), and > then run with a new low-level library over an older kernel, and > suddenly start getting > EINVAL returns (for resize_cq, there is no "activation" bit anywhere > which gets passed > from kernel to the user-level)? > > Ouch! > > Any ideas? 
> > Index: infiniband/drivers/infiniband/core/uverbs_main.c > =================================================================== > --- infiniband.orig/drivers/infiniband/core/uverbs_main.c > +++ infiniband/drivers/infiniband/core/uverbs_main.c > @@ -584,10 +584,12 @@ static ssize_t ib_uverbs_write(struct fi > > if (hdr.command < 0 || > hdr.command >= ARRAY_SIZE(uverbs_cmd_table) || > - !uverbs_cmd_table[hdr.command] || > - !(file->device->ib_dev->uverbs_cmd_mask & (1ull << > hdr.command))) > + !uverbs_cmd_table[hdr.command]) > return -EINVAL; > > + if (!(file->device->ib_dev->uverbs_cmd_mask & (1ull << > hdr.command))) > + return -ENOSYS; > + > if (!file->ucontext && > hdr.command != IB_USER_VERBS_CMD_GET_CONTEXT) > return -EINVAL; -- Jeff Squyres Cisco Systems From jackm at mellanox.co.il Wed Dec 3 07:24:40 2008 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 3 Dec 2008 17:24:40 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <49369740.7060603@Voltaire.COM> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD011FDABF@mtlexch01.mtl.com> I did post the mthca patch: http://lists.openfabrics.org/pipermail/general/2008-November/055681.html - Jack > -----Original Message----- > From: Moni Shoua [mailto:monis at Voltaire.COM] > Sent: Wednesday, December 03, 2008 4:27 PM > To: Roland Dreier; Jack Morgenstein > Cc: OpenFabrics General; Olga Stern; Or Gerlitz > Subject: [BUG REPORT] mlx4: Incorrect event is generated when > SM is changing > > > I have a small fabric with ConnectX HCAs and I run some tests > with 2 Open SMs that take over of each other (either by > discovering that the MASTER is not responding or by coming up > with higher priority). > > I noticed that in 2.6.28-rc6 I get in IPoIB a LID_CHANGE > event (while expecting CLIENT_REREGISTER) even though LID was > not changed. 
I looked where the events are generated in > drivers/infiniband/hw/mlx4/mad.c:smp_snoop() > > if (pinfo->clientrereg_resv_subnetto & 0x80) > event.event = > IB_EVENT_CLIENT_REREGISTER; > else > event.event = IB_EVENT_LID_CHANGE; > > and see that clientrereg bit is off. > > I checked this simultaneously with a host running kernel > 2.6.27 and see that the bit is on and > CLIENT_REREGISTER is generated. > > Things get worse of this patch > (http://lists.openfabrics.org/pipermail/general/2008-November/ 055680.html) is applied since no event would be dispatched at all (BTW Jack, I think you forgot to send the second half for mthca) I'll appreciate a clue where does the reregister bit turned off. thanks MoniS From monis at Voltaire.COM Wed Dec 3 07:32:15 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 03 Dec 2008 17:32:15 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD011FDABF@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD011FDABF@mtlexch01.mtl.com> Message-ID: <4936A67F.5030304@Voltaire.COM> Jack Morgenstein wrote: > I did post the mthca patch: > http://lists.openfabrics.org/pipermail/general/2008-November/055681.html > You're right. I missed it. Sorry. From rdreier at cisco.com Wed Dec 3 07:51:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 03 Dec 2008 07:51:03 -0800 Subject: [ofa-general] [PATCH] UMAD: Correct unalign access bug on IA64 In-Reply-To: <49365128.20009@ext.bull.net> (Nicolas Morey Chaisemartin's message of "Wed, 03 Dec 2008 10:28:08 +0100") References: <49365128.20009@ext.bull.net> Message-ID: thanks... a couple comments, since I don't think we need to write our own helpers for this: > - if ((hdr->tid != sent_hdr->tid) || > + if ((!cmp_tid(&hdr->tid, &sent_hdr->tid)) || this could be done with get_unaligned() (are both tids possibly unaligned, or just one?) 
or even with memcmp() -- if we use memcmp() I guess a comment explaining it's for alignment reasons would be a good thing to have. > - rmpp_mad->mad_hdr.tid = *tid; > + copy_tid(&rmpp_mad->mad_hdr.tid ,tid); And this can just be put_unaligned(tid, &rmpp_mad->mad_hdr.tid); right? Also, a couple of minor issues that would make it easier to apply the patch: > Signed-off-by: Nicolas Morey-Chaisemartin > email should be on the same line here. Also checkpatch.pl warns about several things like trailing whitespace and bogus spacing around the ',' in "copy_tid(&rmpp_mad->mad_hdr.tid ,tid);" Even for simple patches like this I appreciated not having to fix things like that up by hand. - R. From jackm at mellanox.co.il Wed Dec 3 08:01:13 2008 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 3 Dec 2008 18:01:13 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <49369740.7060603@Voltaire.COM> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB29@mtlexch01.mtl.com> > Things get worse if this patch > (http://lists.openfabrics.org/pipermail/general/2008-November/055680.htm l) > is applied since no event would be dispatched at all Does this mean that you do not want the patches applied? (I have put them into the upcoming OFED 1.4) - Jack > -----Original Message----- > From: Moni Shoua [mailto:monis at Voltaire.COM] > Sent: Wednesday, December 03, 2008 4:27 PM > To: Roland Dreier; Jack Morgenstein > Cc: OpenFabrics General; Olga Stern; Or Gerlitz > Subject: [BUG REPORT] mlx4: Incorrect event is generated when > SM is changing > > > I have a small fabric with ConnectX HCAs and I run some tests > with 2 Open SMs that take over of each other (either by > discovering that the MASTER is not responding or by coming up > with higher priority). > > I noticed that in 2.6.28-rc6 I get in IPoIB a LID_CHANGE > event (while expecting CLIENT_REREGISTER) even though LID was > not changed. 
I looked where the events are generated in > drivers/infiniband/hw/mlx4/mad.c:smp_snoop() > > if (pinfo->clientrereg_resv_subnetto & 0x80) > event.event = > IB_EVENT_CLIENT_REREGISTER; > else > event.event = IB_EVENT_LID_CHANGE; > > and see that clientrereg bit is off. > > I checked this simultaneously with a host running kernel > 2.6.27 and see that the bit is on and > CLIENT_REREGISTER is generated. > > Things get worse of this patch > (http://lists.openfabrics.org/pipermail/general/2008-November/ 055680.html) is applied since no event would be dispatched at all (BTW Jack, I think you forgot to send the second half for mthca) I'll appreciate a clue where does the reregister bit turned off. thanks MoniS From monis at Voltaire.COM Wed Dec 3 08:14:00 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 03 Dec 2008 18:14:00 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB29@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB29@mtlexch01.mtl.com> Message-ID: <4936B048.7090402@Voltaire.COM> Jack Morgenstein wrote: >> Things get worse if this patch >> > (http://lists.openfabrics.org/pipermail/general/2008-November/055680.htm > l) >> is applied since no event would be dispatched at all > > Does this mean that you do not want the patches applied? > (I have put them into the upcoming OFED 1.4) > Since OFED-1.4 baseline is linux-2.6.27, the patches work fine there. It will be a problem in **next** OFED (which will be probably based at least on linux-2.6.28) if the bug that is described here isn't solved. In the bottom line I'd like to keep the patches in OFED-1.4 but return to this issue in OFED-1.5 thanks From taylor at hpc.ufl.edu Wed Dec 3 08:14:41 2008 From: taylor at hpc.ufl.edu (Charles Taylor) Date: Wed, 3 Dec 2008 11:14:41 -0500 Subject: [ofa-general] CX4 Optical Cables Message-ID: Operational Question.... 
We have some storage nodes across the room from our IB switch and reach them with 10m copper cables. These 10m cables have been an ongoing issue for us, as they fail (start showing symbol errors) fairly frequently. We decided to start replacing them with fiber cables (optics embedded in the CX4 connectors). We got a couple of "intel-connect" 20m cables (note that I think Intel sold the business to EMCore).

When we installed the first cable yesterday, we were surprised to see that we could not get link status on the connection. The HCA in the host is a Cisco/TopSpin "LionCub" (board id=MT_00A0000001). We have some newer (by one year) Mellanox-branded HCAs on some other nodes (MHEA28-1TC, Lion cub, board id=MT_02F0110001). When we ran the optical cable from the same switch port to the Mellanox HCAs, we got link status and a LID was issued - no problem.

We are just wondering if anyone knows what changed between these two card/chip versions such that the optical connectors work transparently with the Mellanox cards but not at all with the TopSpin/Cisco cards.

Thanks,

Charlie Taylor
UF HPC Center

From rdreier at cisco.com Wed Dec 3 08:28:14 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 03 Dec 2008 08:28:14 -0800
Subject: [ofa-general] CX4 Optical Cables
In-Reply-To: (Charles Taylor's message of "Wed, 3 Dec 2008 11:14:41 -0500")
References:
Message-ID:

> We are just wondering if anyone knows what changed between these two
> card/chip versions such that the optical connectors work transparently
> with the Mellanox cards but not at all with the TopSpin/Cisco
> cards.

Optical cables require power from the HCA port (to run the optics, obviously). Not all HCAs have power available at the port. I don't know of a very good way to tell which do. The Topspin/Cisco HCAs are basically OEM versions of Mellanox HCAs, so maybe Mellanox can tell you which versions support power.

 - R.
From rdreier at cisco.com Wed Dec 3 08:31:54 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 03 Dec 2008 08:31:54 -0800 Subject: [ofa-general] Re: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <49369740.7060603@Voltaire.COM> (Moni Shoua's message of "Wed, 03 Dec 2008 16:27:12 +0200") References: <49369740.7060603@Voltaire.COM> Message-ID: Can you try 2.6.28-rc7? I think the fix 9a5aa622 ("mlx4_core: Save/restore default port IB capability mask") sounds related, and that went in after -rc6. - R. From sashak at voltaire.com Wed Dec 3 08:31:32 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 3 Dec 2008 18:31:32 +0200 Subject: [ofa-general] git http url for ofed_1_3/management.git In-Reply-To: References: Message-ID: <20081203163132.GB4586@sashak.voltaire.com> On 10:15 Tue 02 Dec , Hal Rosenstock wrote: > On Tue, Dec 2, 2008 at 9:43 AM, Todd Bowman wrote: > > Thanks for the reply. Our firewall blocks the git protocol so I can't use > > the git url. I need to use the http url for git. > > I'm not sure about whether http/https is supported for this on the OFA server. It looks http:// basically works when you are using http://www.openfabrics.org/... (http://git.openfabrics.org redirects to gitweb http://www.openfabrics.org/git/ ). The problem with http://www.openfabrics.org/ofed_1_3/* repositories is a lack of server-info needed for http/ftp protocols. You can ask Vlad to run 'git update-server-info' there. Alternatively you can use my repository http://www.openfabrics.org/~sashak/management.git , there is ofed_1_3 branch. 
Sasha From jackm at mellanox.co.il Wed Dec 3 08:42:58 2008 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 3 Dec 2008 18:42:58 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB83@mtlexch01.mtl.com> It is related, since one of the capability mask bits that was wiped out before the fix mentioned below was: 25: IsClientReregistrationSupported (IB Spec 1.2.1, table 146, page 831, Capability Mask bits) - Jack > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Wednesday, December 03, 2008 6:32 PM > To: Moni Shoua > Cc: Jack Morgenstein; OpenFabrics General; Olga Stern; Or Gerlitz > Subject: Re: [BUG REPORT] mlx4: Incorrect event is generated > when SM is changing > > > Can you try 2.6.28-rc7? I think the fix 9a5aa622 > ("mlx4_core: Save/restore default port IB capability mask") > sounds related, and that went in after -rc6. > > - R. 
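The two bits discussed in this thread can be put side by side in a small sketch. It assumes the capability mask has already been converted to host byte order, and the macro, enum and function names are local to the sketch rather than kernel identifiers. The first check tests bit 25 (IsClientReregistrationSupported) named above; the second mirrors the branch quoted from smp_snoop() earlier in the thread. How the SM reacts to a port that does not advertise bit 25 is not spelled out here, so the link between the two checks is only implied.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 25 of the PortInfo capability mask: IsClientReregistrationSupported.
 * The name is local to this sketch. */
#define CAP_CLIENT_REREG_SUPPORTED (UINT32_C(1) << 25)

/* Top bit of the byte tested in the quoted smp_snoop() branch. */
#define CLIENT_REREG_BIT 0x80u

enum sketch_event { SKETCH_CLIENT_REREGISTER, SKETCH_LID_CHANGE };

/* cap_mask is assumed to be in host byte order already. */
static int supports_client_rereg(uint32_t cap_mask)
{
    return (cap_mask & CAP_CLIENT_REREG_SUPPORTED) != 0;
}

/* Same decision as the quoted driver code: the event type depends only
 * on whether the top bit of the received byte is set. */
static enum sketch_event classify_event(uint8_t clientrereg_resv_subnetto)
{
    return (clientrereg_resv_subnetto & CLIENT_REREG_BIT)
        ? SKETCH_CLIENT_REREGISTER
        : SKETCH_LID_CHANGE;
}
```

With a capability mask that has lost bit 25, supports_client_rereg() returns 0; a received byte without the top bit set is classified as a LID change, which matches the symptom reported on -rc6.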
> From monis at Voltaire.COM Wed Dec 3 09:17:03 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 03 Dec 2008 19:17:03 +0200 Subject: [ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing In-Reply-To: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB83@mtlexch01.mtl.com> References: <5D49E7A8952DC44FB38C38FA0D758EAD011FDB83@mtlexch01.mtl.com> Message-ID: <4936BF0F.1080801@Voltaire.COM> Jack Morgenstein wrote: > It is related, since one of the capability mask bits that was wiped out > before the fix mentioned below was: > 25: IsClientReregistrationSupported > (IB Spec 1.2.1, table 146, page 831, Capability Mask bits) > > - Jack >> -----Original Message----- >> From: Roland Dreier [mailto:rdreier at cisco.com] >> Sent: Wednesday, December 03, 2008 6:32 PM >> To: Moni Shoua >> Cc: Jack Morgenstein; OpenFabrics General; Olga Stern; Or Gerlitz >> Subject: Re: [BUG REPORT] mlx4: Incorrect event is generated >> when SM is changing >> >> >> Can you try 2.6.28-rc7? I think the fix 9a5aa622 >> ("mlx4_core: Save/restore default port IB capability mask") >> sounds related, and that went in after -rc6. >> >> - R. Tried rc7. It works OK now. Thanks. From Shainer at Mellanox.com Wed Dec 3 09:25:06 2008 From: Shainer at Mellanox.com (Gilad Shainer) Date: Wed, 3 Dec 2008 09:25:06 -0800 Subject: [ofa-general] CX4 Optical Cables Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F017A3B48@mtiexch01.mti.com> The Topspin cards might not have the power circuitry required for the fiber cables. We might be able to determine this with the PSID info from those cards. Feel free to send me that info directly. Gilad. -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Charles Taylor Sent: Wednesday, December 03, 2008 8:15 AM To: OpenFabrics General Cc: Craig Prescott Subject: [ofa-general] CX4 Optical Cables Operational Question.... 
We have some storage nodes across the room from our IB switch and reach them with 10M copper cables. These 10M cables have been an ongoing issue for us as they fail (start showing symbol errors) fairly frequently. We decided to start replacing them with fiber (optics imbedded in CX4 connectors) cables. We got a couple of "intel- connect" 20m cables (note that I think Intel sold the business to EMCore). When we installed the first cable yesterday we were surprised to see that we could not get link status on the connection. The HCA in the host is a Cisco/TopSpin "LionCub" (board id=MT_00A0000001). We have some newer (by one year) Mellanox branded HCAs on some other nodes (MHEA28-1TC, Lion cub, board id=MT_02F0110001). When we ran the optical cable from the same switch port to the mellanox HCAs, we got link status and a LID was issued - no problem. We are just wondering if anyone knows what changed between these two card/chip versions such that the optical connectors work transparently with the the mellanox cards but not at all with the TopSpin/Cisco cards. Thanks, Charlie Taylor UF HPC Center _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Wayne.Glanfield at uk.renaultf1.com Wed Dec 3 09:37:33 2008 From: Wayne.Glanfield at uk.renaultf1.com (Glanfield, Wayne) Date: Wed, 3 Dec 2008 17:37:33 +0000 Subject: [ofa-general] Multiple job failures at same time Message-ID: We have just experienced a problem where 5 jobs failed at the same time ~15:50 GMT with similar messages in their respective output files. Does anybody have any idea what could have cause this and what the messages mean. One of the nodes "cfd-cnsl-0364" was found to have shutdown but could this take out other jobs? 
They were not running on this node, This is a commercial CFD code which is using hp-mpi 2.2.5, we are running ofed 1.3.1 and using verbs api with Mellanox ConnectX HCA Thanks Wayne JOB #1 starccm+: Rank 0:52: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:50: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:55: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:55: MPI_Test: self cfd-cnsl-0355 peer cfd-cnsl-0365 (rank: 126) starccm+: Rank 0:55: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:52: MPI_Test: self cfd-cnsl-0355 peer cfd-cnsl-0365 (rank: 120) starccm+: Rank 0:52: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:50: MPI_Test: self cfd-cnsl-0355 peer cfd-cnsl-0365 (rank: 127) starccm+: Rank 0:50: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:38: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:35: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:33: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:35: MPI_Test: self cfd-cnsl-0352 peer cfd-cnsl-0364 (rank: 119) starccm+: Rank 0:35: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:38: MPI_Test: self cfd-cnsl-0352 peer cfd-cnsl-0365 (rank: 121) starccm+: Rank 0:38: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:33: MPI_Test: self cfd-cnsl-0352 peer cfd-cnsl-0365 (rank: 121) starccm+: Rank 0:33: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:46: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:46: MPI_Test: self cfd-cnsl-0353 peer cfd-cnsl-0365 (rank: 123) starccm+: Rank 0:46: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:87: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:80: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:87: MPI_Test: self cfd-cnsl-0360 peer cfd-cnsl-0365 (rank: 121) starccm+: Rank 0:87: MPI_Test: error message: transport retry exceeded error 
starccm+: Rank 0:80: MPI_Test: self cfd-cnsl-0360 peer cfd-cnsl-0365 (rank: 124) starccm+: Rank 0:80: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:126: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:122: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:122: MPI_Test: self cfd-cnsl-0365 peer cfd-cnsl-0364 (rank: 116) starccm+: Rank 0:122: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:126: MPI_Test: self cfd-cnsl-0365 peer cfd-cnsl-0364 (rank: 117) starccm+: Rank 0:126: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:124: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:124: MPI_Test: self cfd-cnsl-0365 peer cfd-cnsl-0350 (rank: 21) starccm+: Rank 0:124: MPI_Test: error message: transport retry exceeded error JOB #2 starccm+: Rank 0:138: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:138: MPI_Test: self cfd-cnsl-0143 peer cfd-cnsl-0343 (rank: 103) starccm+: Rank 0:138: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:103: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:103: MPI_Test: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 138) starccm+: Rank 0:103: MPI_Test: error message: transport retry exceeded error Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate'], 'Neo.Error': 'Error', 'Processor': 138, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'}Synchronizing parallel nodes (attempt 0) starccm+: Rank 0:98: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:99: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:98: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 140) starccm+: Rank 0:98: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:98: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:100: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:99: 
MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0159 (rank: 179) starccm+: Rank 0:99: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:99: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:101: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:98: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 140) starccm+: Rank 0:98: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:99: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0159 (rank: 179) starccm+: Rank 0:99: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:100: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 139) starccm+: Rank 0:100: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:100: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:101: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0159 (rank: 179) starccm+: Rank 0:101: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:101: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:100: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 139) starccm+: Rank 0:100: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:100: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:101: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0159 (rank: 179) starccm+: Rank 0:101: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:100: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 139) starccm+: Rank 0:100: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:100: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:100: MPI_Cancel: self cfd-cnsl-0343 peer cfd-cnsl-0143 (rank: 136) starccm+: Rank 0:100: MPI_Cancel: error message: transport retry exceeded error JOB #3 starccm+: Rank 0:219: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:222: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:222: MPI_Test: self cfd-cnsl-0339 peer 
cfd-cnsl-0337 (rank: 212) starccm+: Rank 0:222: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:219: MPI_Test: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 215) starccm+: Rank 0:219: MPI_Test: error message: transport retry exceeded error Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate', 'AMGLinearSolver::solve'], 'Neo.Error': 'Error', 'Processor': 222, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'} Synchronizing parallel nodes (attempt 0) starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 215) starccm+: Rank 0:219: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:222: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 214) starccm+: Rank 0:219: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:222: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 212) starccm+: Rank 0:222: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:222: MPI_Cancel: MPI BUG: no requests done starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 214) starccm+: Rank 0:219: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 213) starccm+: Rank 0:219: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 213) starccm+: Rank 0:219: MPI_Cancel: error message: work request flushed error starccm+: 
Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 212) starccm+: Rank 0:219: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:219: MPI_Cancel: self cfd-cnsl-0339 peer cfd-cnsl-0337 (rank: 212) starccm+: Rank 0:219: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:219: MPI_Cancel: ibv_poll_cq(): bad status 12 JOB #4 starccm+: Rank 0:25: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:24: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:28: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:30: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:27: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:31: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:29: MPI_Waitall: ibv_poll_cq(): bad status 12 starccm+: Rank 0:30: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 47) starccm+: Rank 0:30: MPI_Waitall: error message: transport retry exceeded error starccm+: Rank 0:30: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:25: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 40) starccm+: Rank 0:25: MPI_Waitall: error message: transport retry exceeded error starccm+: Rank 0:25: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:28: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 44) starccm+: Rank 0:28: MPI_Waitall: error message: transport retry exceeded error starccm+: Rank 0:28: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:27: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 46) starccm+: Rank 0:27: MPI_Waitall: error message: transport retry exceeded error starccm+: Rank 0:27: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:31: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 46) starccm+: Rank 0:31: MPI_Waitall: error message: transport 
retry exceeded error starccm+: Rank 0:31: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:29: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 44) starccm+: Rank 0:29: MPI_Waitall: error message: transport retry exceeded error starccm+: Rank 0:29: MPI_Allreduce: ibv_poll_cq(): bad status 5 starccm+: Rank 0:24: MPI_Waitall: self cfd-cnsl-0376 peer cfd-cnsl-0369 (rank: 47 JOB #5 starccm+: Rank 0:6: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:4: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:3: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:61: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:60: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:119: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:6: MPI_Test: self cfd-cnsl-0541 peer cfd-cnsl-0341 (rank: 18) starccm+: Rank 0:6: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:60: MPI_Test: self cfd-cnsl-0506 peer cfd-cnsl-0341 (rank: 22) starccm+: Rank 0:60: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:4: MPI_Test: self cfd-cnsl-0541 peer cfd-cnsl-0341 (rank: 16) starccm+: Rank 0:4: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:3: MPI_Test: self cfd-cnsl-0541 peer cfd-cnsl-0341 (rank: 22) starccm+: Rank 0:3: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:61: MPI_Test: self cfd-cnsl-0506 peer cfd-cnsl-0341 (rank: 23) starccm+: Rank 0:61: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:119: MPI_Test: self cfd-cnsl-0514 peer cfd-cnsl-0341 (rank: 22) starccm+: Rank 0:119: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:38: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:98: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:47: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:38: MPI_Test: self cfd-cnsl-0502 peer cfd-cnsl-0341 (rank: 23) starccm+: Rank 0:38: MPI_Test: error 
message: transport retry exceeded error starccm+: Rank 0:53: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:49: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:98: MPI_Test: self cfd-cnsl-0511 peer cfd-cnsl-0341 (rank: 23) starccm+: Rank 0:98: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:75: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:47: MPI_Test: self cfd-cnsl-0503 peer cfd-cnsl-0341 (rank: 18) starccm+: Rank 0:47: MPI_Test: error message: transport retry exceeded error Regards Wayne
From rdreier at cisco.com Wed Dec 3 09:44:59 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 03 Dec 2008 09:44:59 -0800 Subject: [ofa-general] ibv_post_send fails when using malloc in a special way In-Reply-To: <49368267.40007@platform.com> (Asmund Ostvold's message of "Wed, 03 Dec 2008 13:58:15 +0100") References: <49368267.40007@platform.com> Message-ID: The most obvious explanation is that the physical pages underlying your allocation are different after the free/re-valloc. This could happen without a system call I guess if a page is faulted in. - R. From rams at englobe-tec.com Wed Dec 3 09:58:06 2008 From: rams at englobe-tec.com (David Rioja) Date: Wed, 03 Dec 2008 18:58:06 +0100 Subject: [ofa-general] error building ofa_kernel RPMs Message-ID: <4936C8AE.3000304@englobe-tec.com> Dear all, I've tried to build OFED 1.2.5 on my server running Fedora Core 9. First I had a problem while building the ofa_user RPMs, but that was solved by installing the zlib-static package. I think I have all the remaining dependencies installed, in both x86_64 and i386 versions. The error now comes in the next step, building the ofa_kernel RPMs. Here is the output: *** Building InfiniBand Software RPMs. Please wait... Building ofa_user RPMs. Please wait...
Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-libcxgb3 --with-libibcm --with-libibverbs --with-libmlx4 --with-libmthca --with-librdmacm --with-mstflint --with-perftest --sysconfdir=/etc --mandir=/usr/share/man' --define 'configure_options32 --with-libcxgb3 --with-libibcm --with-libibverbs --with-libmlx4 --with-libmthca --with-librdmacm --sysconfdir=/etc --mandir=/usr/share/man' --define 'build_32bit 1' --define '_mandir /usr/share/man' /root/InfiniBand/OFED-1.2.5.5/SRPMS/ofa_user-1.2.5.5-0.src.rpm Building ofa_kernel RPMs. Please wait... Running rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-cxgb3-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod ' --define 'KVERSION 2.6.25-14.fc9.x86_64' --define 'KSRC /lib/modules/2.6.25-14.fc9.x86_64/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' /root/InfiniBand/OFED-1.2.5.5/SRPMS/ofa_kernel-1.2.5.5-0.src.rpm - ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-cxgb3-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod ' --define 'KVERSION 2.6.25-14.fc9.x86_64' --define 'KSRC /lib/modules/2.6.25-14.fc9.x86_64/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' /root/InfiniBand/OFED-1.2.5.5/SRPMS/ofa_kernel-1.2.5.5-0.src.rpm" See log file: 
/tmp/OFED.build.32431.log *** The tail of /tmp/OFED.build.32431.log is as follows: *** echo " ERROR: Kernel configuration is invalid."; \ echo " include/linux/autoconf.h or include/config/auto.conf are missing."; \ echo " Run 'make oldconfig && make prepare' on kernel src to fix it."; \ echo; \ /bin/false) mkdir -p /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/.tmp_versions ; rm -f /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/.tmp_versions/* make -f scripts/Makefile.build obj=/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5 make -f scripts/Makefile.build obj=/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband make -f scripts/Makefile.build obj=/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core gcc -Wp,-MD,/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/.addr.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.3.0/include -D__KERNEL__ \ \ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/include \ -I/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/include \ -Iinclude \ \ -include include/linux/autoconf.h \ -include /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/include/linux/autoconf.h \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Os -fno-stack-protector -m64 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(addr)" -D"KBUILD_MODNAME=KBUILD_STR(ib_addr)" -c -o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.o /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c: In function ‘rdma_translate_ip’: 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:113: warning: passing argument 1 of ‘ip_dev_find’ makes pointer from integer without a cast /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:113: error: too few arguments to function ‘ip_dev_find’ /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c: In function ‘addr_send_arp’: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:161: warning: passing argument 1 of ‘ip_route_output_key’ from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:161: warning: passing argument 2 of ‘ip_route_output_key’ from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:161: error: too few arguments to function ‘ip_route_output_key’ /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c: In function ‘addr_resolve_remote’: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:182: warning: passing argument 1 of ‘ip_route_output_key’ from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:182: warning: passing argument 2 of ‘ip_route_output_key’ from incompatible pointer type /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:182: error: too few arguments to function ‘ip_route_output_key’ /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c: In function ‘addr_resolve_local’: /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:264: warning: passing argument 1 of ‘ip_dev_find’ makes pointer from integer without a cast /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:264: error: too few arguments to function ‘ip_dev_find’ /var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:268: error: implicit declaration of function ‘ZERONET’ 
/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.c:272: error: implicit declaration of function ‘LOOPBACK’ make[4]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband/core] Error 2 make[2]: *** [/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFEDRPM/BUILD/ofa_kernel-1.2.5.5] Error 2 make[1]: Leaving directory `/usr/src/kernels/2.6.25-14.fc9.x86_64' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.62604 (%install) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.62604 (%install) ERROR: Failed executing "rpmbuild --rebuild --define '_topdir /var/tmp/OFEDRPM' --define '_prefix /usr' --define 'build_root /var/tmp/OFED' --define 'configure_options --with-cxgb3-mod --with-ipoib-mod --with-mthca-mod --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mlx4-mod ' --define 'KVERSION 2.6.25-14.fc9.x86_64' --define 'KSRC /lib/modules/2.6.25-14.fc9.x86_64/build' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'NETWORK_CONF_DIR /etc/sysconfig/network-scripts' --define 'modprobe_update 1' --define 'include_ipoib_conf 1' /root/InfiniBand/OFED-1.2.5.5/SRPMS/ofa_kernel-1.2.5.5-0.src.rpm" *** My kernel version is 2.6.25-14.fc9.x86_64. Is this supported? If so, does anyone have an idea of what could be going wrong?
Thank you in advance, David From cameron at harr.org Wed Dec 3 10:01:32 2008 From: cameron at harr.org (Cameron Harr) Date: Wed, 03 Dec 2008 11:01:32 -0700 Subject: [ofa-general] CentOS non-OFED opensm package needs /dev/infiniband Message-ID: <4936C97C.5010405@harr.org> This is more of an FYI that can maybe help others, but perhaps it is a bug that needs to be filed somewhere. I've been trying to get the SM running without using OFED, rather "yum install opensm" and using the distro's kernel-provided IB drivers. I've banged my head around for a while trying to figure out why I was getting the following messages: --- Error from osm_opensm_bind (0x2A) Perhaps another instance of OpenSM is already running --- It turns out the SM is looking for devices in /dev/infiniband (umad0, uverbs0), but the kernel-provided ib_umad and ib_uverbs modules place the devices in /dev. By creating a link "infiniband" in the /dev/ directory to the /dev/ directory, things magically started to work. "cd /dev; ln -s . infiniband" From jackm at dev.mellanox.co.il Wed Dec 3 10:20:04 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 3 Dec 2008 20:20:04 +0200 Subject: [ofa-general] [PATCH V2] mlx4: check for FW version which properly supports resize_cq Message-ID: <200812032020.05883.jackm@dev.mellanox.co.il> mlx4: check for FW version which properly supports resize_cq. If the ConnectX cards have earlier FW installed, the resize_cq command will return -ENOSYS. Fixes Bugzilla 1415. Signed-off-by: Jack Morgenstein --- Roland, I decided to solve the resize_cq problem "locally", and not deal with the can of worms that my last patch opens. 
We already do something similar for qp's (NoErrorCompletion bit) I did this properly this time (#define in cq.h) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 1830849..a637f89 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -347,6 +347,9 @@ int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) int outst_cqe; int err; + if (dev->dev->caps.fw_ver < MLX4_FW_VER_RESIZE_CQ) + return -ENOSYS; + mutex_lock(&cq->resize_mutex); if (entries < 1 || entries > dev->dev->caps.max_cqes) { diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h index 6f65b2c..af6740d 100644 --- a/include/linux/mlx4/cq.h +++ b/include/linux/mlx4/cq.h @@ -64,6 +64,9 @@ struct mlx4_err_cqe { u8 owner_sr_opcode; }; +/* Which firmware version adds support for Resize CQ */ +#define MLX4_FW_VER_RESIZE_CQ mlx4_fw_ver(2, 5, 0) + enum { MLX4_CQE_VLAN_PRESENT_MASK = 1 << 29, MLX4_CQE_QPN_MASK = 0xffffff, From ralph.campbell at qlogic.com Wed Dec 3 10:36:46 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:36:46 -0800 Subject: [ofa-general] [PATCH 0/6] IB/ipath -- fixes for 2.6.29 Message-ID: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> The following patches fix some problems recently found: IB/ipath - Fix PSN of subsequent send WQEs after an RDMA read resend IB/ipath - Check the return value of dma_map_single for errors IB/ipath - Add locking for interrupt use of ipath_pd contexts vs free. IB/ipath - don't count IB symbol and link errors unless link is UP IB/ipath - Only do 1X workaround on rev1 chips. IB/ipath - fix spi_pioindex value. 
From ralph.campbell at qlogic.com Wed Dec 3 10:36:51 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:36:51 -0800 Subject: [ofa-general] [PATCH 1/6] IB/ipath - Fix PSN of subsequent send WQEs after an RDMA read resend In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183651.575.36321.stgit@eng-46.mv.qlogic.com> The PSN of the first packet after an RDMA read is based on the size of the RDMA read request. This is calculated correctly for the WQE sent after the first request message but not on subsequent requests if the RDMA read is resent. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_rc.c | 5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index 7b93cda..9170710 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -573,9 +573,8 @@ int ipath_make_rc_req(struct ipath_qp *qp) ohdr->u.rc.reth.length = cpu_to_be32(qp->s_len); qp->s_state = OP(RDMA_READ_REQUEST); hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32); - bth2 = qp->s_psn++ & IPATH_PSN_MASK; - if (ipath_cmp24(qp->s_psn, qp->s_next_psn) > 0) - qp->s_next_psn = qp->s_psn; + bth2 = qp->s_psn & IPATH_PSN_MASK; + qp->s_psn = wqe->lpsn + 1; ss = NULL; len = 0; qp->s_cur++; From ralph.campbell at qlogic.com Wed Dec 3 10:36:56 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:36:56 -0800 Subject: [ofa-general] [PATCH 2/6] IB/ipath - Check the return value of dma_map_single for errors In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183656.575.78548.stgit@eng-46.mv.qlogic.com> This fixes an obvious oversight where the return value is not checked for error. 
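For readers skimming the archive, the error handling this patch adds (check each mapping, and on failure walk back over the descriptors already queued in this call before returning -EIO) can be sketched in user-space C roughly as follows. This is an illustration under stated assumptions, not the driver code: struct ring, map_slot(), unmap_slot() and queue_buffers() are hypothetical stand-ins for the descriptor queue, dma_map_single()/dma_mapping_error() and unmap_desc().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical descriptor ring; not the ipath data structure. */
struct ring {
    int mapped[RING_SIZE]; /* 1 while a slot holds a live mapping */
    size_t tail;           /* committed tail: next slot to fill */
    int fail_at;           /* inject a failure on this map call, -1 for never */
    int calls;             /* number of map calls made so far */
};

/* Stand-in for dma_map_single() plus dma_mapping_error(). */
static int map_slot(struct ring *r, size_t slot)
{
    if (r->calls++ == r->fail_at)
        return -1;
    r->mapped[slot] = 1;
    return 0;
}

/* Stand-in for unmap_desc(). */
static void unmap_slot(struct ring *r, size_t slot)
{
    r->mapped[slot] = 0;
}

/* Map n buffers; on any failure undo this call's mappings and return -EIO. */
static int queue_buffers(struct ring *r, size_t n)
{
    size_t tail = r->tail;

    assert(n <= RING_SIZE);
    for (size_t i = 0; i < n; i++) {
        if (map_slot(r, tail) != 0)
            goto unmap;
        tail = (tail + 1) % RING_SIZE;
    }
    r->tail = tail; /* commit only after every mapping succeeded */
    return 0;

unmap:
    /* Walk back from the failing slot to the committed tail, wrapping at 0. */
    while (tail != r->tail) {
        tail = tail ? tail - 1 : RING_SIZE - 1;
        unmap_slot(r, tail);
    }
    return -EIO;
}
```

The point of the pattern is that the committed tail is only advanced once all mappings have succeeded, so the unwind loop has a fixed stopping point even when the ring wraps.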
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_sdma.c | 21 ++++++++++++++++----- 1 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sdma.c b/drivers/infiniband/hw/ipath/ipath_sdma.c index 284c9bc..8e255ad 100644 --- a/drivers/infiniband/hw/ipath/ipath_sdma.c +++ b/drivers/infiniband/hw/ipath/ipath_sdma.c @@ -698,10 +698,8 @@ retry: addr = dma_map_single(&dd->pcidev->dev, tx->txreq.map_addr, tx->map_len, DMA_TO_DEVICE); - if (dma_mapping_error(&dd->pcidev->dev, addr)) { - ret = -EIO; - goto unlock; - } + if (dma_mapping_error(&dd->pcidev->dev, addr)) + goto ioerr; dwoffset = tx->map_len >> 2; make_sdma_desc(dd, sdmadesc, (u64) addr, dwoffset, 0); @@ -741,6 +739,8 @@ retry: dw = (len + 3) >> 2; addr = dma_map_single(&dd->pcidev->dev, sge->vaddr, dw << 2, DMA_TO_DEVICE); + if (dma_mapping_error(&dd->pcidev->dev, addr)) + goto unmap; make_sdma_desc(dd, sdmadesc, (u64) addr, dw, dwoffset); /* SDmaUseLargeBuf has to be set in every descriptor */ if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_USELARGEBUF) @@ -798,7 +798,18 @@ retry: list_add_tail(&tx->txreq.list, &dd->ipath_sdma_activelist); if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_VL15) vl15_watchdog_enq(dd); - + goto unlock; + +unmap: + while (tail != dd->ipath_sdma_descq_tail) { + if (!tail) + tail = dd->ipath_sdma_descq_cnt - 1; + else + tail--; + unmap_desc(dd, tail); + } +ioerr: + ret = -EIO; unlock: spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); fail: From ralph.campbell at qlogic.com Wed Dec 3 10:37:01 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:37:01 -0800 Subject: [ofa-general] [PATCH 3/6] IB/ipath - don't count IB symbol and link errors unless link is UP In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183701.575.85116.stgit@eng-46.mv.qlogic.com> From: Dave Olson Implements the ignoring of ibsymbol 
errors and linkrecover errors while the link is at less than INIT (long needed), to get accurate counts. Particularly an issue when doing non-IBTA DDR negotiation with chips from vendors that do not support IBTA mode negotiation. If the driver is unloaded, and there is a delta, the adjusted counters are written back to the chip, so they stay adjusted across driver reload. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_iba6120.c | 61 ++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_iba7220.c | 76 ++++++++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_kernel.h | 13 +++++ drivers/infiniband/hw/ipath/ipath_stats.c | 8 +++ 4 files changed, 155 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 421cc2a..fbf8c53 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -721,6 +721,12 @@ static int ipath_pe_bringup_serdes(struct ipath_devdata *dd) INFINIPATH_HWE_SERDESPLLFAILED); } + dd->ibdeltainprog = 1; + dd->ibsymsnap = + ipath_read_creg32(dd, dd->ipath_cregs->cr_ibsymbolerrcnt); + dd->iblnkerrsnap = + ipath_read_creg32(dd, dd->ipath_cregs->cr_iblinkerrrecovcnt); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_serdesconfig0); config1 = ipath_read_kreg64(dd, dd->ipath_kregs->kr_serdesconfig1); @@ -810,6 +816,36 @@ static void ipath_pe_quiet_serdes(struct ipath_devdata *dd) { u64 val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_serdesconfig0); + if (dd->ibsymdelta || dd->iblnkerrdelta || + dd->ibdeltainprog) { + u64 diagc; + /* enable counter writes */ + diagc = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwdiagctrl); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwdiagctrl, + diagc | INFINIPATH_DC_COUNTERWREN); + + if (dd->ibsymdelta || dd->ibdeltainprog) { + val = ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt); + if (dd->ibdeltainprog) + val -= val - dd->ibsymsnap; + val -= dd->ibsymdelta; + 
ipath_write_creg(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt, val); + } + if (dd->iblnkerrdelta || dd->ibdeltainprog) { + val = ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt); + if (dd->ibdeltainprog) + val -= val - dd->iblnkerrsnap; + val -= dd->iblnkerrdelta; + ipath_write_creg(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt, val); + } + + /* and disable counter writes */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwdiagctrl, diagc); + } val |= INFINIPATH_SERDC0_TXIDLE; ipath_dbg("Setting TxIdleEn on serdes (config0 = %llx)\n", (unsigned long long) val); @@ -1749,6 +1785,31 @@ static void ipath_pe_config_jint(struct ipath_devdata *dd, u16 a, u16 b) static int ipath_pe_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) { + if (ibup) { + if (dd->ibdeltainprog) { + dd->ibdeltainprog = 0; + dd->ibsymdelta += + ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt) - + dd->ibsymsnap; + dd->iblnkerrdelta += + ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt) - + dd->iblnkerrsnap; + } + } else { + dd->ipath_lli_counter = 0; + if (!dd->ibdeltainprog) { + dd->ibdeltainprog = 1; + dd->ibsymsnap = + ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt); + dd->iblnkerrsnap = + ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt); + } + } + ipath_setup_pe_setextled(dd, ipath_ib_linkstate(dd, ibcs), ipath_ib_linktrstate(dd, ibcs)); return 0; diff --git a/drivers/infiniband/hw/ipath/ipath_iba7220.c b/drivers/infiniband/hw/ipath/ipath_iba7220.c index 9839e20..3b38bc9 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba7220.c +++ b/drivers/infiniband/hw/ipath/ipath_iba7220.c @@ -951,6 +951,12 @@ static int ipath_7220_bringup_serdes(struct ipath_devdata *dd) INFINIPATH_HWE_SERDESPLLFAILED); } + dd->ibdeltainprog = 1; + dd->ibsymsnap = + ipath_read_creg32(dd, dd->ipath_cregs->cr_ibsymbolerrcnt); + dd->iblnkerrsnap = + ipath_read_creg32(dd, dd->ipath_cregs->cr_iblinkerrrecovcnt); + if (!dd->ipath_ibcddrctrl) { /* not on re-init after 
reset */ dd->ipath_ibcddrctrl = @@ -1084,6 +1090,37 @@ static void ipath_7220_config_jint(struct ipath_devdata *dd, static void ipath_7220_quiet_serdes(struct ipath_devdata *dd) { u64 val; + if (dd->ibsymdelta || dd->iblnkerrdelta || + dd->ibdeltainprog) { + u64 diagc; + /* enable counter writes */ + diagc = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwdiagctrl); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwdiagctrl, + diagc | INFINIPATH_DC_COUNTERWREN); + + if (dd->ibsymdelta || dd->ibdeltainprog) { + val = ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt); + if (dd->ibdeltainprog) + val -= val - dd->ibsymsnap; + val -= dd->ibsymdelta; + ipath_write_creg(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt, val); + } + if (dd->iblnkerrdelta || dd->ibdeltainprog) { + val = ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt); + if (dd->ibdeltainprog) + val -= val - dd->iblnkerrsnap; + val -= dd->iblnkerrdelta; + ipath_write_creg(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt, val); + } + + /* and disable counter writes */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwdiagctrl, diagc); + } + dd->ipath_flags &= ~IPATH_IB_AUTONEG_INPROG; wake_up(&dd->ipath_autoneg_wait); cancel_delayed_work(&dd->ipath_autoneg_work); @@ -2325,7 +2362,7 @@ static void try_auto_neg(struct ipath_devdata *dd) static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) { - int ret = 0; + int ret = 0, symadj = 0; u32 ltstate = ipath_ib_linkstate(dd, ibcs); dd->ipath_link_width_active = @@ -2368,6 +2405,13 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) ipath_dbg("DDR negotiation try, %u/%u\n", dd->ipath_autoneg_tries, IPATH_AUTONEG_TRIES); + if (!dd->ibdeltainprog) { + dd->ibdeltainprog = 1; + dd->ibsymsnap = ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt); + dd->iblnkerrsnap = ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt); + } try_auto_neg(dd); ret = 1; /* no other IB status change processing */ } else if 
((dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) @@ -2388,6 +2432,7 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) set_speed_fast(dd, dd->ipath_link_speed_enabled); wake_up(&dd->ipath_autoneg_wait); + symadj = 1; } else if (dd->ipath_flags & IPATH_IB_AUTONEG_FAILED) { /* * clear autoneg failure flag, and do setup @@ -2403,6 +2448,7 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) IBA7220_IBC_IBTA_1_2_MASK; ipath_write_kreg(dd, IPATH_KREG_OFFSET(IBNCModeCtrl), 0); + symadj = 1; } } /* @@ -2416,9 +2462,13 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X) && dd->ipath_link_width_active == IB_WIDTH_1X && dd->ipath_x1_fix_tries < 3) { - if (++dd->ipath_x1_fix_tries == 3) + if (++dd->ipath_x1_fix_tries == 3) { dev_info(&dd->pcidev->dev, "IB link is in 1X mode\n"); + if (!(dd->ipath_flags & + IPATH_IB_AUTONEG_INPROG)) + symadj = 1; + } else { ipath_cdbg(VERBOSE, "IB 1X in " "auto-width, try %u to be " @@ -2429,7 +2479,8 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) dd->ipath_f_xgxs_reset(dd); ret = 1; /* skip other processing */ } - } + } else if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) + symadj = 1; if (!ret) { dd->delay_mult = rate_to_delay @@ -2440,6 +2491,25 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) } } + if (symadj) { + if (dd->ibdeltainprog) { + dd->ibdeltainprog = 0; + dd->ibsymdelta += ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt) - + dd->ibsymsnap; + dd->iblnkerrdelta += ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt) - + dd->iblnkerrsnap; + } + } else if (!ibup && !dd->ibdeltainprog + && !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) { + dd->ibdeltainprog = 1; + dd->ibsymsnap = ipath_read_creg32(dd, + dd->ipath_cregs->cr_ibsymbolerrcnt); + dd->iblnkerrsnap = ipath_read_creg32(dd, + dd->ipath_cregs->cr_iblinkerrrecovcnt); 
+ } + if (!ret) ipath_setup_7220_setextled(dd, ipath_ib_linkstate(dd, ibcs), ltstate); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 0bd8bcb..aa84153 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -355,6 +355,19 @@ struct ipath_devdata { /* errors masked because they occur too fast */ ipath_err_t ipath_maskederrs; u64 ipath_lastlinkrecov; /* link recoveries at last ACTIVE */ + /* these 5 fields are used to establish deltas for IB Symbol + * errors and linkrecovery errors. They can be reported on + * some chips during link negotiation prior to INIT, and with + * DDR when faking DDR negotiations with non-IBTA switches. + * The chip counters are adjusted at driver unload if there is + * a non-zero delta. + */ + u64 ibdeltainprog; + u64 ibsymdelta; + u64 ibsymsnap; + u64 iblnkerrdelta; + u64 iblnkerrsnap; + /* time in jiffies at which to re-enable maskederrs */ unsigned long ipath_unmasktime; /* count of egrfull errors, combined for all ports */ diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index c8e3d65..f63e143 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -112,6 +112,14 @@ u64 ipath_snap_cntr(struct ipath_devdata *dd, ipath_creg creg) dd->ipath_lastrpkts = val; } val64 = dd->ipath_rpkts; + } else if (creg == dd->ipath_cregs->cr_ibsymbolerrcnt) { + if (dd->ibdeltainprog) + val64 -= val64 - dd->ibsymsnap; + val64 -= dd->ibsymdelta; + } else if (creg == dd->ipath_cregs->cr_iblinkerrrecovcnt) { + if (dd->ibdeltainprog) + val64 -= val64 - dd->iblnkerrsnap; + val64 -= dd->iblnkerrdelta; } else val64 = (u64) val; From ralph.campbell at qlogic.com Wed Dec 3 10:37:07 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:37:07 -0800 Subject: [ofa-general] [PATCH 4/6] IB/ipath - Only do 1X workaround on rev1 chips. 
In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183706.575.38664.stgit@eng-46.mv.qlogic.com> From: Dave Olson Only do 1X workaround on rev1 chips. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_iba7220.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba7220.c b/drivers/infiniband/hw/ipath/ipath_iba7220.c index 3b38bc9..b2a9d4c 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba7220.c +++ b/drivers/infiniband/hw/ipath/ipath_iba7220.c @@ -2452,13 +2452,14 @@ static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) } } /* - * if we are in 1X, and are in autoneg width, it - * could be due to an xgxs problem, so if we haven't + * if we are in 1X on rev1 only, and are in autoneg width, + * it could be due to an xgxs problem, so if we haven't * already tried, try twice to get to 4X; if we * tried, and couldn't, report it, since it will * probably not be what is desired. */ - if ((dd->ipath_link_width_enabled & (IB_WIDTH_1X | + if (dd->ipath_minrev == 1 && + (dd->ipath_link_width_enabled & (IB_WIDTH_1X | IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X) && dd->ipath_link_width_active == IB_WIDTH_1X && dd->ipath_x1_fix_tries < 3) { From ralph.campbell at qlogic.com Wed Dec 3 10:37:12 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:37:12 -0800 Subject: [ofa-general] [PATCH 5/6] IB/ipath - fix spi_pioindex value. In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183712.575.22002.stgit@eng-46.mv.qlogic.com> From: Dave Olson ipath_piobufbase was a single value offset, but is multiple values on newer chips, so use only the 32 bits for the 2K buffers (4K buffers are currently used only by the driver). 
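As a rough illustration of the arithmetic described above, the index can be computed by masking the chip value down to its low 32 bits before subtracting. The sketch below is self-contained user-space C; pio_index() and its parameter names are hypothetical, and what the upper 32 bits contain on newer chips is an assumption here (the example simply puts a made-up second offset there).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the calculation in the patch:
 * index = (user buffer offset - 2K buffer base) / alignment,
 * where the 2K base is the low 32 bits of the chip-reported value.
 */
static uint32_t pio_index(uint64_t user_bufbase, uint64_t chip_piobufbase,
                          uint32_t palign)
{
    /* Keep only the low 32 bits; the upper half is assumed to hold a
     * separate offset that user buffers never use. */
    uint64_t base_2k = chip_piobufbase & 0xffffffffULL;

    assert(palign != 0);
    assert(user_bufbase >= base_2k);
    return (uint32_t)((user_bufbase - base_2k) / palign);
}
```

With made-up numbers, a chip value of 0x0008000000001000, a user buffer at 0x3000 and an alignment of 0x800 give index 4. Subtracting the unmasked 64-bit value instead would underflow and yield a meaningless index.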
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 1af1f3a..ceab52c 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -223,8 +223,13 @@ static int ipath_get_base_info(struct file *fp, (unsigned long long) kinfo->spi_subport_rcvhdr_base); } - kinfo->spi_pioindex = (kinfo->spi_piobufbase - dd->ipath_piobufbase) / - dd->ipath_palign; + /* + * All user buffers are 2KB buffers. If we ever support + * giving 4KB buffers to user processes, this will need some + * work. + */ + kinfo->spi_pioindex = (kinfo->spi_piobufbase - + (dd->ipath_piobufbase & 0xffffffff)) / dd->ipath_palign; kinfo->spi_pioalign = dd->ipath_palign; kinfo->spi_qpair = IPATH_KD_QP; From ralph.campbell at qlogic.com Wed Dec 3 10:37:18 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 03 Dec 2008 10:37:18 -0800 Subject: [ofa-general] [PATCH 6/6] IB/ipath - Add locking for interrupt use of ipath_pd contexts vs free. In-Reply-To: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> Message-ID: <20081203183717.575.2020.stgit@eng-46.mv.qlogic.com> From: Dave Olson Fixes timing race resulting in panic. Not a performance sensitive path. 
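The general shape of the fix, as far as it can be read from the cleanup_device() hunk of this patch, is to detach the pointer table while holding the lock and free it only after the lock is released, with every other user checking for NULL under the same lock. A minimal user-space sketch of that idea follows. It is not the driver code: the C11 atomic_flag spinlock stands in for spin_lock_irqsave(), and struct device, struct port and the dev_*() functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct port { int id; };

struct device {
    atomic_flag lock;    /* stand-in for the driver's spinlock */
    struct port **ports; /* NULL once teardown has started */
    size_t nports;
};

static void dev_lock(struct device *d)
{
    while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void dev_unlock(struct device *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Reader: returns the port id, or -1 if the table is gone or the slot empty. */
static int dev_port_id(struct device *d, size_t i)
{
    int id = -1;

    dev_lock(d);
    if (d->ports && i < d->nports && d->ports[i])
        id = d->ports[i]->id;
    dev_unlock(d);
    return id;
}

/* Teardown: detach the table under the lock, free it outside the lock. */
static void dev_cleanup(struct device *d)
{
    struct port **tmp;

    dev_lock(d);
    tmp = d->ports;
    d->ports = NULL; /* readers now see "no table" instead of freed memory */
    dev_unlock(d);

    if (!tmp)
        return;
    for (size_t i = 0; i < d->nports; i++)
        free(tmp[i]);
    free(tmp);
}

/* Allocate n ports; returns 0 on success, -1 on allocation failure. */
static int dev_init(struct device *d, size_t n)
{
    assert(n > 0);
    atomic_flag_clear(&d->lock);
    d->nports = n;
    d->ports = calloc(n, sizeof(*d->ports));
    if (!d->ports)
        return -1;
    for (size_t i = 0; i < n; i++) {
        d->ports[i] = malloc(sizeof(struct port));
        if (!d->ports[i]) {
            dev_cleanup(d);
            return -1;
        }
        d->ports[i]->id = (int)i;
    }
    return 0;
}
```

Freeing outside the lock keeps the critical section short, which matters more in interrupt context than in this sketch. A second dev_cleanup() call is harmless because the table pointer is already NULL.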
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 49 +++++++++++++++---------- drivers/infiniband/hw/ipath/ipath_file_ops.c | 21 +++++------ drivers/infiniband/hw/ipath/ipath_init_chip.c | 1 + drivers/infiniband/hw/ipath/ipath_kernel.h | 2 + drivers/infiniband/hw/ipath/ipath_keys.c | 2 + drivers/infiniband/hw/ipath/ipath_mad.c | 2 + drivers/infiniband/hw/ipath/ipath_verbs.c | 3 +- 7 files changed, 49 insertions(+), 31 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index ad0aab6..69c0ce3 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -661,6 +661,8 @@ bail: static void __devexit cleanup_device(struct ipath_devdata *dd) { int port; + struct ipath_portdata **tmp; + unsigned long flags; if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) { /* can't do anything more with chip; needs re-init */ @@ -742,20 +744,21 @@ static void __devexit cleanup_device(struct ipath_devdata *dd) /* * free any resources still in use (usually just kernel ports) - * at unload; we do for portcnt, not cfgports, because cfgports - * could have changed while we were loaded. + * at unload; we do for portcnt, because that's what we allocate. + * We acquire lock to be really paranoid that ipath_pd isn't being + * accessed from some interrupt-related code (that should not happen, + * but best to be sure). 
*/ + spin_lock_irqsave(&dd->ipath_uctxt_lock, flags); + tmp = dd->ipath_pd; + dd->ipath_pd = NULL; + spin_unlock_irqrestore(&dd->ipath_uctxt_lock, flags); for (port = 0; port < dd->ipath_portcnt; port++) { - struct ipath_portdata *pd = dd->ipath_pd[port]; - dd->ipath_pd[port] = NULL; + struct ipath_portdata *pd = tmp[port]; + tmp[port] = NULL; /* debugging paranoia */ ipath_free_pddata(dd, pd); } - kfree(dd->ipath_pd); - /* - * debuggability, in case some cleanup path tries to use it - * after this - */ - dd->ipath_pd = NULL; + kfree(tmp); } static void __devexit ipath_remove_one(struct pci_dev *pdev) @@ -2586,6 +2589,7 @@ int ipath_reset_device(int unit) { int ret, i; struct ipath_devdata *dd = ipath_lookup(unit); + unsigned long flags; if (!dd) { ret = -ENODEV; @@ -2611,18 +2615,21 @@ int ipath_reset_device(int unit) goto bail; } + spin_lock_irqsave(&dd->ipath_uctxt_lock, flags); if (dd->ipath_pd) for (i = 1; i < dd->ipath_cfgports; i++) { - if (dd->ipath_pd[i] && dd->ipath_pd[i]->port_cnt) { - ipath_dbg("unit %u port %d is in use " - "(PID %u cmd %s), can't reset\n", - unit, i, - pid_nr(dd->ipath_pd[i]->port_pid), - dd->ipath_pd[i]->port_comm); - ret = -EBUSY; - goto bail; - } + if (!dd->ipath_pd[i] || !dd->ipath_pd[i]->port_cnt) + continue; + spin_unlock_irqrestore(&dd->ipath_uctxt_lock, flags); + ipath_dbg("unit %u port %d is in use " + "(PID %u cmd %s), can't reset\n", + unit, i, + pid_nr(dd->ipath_pd[i]->port_pid), + dd->ipath_pd[i]->port_comm); + ret = -EBUSY; + goto bail; } + spin_unlock_irqrestore(&dd->ipath_uctxt_lock, flags); if (dd->ipath_flags & IPATH_HAS_SEND_DMA) teardown_sdma(dd); @@ -2656,9 +2663,12 @@ static int ipath_signal_procs(struct ipath_devdata *dd, int sig) { int i, sub, any = 0; struct pid *pid; + unsigned long flags; if (!dd->ipath_pd) return 0; + + spin_lock_irqsave(&dd->ipath_uctxt_lock, flags); for (i = 1; i < dd->ipath_cfgports; i++) { if (!dd->ipath_pd[i] || !dd->ipath_pd[i]->port_cnt) continue; @@ -2682,6 +2692,7 @@ static int 
ipath_signal_procs(struct ipath_devdata *dd, int sig) any++; } } + spin_unlock_irqrestore(&dd->ipath_uctxt_lock, flags); return any; } diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index ceab52c..239d4e8 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -2046,7 +2046,9 @@ static int ipath_close(struct inode *in, struct file *fp) struct ipath_filedata *fd; struct ipath_portdata *pd; struct ipath_devdata *dd; + unsigned long flags; unsigned port; + struct pid *pid; ipath_cdbg(VERBOSE, "close on dev %lx, private data %p\n", (long)in->i_rdev, fp->private_data); @@ -2079,14 +2081,13 @@ static int ipath_close(struct inode *in, struct file *fp) mutex_unlock(&ipath_mutex); goto bail; } + /* early; no interrupt users after this */ + spin_lock_irqsave(&dd->ipath_uctxt_lock, flags); port = pd->port_port; - - if (pd->port_hdrqfull) { - ipath_cdbg(PROC, "%s[%u] had %u rcvhdrqfull errors " - "during run\n", pd->port_comm, pid_nr(pd->port_pid), - pd->port_hdrqfull); - pd->port_hdrqfull = 0; - } + dd->ipath_pd[port] = NULL; + pid = pd->port_pid; + pd->port_pid = NULL; + spin_unlock_irqrestore(&dd->ipath_uctxt_lock, flags); if (pd->port_rcvwait_to || pd->port_piowait_to || pd->port_rcvnowait || pd->port_pionowait) { @@ -2143,13 +2144,11 @@ static int ipath_close(struct inode *in, struct file *fp) unlock_expected_tids(pd); ipath_stats.sps_ports--; ipath_cdbg(PROC, "%s[%u] closed port %u:%u\n", - pd->port_comm, pid_nr(pd->port_pid), + pd->port_comm, pid_nr(pid), dd->ipath_unit, port); } - put_pid(pd->port_pid); - pd->port_pid = NULL; - dd->ipath_pd[pd->port_port] = NULL; /* before releasing mutex */ + put_pid(pid); mutex_unlock(&ipath_mutex); ipath_free_pddata(dd, pd); /* after releasing the mutex */ diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 3e5baa4..64aeefb 100644 --- 
a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -229,6 +229,7 @@ static int init_chip_first(struct ipath_devdata *dd) spin_lock_init(&dd->ipath_kernel_tid_lock); spin_lock_init(&dd->ipath_user_tid_lock); spin_lock_init(&dd->ipath_sendctrl_lock); + spin_lock_init(&dd->ipath_uctxt_lock); spin_lock_init(&dd->ipath_sdma_lock); spin_lock_init(&dd->ipath_gpio_lock); spin_lock_init(&dd->ipath_eep_st_lock); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index aa84153..6ba4861 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -477,6 +477,8 @@ struct ipath_devdata { spinlock_t ipath_kernel_tid_lock; spinlock_t ipath_user_tid_lock; spinlock_t ipath_sendctrl_lock; + /* around ipath_pd and (user ports) port_cnt use (intr vs free) */ + spinlock_t ipath_uctxt_lock; /* * IPATH_STATUS_*, diff --git a/drivers/infiniband/hw/ipath/ipath_keys.c b/drivers/infiniband/hw/ipath/ipath_keys.c index 8f32b17..c0e933f 100644 --- a/drivers/infiniband/hw/ipath/ipath_keys.c +++ b/drivers/infiniband/hw/ipath/ipath_keys.c @@ -132,6 +132,7 @@ int ipath_lkey_ok(struct ipath_qp *qp, struct ipath_sge *isge, * (see ipath_get_dma_mr and ipath_dma.c). */ if (sge->lkey == 0) { + /* always a kernel port, no locking needed */ struct ipath_pd *pd = to_ipd(qp->ibqp.pd); if (pd->user) { @@ -211,6 +212,7 @@ int ipath_rkey_ok(struct ipath_qp *qp, struct ipath_sge_state *ss, * (see ipath_get_dma_mr and ipath_dma.c). 
*/ if (rkey == 0) { + /* always a kernel port, no locking needed */ struct ipath_pd *pd = to_ipd(qp->ibqp.pd); if (pd->user) { diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index be4fc9a..17a1231 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -348,6 +348,7 @@ bail: */ static int get_pkeys(struct ipath_devdata *dd, u16 * pkeys) { + /* always a kernel port, no locking needed */ struct ipath_portdata *pd = dd->ipath_pd[0]; memcpy(pkeys, pd->port_pkeys, sizeof(pd->port_pkeys)); @@ -730,6 +731,7 @@ static int set_pkeys(struct ipath_devdata *dd, u16 *pkeys) int i; int changed = 0; + /* always a kernel port, no locking needed */ pd = dd->ipath_pd[0]; for (i = 0; i < ARRAY_SIZE(pd->port_pkeys); i++) { diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index eabc424..cdf0e6a 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1852,7 +1852,7 @@ unsigned ipath_get_npkeys(struct ipath_devdata *dd) } /** - * ipath_get_pkey - return the indexed PKEY from the port 0 PKEY table + * ipath_get_pkey - return the indexed PKEY from the port PKEY table * @dd: the infinipath device * @index: the PKEY index */ @@ -1860,6 +1860,7 @@ unsigned ipath_get_pkey(struct ipath_devdata *dd, unsigned index) { unsigned ret; + /* always a kernel port, no locking needed */ if (index >= ARRAY_SIZE(dd->ipath_pd[0]->port_pkeys)) ret = 0; else From Jesse.Butler at Sun.COM Wed Dec 3 20:33:06 2008 From: Jesse.Butler at Sun.COM (Jesse Butler) Date: Wed, 03 Dec 2008 23:33:06 -0500 Subject: [ofa-general] Re: [ewg] rhel 5.2 iSER support? 
In-Reply-To: <49363691.50609@voltaire.com> References: <4935B02B.9020408@sun.com> <49363691.50609@voltaire.com> Message-ID: <71DB1A3F-05CB-4A75-BEA6-D51ABD084595@sun.com> For me, this ended up being that the CM service was not yet configured at the time that I was attempting to log in. So, it is possible that you need to set the port settling time attribute to ensure that the ports are configured properly. Sameer, ping me directly if you need further assistance. /jb On Dec 3, 2008, at 2:34 AM, Or Gerlitz wrote: > Sameer Mehta wrote: >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_connect:connecting >> to: 192.168.0.5, port 0xbc0c >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event >> 0 conn ffff81015de00bc0 id ffff81017fc8e200 >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event >> 2 conn ffff81015de00bc0 id ffff81017fc8e200 >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: >> iser_create_ib_conn_res:setting conn ffff81015de00bc0 cma_id >> ffff81017fc8e200: fmr_pool ffff810140c9aec0 qp ffff810168974e00 >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event >> 8 conn ffff81015de00bc0 id ffff81017fc8e200 >> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event: >> 8, error: 8 >> >> Am I missing something here? is iSER transport available in v1.4? > You are getting REJECTED (8) event with the reject reason being > INVALID_SERVICE_ID (8), see include/rdma/ib_cm.h. This means there's > no one listening on the Service-ID you are attempting to connect to, > e.g. your target didn't issue a listen call on the SID (service id) > you are trying to connect to or there's some mismatch in the SID as > constructed by the initiator, etc. > > Related inter-op issue has been brought by Jesse Butler from Sun > couple of months ago, http://lists.openfabrics.org/pipermail/general/2008-October/054487.html > but I am not sure where it stands.
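The service ID involved in this rejection can be checked by hand. The sketch below mirrors the arithmetic of `cma_get_service_id()` in drivers/infiniband/core/cma.c in host byte order. The helper name `service_id` is hypothetical, and it is an assumption that the `port 0xbc0c` in the log is the port printed in network byte order, that is 0x0cbc, or 3260, the usual iSCSI port.

```c
#include <stdint.h>

/* Port-space value for TCP, as defined in the kernel's rdma_cm.h. */
#define PS_TCP 0x0106

/*
 * Hypothetical host-order helper: the port space shifted left by 16
 * bits plus the 16-bit port. The kernel function additionally converts
 * the result to big-endian for the wire; that step is omitted here.
 */
static uint64_t service_id(uint16_t port_space, uint16_t port)
{
    return ((uint64_t)port_space << 16) + port;
}
```

Under those assumptions, `service_id(PS_TCP, 3260)` evaluates to 0x0000000001060cbc, so a target has to be listening on exactly that value for the connect request to be accepted.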
> > The code that builds the SID from the tcp port is cma_get_service_id > (drivers/infiniband/core/cma.c, below) where in this case the > resulted SID is 0x0000000001060cbc > > Or. >> static __be64 cma_get_service_id(enum rdma_port_space ps, struct >> sockaddr *addr) >> { >> return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); >> } > > > > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From vlad at dev.mellanox.co.il Wed Dec 3 23:45:53 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 04 Dec 2008 09:45:53 +0200 Subject: [ofa-general] error building ofa_kernel RPMs In-Reply-To: <4936C8AE.3000304@englobe-tec.com> References: <4936C8AE.3000304@englobe-tec.com> Message-ID: <49378AB1.1040505@dev.mellanox.co.il> David Rioja wrote: > Dear all, > > I've tried to build OFED 1.2.5 in my server with fedora core 9. First > I had a problem while building ofa_user RPMs but this got solved by > installing zlib-static package. I think I have all the rest of > dependencies installed, both x68_64 and i386 versions. > Hi David, Fedora Core 9 will be supported by OFED-1.4 release. See OFED-1.2.5/docs/OFED_release_notes.txt for the list of supported platforms and operating systems. Regards, Vladimir From rams at englobe-tec.com Thu Dec 4 03:15:43 2008 From: rams at englobe-tec.com (David Rioja) Date: Thu, 04 Dec 2008 12:15:43 +0100 Subject: [ofa-general] debuginfo packages problem Message-ID: <4937BBDF.8080600@englobe-tec.com> Hello, I successfully tried to install basic OFED 1.4 on my fedora9 node image. I liked more when you could separate the build and install process without having to install development packages in my node image (my nodes are stateless) but I think I can remove all that stuff when everything is done. 
Now I'm trying to build the full set of packages and I have the following problem with all the "debuginfo" ones. Perhaps customizing to avoid their installation will let me go on, but I'd like to understand what's the problem for documentation purposes. *** Build libibverbs-debuginfo RPM Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /root/IB/SRPMS/libibverbs-1.1.2-1.ofed1.4.rc6.src.rpm libibverbs-debuginfo was not created *** Regards, David From vlad at lists.openfabrics.org Thu Dec 4 03:24:30 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 4 Dec 2008 03:24:30 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081204-0200 daily build status Message-ID: <20081204112431.3F74FE60C53@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 
with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From vlad at dev.mellanox.co.il Thu Dec 4 04:14:18 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Thu, 04 Dec 2008 14:14:18 +0200 Subject: [ofa-general] debuginfo packages problem In-Reply-To: <4937BBDF.8080600@englobe-tec.com> References: <4937BBDF.8080600@englobe-tec.com> Message-ID: <4937C99A.7060700@dev.mellanox.co.il> David Rioja wrote: > Hello, > I successfully tried to install basic OFED 1.4 on my fedora9 node > image. I liked more when you could separate the build and install > process without having to install development packages in my node > image (my nodes are stateless) but I think I can remove all that stuff > when everything is done. > > Now I'm trying to build the full set of packages and I have the > following problem with all the "debuginfo" ones. Perhaps customizing > to avoid their installation will let me go on, but I'd like to > understand what's the problem for documentation purposes. 
> > *** > Build libibverbs-debuginfo RPM > Running rpmbuild --rebuild --define '_topdir /var/tmp/OFED_topdir' > --define 'dist %{nil}' --target x86_64 --define '_prefix /usr' > --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define > '_usr /usr' /root/IB/SRPMS/libibverbs-1.1.2-1.ofed1.4.rc6.src.rpm > > libibverbs-debuginfo was not created > *** > > Regards, > David Hi David, Please check that you have 'redhat-rpm-config' rpm installed. Regards, Vladimir From kliteyn at dev.mellanox.co.il Thu Dec 4 04:58:17 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 04 Dec 2008 14:58:17 +0200 Subject: [ofa-general] [PATCH] opensm: release notes for OSM 3.2.4 Message-ID: <4937D3E9.3020804@dev.mellanox.co.il> Hi Sasha, Adding some info for OSM 3.2.4 release notes. Hope I didn't miss something. Signed-off-by: Yevgeny Kliteynik --- opensm/doc/opensm_release_notes-3.2.txt | 16 +++++++++++++--- 1 files changed, 13 insertions(+), 3 deletions(-) diff --git a/opensm/doc/opensm_release_notes-3.2.txt b/opensm/doc/opensm_release_notes-3.2.txt index ce7ad90..ae56868 100644 --- a/opensm/doc/opensm_release_notes-3.2.txt +++ b/opensm/doc/opensm_release_notes-3.2.txt @@ -1,16 +1,16 @@ OpenSM Release Notes 3.2 ============================= -Version: OpenSM 3.2.3 +Version: OpenSM 3.2.4 Repo: git://git.openfabrics.org/~sashak/management.git -Date: Oct 2008 +Date: Dec 2008 1 Overview ---------- This document describes the contents of the OpenSM 3.2 release. OpenSM is an InfiniBand compliant Subnet Manager and Administration, and runs on top of OpenIB. 
The OpenSM version for this release -is openib-3.2.3 +is openib-3.2.4 This document includes the following sections: 1 This Overview section (describing new features and software @@ -109,6 +109,12 @@ This document includes the following sections: * Respond to new trap 144 node description update flag +* When our SM is in Standby state and its priority is increased + (via console command), notify master SM by sending Trap 144. + +* When entering standby state (after discovery) notify master SM + with Trap 144. + * Add '--connect_roots' command line options. This preserves connectivity between root nodes in Up/Down routing algorithm @@ -158,6 +164,8 @@ This document includes the following sections: * Add 'reroute' console command +* Add 'dump_conf' console command + * Remove many install-exec-hook from Makefiles * Some cleanups in LASH routing algorithm code @@ -188,6 +196,8 @@ This document includes the following sections: * Unify options listing in OpenSM usage message +* OpenSM performs sweep on SIGCONT (coming out of suspend). + 1.3 Library API Changes None -- 1.5.1.4 From hrosenstock at obsidianresearch.com Thu Dec 4 06:01:24 2008 From: hrosenstock at obsidianresearch.com (Hal Rosenstock) Date: Thu, 04 Dec 2008 07:01:24 -0700 Subject: [ofa-general] [PATCH][TRIVIAL] opensm/osm_lid_mgr.c: Commentary fix Message-ID: <1228399284.29873.8.camel@bertha1.edm.orcorp.ca> Sasha, Trivial commentary typo patch. 
-- Hal -------------- next part -------------- opensm/osm_lid_mgr.c: Fix commentary typo Signed-off-by: Hal Rosenstock diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c index c90292a..b74aba5 100644 --- a/opensm/opensm/osm_lid_mgr.c +++ b/opensm/opensm/osm_lid_mgr.c @@ -869,7 +869,7 @@ __osm_lid_mgr_set_remote_pi_state_to_init(IN osm_lid_mgr_t * const p_mgr, if (p_rem_physp == NULL) return; - /* but in some rare cases the remote side might be irresponsive */ + /* but in some rare cases the remote side might be non responsive */ ib_port_info_set_port_state(&p_rem_physp->port_info, IB_LINK_INIT); } From monis at Voltaire.COM Thu Dec 4 08:10:50 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Thu, 04 Dec 2008 18:10:50 +0200 Subject: [ofa-general] [PATCH] IB/IPoIB: Decrease the time that invalid paths stay useless Message-ID: <4938010A.40701@Voltaire.COM> If a remote LID change occurs (causing only an IPOIB_FLUSH_LIGHT event on the node) or when path completion returns with an error, it might take a long time until the next path lookup. This depends on when the kernel sends an ARP probe packet that will trigger a path lookup. This patch adds a task that is responsible for restarting the lookup of invalid paths. This task is scheduled to run on 2 occasions 1. IPOIB_FLUSH_LIGHT event happens 2.
Path completion returned with bad status Signed-off-by: Moni Shoua --- drivers/infiniband/ulp/ipoib/ipoib.h | 6 +++- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 2 - drivers/infiniband/ulp/ipoib/ipoib_main.c | 37 +++++++++++++++++++++++++----- 3 files changed, 37 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index e0c7dfa..98564c3 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -298,6 +298,7 @@ struct ipoib_dev_priv { struct work_struct flush_heavy; struct work_struct restart_task; struct delayed_work ah_reap_task; + struct delayed_work path_refresh_task; struct ib_device *ca; u8 port; @@ -378,7 +379,7 @@ struct ipoib_path { struct rb_node rb_node; struct list_head list; - int valid; + u8 stale; }; struct ipoib_neigh { @@ -442,8 +443,9 @@ int ipoib_add_umcast_attr(struct net_device *dev); void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn); void ipoib_reap_ah(struct work_struct *work); +void ipoib_refresh_paths(struct work_struct *work); -void ipoib_mark_paths_invalid(struct net_device *dev); +void ipoib_mark_paths_stale(struct net_device *dev); void ipoib_flush_paths(struct net_device *dev); struct ipoib_dev_priv *ipoib_intf_alloc(const char *format); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 28eb6f0..ff52314 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -962,7 +962,7 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, } if (level == IPOIB_FLUSH_LIGHT) { - ipoib_mark_paths_invalid(dev); + ipoib_mark_paths_stale(dev); ipoib_mcast_dev_flush(dev); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 85257f6..c9b5890 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -352,7 
+352,7 @@ void ipoib_path_iter_read(struct ipoib_path_iter *iter, #endif /* CONFIG_INFINIBAND_IPOIB_DEBUG */ -void ipoib_mark_paths_invalid(struct net_device *dev) +void ipoib_mark_paths_stale(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path, *tp; @@ -360,12 +360,15 @@ void ipoib_mark_paths_invalid(struct net_device *dev) spin_lock_irq(&priv->lock); list_for_each_entry_safe(path, tp, &priv->path_list, list) { - ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n", + ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " stale\n", be16_to_cpu(path->pathrec.dlid), IPOIB_GID_ARG(path->pathrec.dgid)); - path->valid = 0; + path->stale = 1; } + if (!list_empty(&priv->path_list)) + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, + round_jiffies_relative(HZ)); spin_unlock_irq(&priv->lock); } @@ -427,6 +430,10 @@ static void path_rec_completion(int status, if (!ib_init_ah_from_path(priv->ca, priv->port, pathrec, &av)) ah = ipoib_create_ah(dev, priv->pd, &av); + } else { + path->stale = 1; + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, + round_jiffies_relative(HZ)); } spin_lock_irqsave(&priv->lock, flags); @@ -477,7 +484,6 @@ static void path_rec_completion(int status, while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); } - path->valid = 1; } path->query = NULL; @@ -551,9 +557,29 @@ static int path_rec_start(struct net_device *dev, return path->query_id; } + path->stale = 0; return 0; } +void ipoib_refresh_paths(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, path_refresh_task.work); + struct net_device *dev = priv->dev; + struct ipoib_path *path, *tp; + + spin_lock_irq(&priv->lock); + list_for_each_entry_safe(path, tp, &priv->path_list, list) { + ipoib_dbg(priv, "restart path LID 0x%04x GID " IPOIB_GID_FMT "\n", + be16_to_cpu(path->pathrec.dlid), + IPOIB_GID_ARG(path->pathrec.dgid)); + if 
(path->stale) + path_rec_start(dev, path); + } + + spin_unlock_irq(&priv->lock); +} + static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -656,7 +682,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, spin_lock_irqsave(&priv->lock, flags); path = __path_find(dev, phdr->hwaddr + 4); - if (!path || !path->valid) { + if (!path) { if (!path) path = path_rec_create(dev, phdr->hwaddr + 4); if (path) { @@ -1071,6 +1097,7 @@ static void ipoib_setup(struct net_device *dev) INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); + INIT_DELAYED_WORK(&priv->path_refresh_task, ipoib_refresh_paths); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) From sashak at voltaire.com Thu Dec 4 08:23:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 18:23:00 +0200 Subject: [ofa-general] Re: [PATCH] opensm: release notes for OSM 3.2.4 In-Reply-To: <4937D3E9.3020804@dev.mellanox.co.il> References: <4937D3E9.3020804@dev.mellanox.co.il> Message-ID: <20081204162300.GF6183@sashak.voltaire.com> On 14:58 Thu 04 Dec , Yevgeny Kliteynik wrote: > Hi Sasha, > > Adding some info for OSM 3.2.4 release notes. > Hope I didn't miss something. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Some more 3.2.4 related additions are below... Sasha >From 8c78b9b0b55e23b0f7d7ba0a27221522d34e9c03 Mon Sep 17 00:00:00 2001 From: Sasha Khapyorsky Date: Thu, 4 Dec 2008 18:18:32 +0200 Subject: [PATCH] opensm: more 3.2.4 related things for RN Add more OpenSM 3.2.4 related things to Release Notes. 
Signed-off-by: Sasha Khapyorsky --- opensm/doc/opensm_release_notes-3.2.txt | 29 +++++++++++++++++++++-------- 1 files changed, 21 insertions(+), 8 deletions(-) diff --git a/opensm/doc/opensm_release_notes-3.2.txt b/opensm/doc/opensm_release_notes-3.2.txt index ae56868..4e60113 100644 --- a/opensm/doc/opensm_release_notes-3.2.txt +++ b/opensm/doc/opensm_release_notes-3.2.txt @@ -109,12 +109,6 @@ This document includes the following sections: * Respond to new trap 144 node description update flag -* When our SM is in Standby state and its priority is increased - (via console command), notify master SM by sending Trap 144. - -* When entering standby state (after discovery) notify master SM - with Trap 144. - * Add '--connect_roots' command line options. This preserves connectivity between root nodes in Up/Down routing algorithm @@ -164,8 +158,6 @@ This document includes the following sections: * Add 'reroute' console command -* Add 'dump_conf' console command - * Remove many install-exec-hook from Makefiles * Some cleanups in LASH routing algorithm code @@ -196,8 +188,20 @@ This document includes the following sections: * Unify options listing in OpenSM usage message +* LFT buffers handling simplification + +* Add 'dump_conf' console command + * OpenSM performs sweep on SIGCONT (coming out of suspend). +* When our SM is in Standby state and its priority is increased + (via console command), notify master SM by sending Trap 144. + +* When entering standby state (after discovery) notify master SM + with Trap 144. + +* support more PortInfo:CapabilityMask bits + 1.3 Library API Changes None @@ -365,6 +369,15 @@ information regarding each compliance statement. 
* opensm/osm_mcast_mgr: fix memory leak +* opensm: fix qos config parsing bugs + +* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid + overflow + +* opensm: log_max_size config parameter in MB + +* opensm/osm_ucast_lash: fix extra memory allocations + * Other less critical or visible bugs were also fixed. 5 Main Verification Flows -- 1.6.1.rc1.45.g123ed From sashak at voltaire.com Thu Dec 4 08:23:37 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 18:23:37 +0200 Subject: [ofa-general] Re: [PATCH][TRIVIAL] opensm/osm_lid_mgr.c: Commentary fix In-Reply-To: <1228399284.29873.8.camel@bertha1.edm.orcorp.ca> References: <1228399284.29873.8.camel@bertha1.edm.orcorp.ca> Message-ID: <20081204162337.GG6183@sashak.voltaire.com> On 07:01 Thu 04 Dec , Hal Rosenstock wrote: > Sasha, > > Trivial commentary typo patch. > > -- Hal > opensm/osm_lid_mgr.c: Fix commentary typo > > Signed-off-by: Hal Rosenstock Applied. Thanks. Sasha From sashak at voltaire.com Thu Dec 4 08:38:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 18:38:00 +0200 Subject: [ofa-general] Re: [PATCH] OpenSM: Fixed GUID check against cn_guid_file usinf Ftree. In-Reply-To: <49369BF3.5040902@ext.bull.net> References: <49369BF3.5040902@ext.bull.net> Message-ID: <20081204163759.GH6183@sashak.voltaire.com> On 15:47 Wed 03 Dec , Nicolas Morey Chaisemartin wrote: > Port GUID was not converted using cl_ntoh64 before being searched in the CN > cl_qmap. > Therefore, the cn_guid_file needed to be reversed (bigendian<->littleendian > conversion) so it would recognize the nodes. > This patch makes it consistent with the root_guid_file and the log > messages. > > > Signed-off-by: Nicolas Morey-Chaisemartin > Applied. Thanks. 
Sasha From chas at cmf.nrl.navy.mil Thu Dec 4 09:06:27 2008 From: chas at cmf.nrl.navy.mil (Chas Williams (CONTRACTOR)) Date: Thu, 04 Dec 2008 12:06:27 -0500 Subject: [ofa-general] hca sma non-responsive but link still Active Message-ID: <200812041706.mB4H6RnF028155@cmf.nrl.navy.mil> if i load the attached module on my host, the link winds up in a curious state. the intent of the module is to duplicate a particular type of kernel hang that blocks all the cpus from handling any work. what happens is that the sma stops responding: # ibportstate 90 1 ibportstate: iberror: failed: smp query nodeinfo failed but the switch port on the other end of the link still reports a valid state: # ibportstate 70 18 PortInfo: # Port info: Lid 70 port 18 LinkState:.......................Active PhysLinkState:...................LinkUp LinkWidthSupported:..............1X or 4X LinkWidthEnabled:................1X or 4X LinkWidthActive:.................4X LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps LinkSpeedEnabled:................2.5 Gbps LinkSpeedActive:.................2.5 Gbps ibwarn: [6758] _do_madrpc: recv failed: Connection timed out ibportstate: iberror: failed: smp query nodeinfo failed we believe that the link layer is handled entirely in the firmware which has no idea that the sma part in the kernel has gone to sleep. the periodic light sweeps by the opensm dont seem to discover this problem either. this type of failure tends to make the ib utilities that scan the network run rather slowly. ibdiagnet does indeed spot this broken host, but perhaps the sm could be extended to attempt to something about this host, like reset the switch port? should it really require manual intervention to clear this error? 
/* doom.c -- reliably wedge an smp kernel
 *
 * build:
 *   echo 'obj-m += doom.o' > Makefile
 *   make -C /lib/modules/`uname -r`/build M=`pwd`
 *
 * usage:
 *   insmod doom.ko
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/spinlock.h>

/* Runs on each cpu: take a private lock with interrupts off and spin forever. */
static void wedge(void *data)
{
	unsigned long flags;
	spinlock_t lock;

	printk(KERN_ERR "goodbye cruel world...\n");

	spin_lock_init(&lock);
	spin_lock_irqsave(&lock, flags);

	while (1)
		/* do nothing */;
}

static int __init doom_init(void)
{
	int i;

	/* wedge every other cpu first, then the one running this code */
	for_each_possible_cpu(i) {
		if (i != smp_processor_id())
			smp_call_function_single(i, wedge, 0, 0, 0);
	}

	smp_call_function_single(smp_processor_id(), wedge, 0, 0, 0);

	return 0;
}

module_init(doom_init);

MODULE_AUTHOR("chas williams ");
MODULE_DESCRIPTION("wedge the kernel but good");
MODULE_LICENSE("GPL");

From hal.rosenstock at gmail.com Thu Dec 4 10:22:16 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 4 Dec 2008 13:22:16 -0500 Subject: ***SPAM*** Re: [ofa-general] hca sma non-responsive but link still Active In-Reply-To: <200812041706.mB4H6RnF028155@cmf.nrl.navy.mil> References: <200812041706.mB4H6RnF028155@cmf.nrl.navy.mil> Message-ID: On Thu, Dec 4, 2008 at 12:06 PM, Chas Williams (CONTRACTOR) wrote: > if i load the attached module on my host, the link winds up in a curious > state. the intent of the module is to duplicate a particular type of > kernel hang that blocks all the cpus from handling any work.
> > what happens is that the sma stops responding: > > # ibportstate 90 1 > ibportstate: iberror: failed: smp query nodeinfo failed > > but the switch port on the other end of the link still reports a valid > state: > > # ibportstate 70 18 > PortInfo: > # Port info: Lid 70 port 18 > LinkState:.......................Active > PhysLinkState:...................LinkUp > LinkWidthSupported:..............1X or 4X > LinkWidthEnabled:................1X or 4X > LinkWidthActive:.................4X > LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps > LinkSpeedEnabled:................2.5 Gbps > LinkSpeedActive:.................2.5 Gbps > ibwarn: [6758] _do_madrpc: recv failed: Connection timed out > ibportstate: iberror: failed: smp query nodeinfo failed > > we believe that the link layer is handled entirely in the firmware Mostly, but some control comes from the host (e.g. setting the port physical state) and is passed down by the host SMA (at least for the Linux kernel on Mellanox HCAs). > which has no idea that the sma part in the kernel has gone to sleep. Right; the part of the SMA in firmware is mainly passive and requires host interaction, but it does not detect its misbehavior or non-behavior. > the periodic light sweeps by the opensm dont seem to discover this > problem either. A light sweep only polls SwitchInfo to see whether there is more to be done. If SwitchInfo doesn't indicate a port state change (which it doesn't in this case), then the sweep won't see this. > this type of failure tends to make the ib utilities that scan the network > run rather slowly. ibdiagnet does indeed spot this broken host, but > perhaps the sm could be extended to attempt to something about this > host, like reset the switch port? IMO the best approach would be for the firmware to drop the link when the host becomes non-responsive (and only allow it to come back when the host is responsive) rather than putting additional policy (detection/reaction/etc) into OpenSM.
> should it really require manual intervention to clear this error? Ideally no but this node is violating its "contract" in that if the physical state is INIT or beyond, it's required to respond to SMA packets. -- Hal > /* doom.c -- reliably wedge an smp kernel > * > * build: > * echo 'obj-m += doom.o' > Makefile > * make -C /lib/modules/`uname -r`/build M=`pwd` > * > * usage: > * insmod doom.ko > */ > > #include > #include > #include > #include > #include > > static void wedge(void *data) > { > unsigned long flags; > spinlock_t lock; > > printk(KERN_ERR "goodbye cruel world...\n"); > > spin_lock_init(&lock); > spin_lock_irqsave(&lock, flags); > > while (1) > /* do nothing */; > } > > static int __init doom_init(void) > { > int i; > > for_each_possible_cpu(i) { > if (i != smp_processor_id()) > smp_call_function_single(i, wedge, 0, 0, 0); > } > > smp_call_function_single(smp_processor_id(), wedge, 0, 0, 0); > > return 0; > } > > module_init(doom_init); > > MODULE_AUTHOR("chas williams "); > MODULE_DESCRIPTION("wedge the kernel but good"); > MODULE_LICENSE("GPL"); > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Thu Dec 4 10:52:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 20:52:09 +0200 Subject: [ofa-general] [PATCH] opensm: fix race in main OpenSM flow. Message-ID: <20081204185209.GI6183@sashak.voltaire.com> wait_for_pending_transactions() is heavily used during OpenSM heavy and light sweep, it checks opensm.stats.qp0_mads_outstanding value and blocks (with pthread_cond_wait()) until it will reach zero, this event is signaled by pthread_cond_signal() and should wake waiting thread up. 
However this code (qp0_mads_outstanding decrease and signaling) is not protected by the same mutex as check code in wait_for_pending_transactions() is. As result there is an easily reproducible race condition (when qp0_mads_outstanding is decreased and signaled after the check and before pthread_cond_wait() call in wait_for_pending_transactions()), which causes OpenSM to dead lock. This patch addresses this issue - it will use the same mutex which is used in wait_for_pending_transactions() for all qp0_mads_outstanding change operations. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_stats.h | 34 ++++++++++++++++++++++++++++++++++ opensm/opensm/osm_sm_mad_ctrl.c | 21 +++++---------------- opensm/opensm/osm_vl15intf.c | 4 ++-- 3 files changed, 41 insertions(+), 18 deletions(-) diff --git a/opensm/include/opensm/osm_stats.h b/opensm/include/opensm/osm_stats.h index 2b06e3f..4331cfa 100644 --- a/opensm/include/opensm/osm_stats.h +++ b/opensm/include/opensm/osm_stats.h @@ -146,5 +146,39 @@ typedef struct osm_stats { * SEE ALSO ***************/ +static inline uint32_t osm_stats_inc_qp0_outstanding(osm_stats_t *stats) +{ + uint32_t outstanding; + +#ifdef HAVE_LIBPTHREAD + pthread_mutex_lock(&stats->mutex); + outstanding = ++stats->qp0_mads_outstanding; + pthread_mutex_unlock(&stats->mutex); +#else + outstanding = cl_atomic_inc(&stats->qp0_mads_outstanding); +#endif + + return outstanding; +} + +static inline uint32_t osm_stats_dec_qp0_outstanding(osm_stats_t *stats) +{ + uint32_t outstanding; + +#ifdef HAVE_LIBPTHREAD + pthread_mutex_lock(&stats->mutex); + outstanding = --stats->qp0_mads_outstanding; + if (!outstanding) + pthread_cond_signal(&stats->cond); + pthread_mutex_unlock(&stats->mutex); +#else + outstanding = cl_atomic_dec(&stats->qp0_mads_outstanding); + if (!outstanding) + cl_event_signal(&stats->event); +#endif + + return outstanding; +} + END_C_DECLS #endif /* _OSM_STATS_H_ */ diff --git a/opensm/opensm/osm_sm_mad_ctrl.c 
b/opensm/opensm/osm_sm_mad_ctrl.c index fa588cf..267ec85 100644 --- a/opensm/opensm/osm_sm_mad_ctrl.c +++ b/opensm/opensm/osm_sm_mad_ctrl.c @@ -64,6 +64,7 @@ * * SYNOPSIS */ + static void __osm_sm_mad_ctrl_retire_trans_mad(IN osm_sm_mad_ctrl_t * const p_ctrl, IN osm_madw_t * const p_madw) @@ -82,23 +83,11 @@ __osm_sm_mad_ctrl_retire_trans_mad(IN osm_sm_mad_ctrl_t * const p_ctrl, osm_mad_pool_put(p_ctrl->p_mad_pool, p_madw); - outstanding = cl_atomic_dec(&p_ctrl->p_stats->qp0_mads_outstanding); - - OSM_LOG(p_ctrl->p_log, OSM_LOG_DEBUG, "%u QP0 MADs outstanding\n", - p_ctrl->p_stats->qp0_mads_outstanding); + outstanding = osm_stats_dec_qp0_outstanding(p_ctrl->p_stats); - if (outstanding == 0) { - /* - The wire is clean. - Signal the subnet manager. - */ - OSM_LOG(p_ctrl->p_log, OSM_LOG_DEBUG, "wire is clean.\n"); -#ifdef HAVE_LIBPTHREAD - pthread_cond_signal(&p_ctrl->p_stats->cond); -#else - cl_event_signal(&p_ctrl->p_stats->event); -#endif - } + OSM_LOG(p_ctrl->p_log, OSM_LOG_DEBUG, "%u QP0 MADs outstanding%s\n", + p_ctrl->p_stats->qp0_mads_outstanding, + outstanding ? 
"" : ": wire is clean."); OSM_LOG_EXIT(p_ctrl->p_log); } diff --git a/opensm/opensm/osm_vl15intf.c b/opensm/opensm/osm_vl15intf.c index 0cd88ec..0703a4f 100644 --- a/opensm/opensm/osm_vl15intf.c +++ b/opensm/opensm/osm_vl15intf.c @@ -325,7 +325,7 @@ void osm_vl15_post(IN osm_vl15_t * const p_vl, IN osm_madw_t * const p_madw) cl_spinlock_acquire(&p_vl->lock); if (p_madw->resp_expected == TRUE) { cl_qlist_insert_tail(&p_vl->rfifo, &p_madw->list_item); - cl_atomic_inc(&p_vl->p_stats->qp0_mads_outstanding); + osm_stats_inc_qp0_outstanding(p_vl->p_stats); } else cl_qlist_insert_tail(&p_vl->ufifo, &p_madw->list_item); cl_spinlock_release(&p_vl->lock); @@ -374,7 +374,7 @@ osm_vl15_shutdown(IN osm_vl15_t * const p_vl, "Releasing Request p_madw = %p\n", p_madw); osm_mad_pool_put(p_mad_pool, p_madw); - cl_atomic_dec(&p_vl->p_stats->qp0_mads_outstanding); + osm_stats_dec_qp0_outstanding(p_vl->p_stats); p_madw = (osm_madw_t *) cl_qlist_remove_head(&p_vl->rfifo); } -- 1.6.1.rc1.45.g123ed From sashak at voltaire.com Thu Dec 4 11:04:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 21:04:50 +0200 Subject: [ofa-general] CentOS non-OFED opensm package needs /dev/infiniband In-Reply-To: <4936C97C.5010405@harr.org> References: <4936C97C.5010405@harr.org> Message-ID: <20081204190450.GJ6183@sashak.voltaire.com> On 11:01 Wed 03 Dec , Cameron Harr wrote: > > It turns out the SM is looking for devices in /dev/infiniband (umad0, > uverbs0), It is not OpenSM really but libibumad (which is used by OpenSM). > but the kernel-provided ib_umad and ib_uverbs modules place the > devices in /dev. By creating a link "infiniband" in the /dev/ directory to > the /dev/ directory, things magically started to work. > > "cd /dev; ln -s . 
infiniband" Another approach is to create file like (similar to one used in OFED): [root at grumz m]# cat /etc/udev/rules.d/100-infiniband.rules KERNEL=="umad*", NAME="infiniband/%k" KERNEL=="issm*", NAME="infiniband/%k" KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666" KERNEL=="uverbs*", NAME="infiniband/%k", MODE="0666" KERNEL=="ucma", NAME="infiniband/%k", MODE="0666" KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666" Sasha From michael.oevermann at tu-berlin.de Thu Dec 4 12:24:50 2008 From: michael.oevermann at tu-berlin.de (Michael Oevermann) Date: Thu, 04 Dec 2008 21:24:50 +0100 Subject: [ofa-general] opensm, OFED 1.3/1.4 Message-ID: <49383C92.6000708@tu-berlin.de> Hi all, I am trying to set up a cluster (with OSCAR) with infiniband using CentOS/RHEL 5.2. Only the compute nodes have IB NICS, the head node doesn't. At this point I want to use the packages of the CentOS repositories which are - as far as I know - based on OFED 1.3 to avoid package conflicts and dependency problems I run into if I compile OFED 1.4 from source. I have two questions: 1) can I still use the OFED 1.4 source to compile the most recent MPI versions (e.g. MVAPICH2, which is not in the CentOS repository). Or is it just a bad idea to mix versions here? 2) As far as I understand, I need the opensm subnet manager. Can I run this manager on the head node (which has no IB NIC) or do I need to run it on one of the nodes. I would prefer to run it on the head node in order to keep all nodes exactly identical. Best regards and thanks for info Michael From hal.rosenstock at gmail.com Thu Dec 4 12:34:09 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 4 Dec 2008 15:34:09 -0500 Subject: [ofa-general] opensm, OFED 1.3/1.4 In-Reply-To: <49383C92.6000708@tu-berlin.de> References: <49383C92.6000708@tu-berlin.de> Message-ID: Hi, On Thu, Dec 4, 2008 at 3:24 PM, Michael Oevermann wrote: > Hi all, > > I am trying to set up a cluster (with OSCAR) with infiniband using > CentOS/RHEL 5.2. 
Only the compute nodes have > IB NICS, the head node doesn't. At this point I want to use the packages of > the CentOS repositories which > are - as far as I know - based on OFED 1.3 to avoid package conflicts and > dependency problems I run > into if I compile OFED 1.4 from source. I have two questions: > > 1) can I still use the OFED 1.4 source to compile the most recent MPI > versions (e.g. MVAPICH2, which is not in the > CentOS repository). Or is it just a bad idea to mix versions here? > > 2) As far as I understand, I need the opensm subnet manager. Can I run this > manager on the head node (which has > no IB NIC) An IB NIC (HCA) is needed to run OpenSM or any host-based SM. -- Hal > or do I need to run it on one of the nodes. I would prefer to run > it on the head node in order to > keep all nodes exactly identical. > > Best regards and thanks for info > > Michael From sashak at voltaire.com Thu Dec 4 12:34:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 4 Dec 2008 22:34:04 +0200 Subject: [ofa-general] opensm, OFED 1.3/1.4 In-Reply-To: <49383C92.6000708@tu-berlin.de> References: <49383C92.6000708@tu-berlin.de> Message-ID: <20081204203404.GK6183@sashak.voltaire.com> Hi Michael, On 21:24 Thu 04 Dec , Michael Oevermann wrote: > > 2) As far as I understand, I need the opensm subnet manager. Can I run this > manager on the head node (which has > no IB NIC) or do I need to run it on one of the nodes. You need to run OpenSM on an IB node.
Sasha From cameron at harr.org Thu Dec 4 12:36:41 2008 From: cameron at harr.org (Cameron Harr) Date: Thu, 04 Dec 2008 13:36:41 -0700 Subject: [ofa-general] opensm, OFED 1.3/1.4 In-Reply-To: References: <49383C92.6000708@tu-berlin.de> Message-ID: <49383F59.9050703@harr.org> An HTML attachment was scrubbed... URL: From hal.rosenstock at gmail.com Thu Dec 4 13:03:40 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 4 Dec 2008 16:03:40 -0500 Subject: [ofa-general] Re: [PATCH] opensm: fix race in main OpenSM flow. In-Reply-To: <20081204185209.GI6183@sashak.voltaire.com> References: <20081204185209.GI6183@sashak.voltaire.com> Message-ID: On Thu, Dec 4, 2008 at 1:52 PM, Sasha Khapyorsky wrote: > > wait_for_pending_transactions() is heavily used during OpenSM heavy and > light sweep, it checks opensm.stats.qp0_mads_outstanding value and blocks > (with pthread_cond_wait()) until it will reach zero, this event is > signaled by pthread_cond_signal() and should wake waiting thread up. > However this code (qp0_mads_outstanding decrease and signaling) is not > protected by the same mutex as check code in > wait_for_pending_transactions() is. As result there is an easily > reproducible race condition (when qp0_mads_outstanding is decreased and > signaled after the check and before pthread_cond_wait() call in > wait_for_pending_transactions()), which causes OpenSM to dead lock. > > This patch addresses this issue - it will use the same mutex which is > used in wait_for_pending_transactions() for all qp0_mads_outstanding > change operations. Nice catch! Looks to me like this has been there from around the following commit or some related changes shortly thereafter: commit 1b2eb3daddbfa9fc555488cddbea12b01f6635a3 Date: Mon Jan 28 03:10:18 2008 +0200 opensm: wait_for_pending_transaction() generalization Function wait_for_pending_transaction() is global now and moved from PerfMgr to StateMgr, all related objects are generalized.
If so, this is applicable to 3.1 and maybe also 3.0 based OpenSMs. -- Hal From kliteyn at dev.mellanox.co.il Thu Dec 4 13:26:17 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 04 Dec 2008 23:26:17 +0200 Subject: [ofa-general] [PATCH] OFED docs/QoS_architecture.txt: fixes for OFED 1.4 Message-ID: <49384AF9.10304@dev.mellanox.co.il> Tziporet, Some fixes for OFED 1.4 release in QoS_architecture.txt. Signed-off-by: Yevgeny Kliteynik --- QoS_architecture.txt | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/QoS_architecture.txt b/QoS_architecture.txt index 72f9e83..1c19a98 100644 --- a/QoS_architecture.txt +++ b/QoS_architecture.txt @@ -114,14 +114,14 @@ I) Port Group: a set of CAs, Routers or Switches that share the same settings. list of GUIDs, or list of port names based on NodeDescription. II) Fabric Setup: Defines how the SL2VL and VLArb tables should be setup. - NOTE: In OFED 1.3 this part of the policy is ignored. SL2VL and VLArb + NOTE: Currently this part of the policy is ignored. SL2VL and VLArb tables should be configured in the OpenSM options file (opensm.opts). III) QoS-Levels Definition: This section defines the possible sets of parameters for QoS that a client might be mapped to. Each set holds SL and optionally: Max MTU, Max Rate, Packet Lifetime and Path Bits. - NOTE: Path Bits are not implemented in OFED 1.3 + NOTE: Currently, Path Bits are not implemented. IV) Matching Rules: A list of rules that match an incoming PR/MPR request to a QoS-Level. The rules are processed in order such as the first match @@ -166,7 +166,7 @@ holding the remote TCP/IP Port Number to connect to. 7. RDS ============================================================================== -RDS uses CMA and thus it is very close to SDP. The Service-ID for RDS is +RDS uses CMA and thus it is very close to SDP. 
The Service-ID for RDS is 0x000000000106PPPP, where PPPP are 4 hex digits holding the TCP/IP Port Number that the protocol connects to. Default port number for RDS is 0x48CA, which makes a default Service-ID @@ -191,8 +191,8 @@ SA reports its ability to handle QoS PR/MPRs. ============================================================================== Similar to RDS, iSER also uses CMA. The Service-ID for iSER is similar to RDS -(0x000000000106PPPP), with default port number 0x035C, which makes a default -Service-ID 0x000000000106035C. +(0x000000000106PPPP), with default port number 0x0CBC, which makes a default +Service-ID 0x0000000001060CBC. ============================================================================== -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Thu Dec 4 13:26:37 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Thu, 04 Dec 2008 23:26:37 +0200 Subject: [ofa-general] [PATCH] OFED docs/QoS_management_in_OpenSM.txt: fixes for OFED 1.4 Message-ID: <49384B0D.6000301@dev.mellanox.co.il> Tziporet, Some fixes for OFED 1.4 release in QoS_management_in_OpenSM.txt. Signed-off-by: Yevgeny Kliteynik --- QoS_management_in_OpenSM.txt | 58 +++++++++++++++++++++--------------------- 1 files changed, 29 insertions(+), 29 deletions(-) diff --git a/QoS_management_in_OpenSM.txt b/QoS_management_in_OpenSM.txt index 1c2f59d..8c9915f 100644 --- a/QoS_management_in_OpenSM.txt +++ b/QoS_management_in_OpenSM.txt @@ -65,9 +65,9 @@ matching rules (see below). Port group lists ports by: II) QoS Setup (denoted by qos-setup). This section describes how to set up SL2VL and VL Arbitration tables on various nodes in the fabric. -However, this is not supported in OFED 1.3.1. +However, this is not supported in OpenSM currently. SL2VL and VLArb tables should be configured in the OpenSM options file -(default location - /var/cache/opensm/opensm.opts). +(default location - /usr/local/etc/opensm/opensm.conf). III) QoS Levels (denoted by qos-levels). 
Each QoS Level defines Service Level (SL) and a few optional fields: @@ -203,9 +203,9 @@ policy file and their syntax: qos-setup # This section of the policy file describes how to set up SL2VL and VL # Arbitration tables on various nodes in the fabric. - # However, this is not supported in OFED 1.3.1 - the section is parsed - # and ignored. SL2VL and VLArb tables should be configured in the - # OpenSM options file (by default - /var/cache/opensm/opensm.opts). + # However, this is not supported in OpenSM currently - the section is + # parsed and ignored. SL2VL and VLArb tables should be configured in the + # OpenSM options file (by default - /usr/local/etc/opensm/opensm.conf). end-qos-setup qos-levels @@ -378,12 +378,12 @@ equivalent: 6.4 iSER Similar to RDS, iSER query is matched by Service ID, where the the Service ID -is also 0x000000000106PPPP. Default port number for iSER is 0x035C, which makes -a default Service-ID 0x000000000106035C. The following two match rules are +is also 0x000000000106PPPP. Default port number for iSER is 0x0CBC, which makes +a default Service-ID 0x0000000001060CBC. 
The following two match rules are equivalent: iser : - any, service-id 0x000000000106035C : + any, service-id 0x0000000001060CBC : 6.5 SRP Service ID for SRP varies from storage vendor to vendor, thus SRP query is @@ -432,17 +432,17 @@ of the currently supported sets: Here's the example of typical default values for CAs and switches' external ports (hard-coded in OpenSM initialization): - qos_ca_max_vls=15 - qos_ca_high_limit=0 - qos_ca_vlarb_high=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 - qos_ca_vlarb_low=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 - qos_ca_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + qos_ca_max_vls 15 + qos_ca_high_limit 0 + qos_ca_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 + qos_ca_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 + qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 - qos_swe_max_vls=15 - qos_swe_high_limit=0 - qos_swe_vlarb_high=0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 - qos_swe_vlarb_low=0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 - qos_swe_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + qos_swe_max_vls 15 + qos_swe_high_limit 0 + qos_swe_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 + qos_swe_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 + qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 VL arbitration tables (both high and low) are lists of VL/Weight pairs. Each list entry contains a VL number (values from 0-14), and a weighting value @@ -473,17 +473,17 @@ values for each VL should be multiples of 64. 
Below is an example of SL2VL and VL Arbitration configuration on subnet: - qos_ca_max_vls=15 - qos_ca_high_limit=6 - qos_ca_vlarb_high=0:4 - qos_ca_vlarb_low=0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 - qos_ca_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 - - qos_swe_max_vls=15 - qos_swe_high_limit=6 - qos_swe_vlarb_high=0:4 - qos_swe_vlarb_low=0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 - qos_swe_sl2vl=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + qos_ca_max_vls 15 + qos_ca_high_limit 6 + qos_ca_vlarb_high 0:4 + qos_ca_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 + qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 + + qos_swe_max_vls 15 + qos_swe_high_limit 6 + qos_swe_vlarb_high 0:4 + qos_swe_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 + qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 In this example, there are 8 VLs configured on subnet: VL0 to VL7. VL0 is defined as a high priority VL, and it is limited to 6 x 4KB = 24KB in a single -- 1.5.1.4 From ddiss at sgi.com Thu Dec 4 19:28:44 2008 From: ddiss at sgi.com (David Disseldorp) Date: Fri, 5 Dec 2008 14:28:44 +1100 Subject: [ofa-general] Re: [PATCH] iser: avoid recv buf exhaustion v2 In-Reply-To: <1227676762-23505-1-git-send-email-ddiss@sgi.com> References: <20081126161213.000065c3@snort.melbourne.sgi.com> <1227676762-23505-1-git-send-email-ddiss@sgi.com> Message-ID: <20081205142844.00006b78@snort.melbourne.sgi.com> Ping, anyone had a chance to look over this one? Cheers, Dave On Wed, 26 Nov 2008 16:19:22 +1100 David Disseldorp wrote: > iSCSI/iSER targets may send PDUs without a prior request from the initiator, > RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED > and Asynchronous Message PDUs occupy this category. > > The amount of active "unexpected" PDU's an iSER target may have at any time is > governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported. 
> > Currently when an iSER target sends an "unexpected" PDU, the initiators recv > buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num > "unexpected" PDUs are received then the receive queue will run out of receive > work requests. > > This patch ensures recv buffers consumed by "unexpected" PDUs are replaced > in the next iser_post_receive_control() call. > > Version 2: > o replace unexpected recv bufs in iser_post_receive_control, transparent > to iser_send_* functions. From wangwhao at cn.ibm.com Thu Dec 4 23:35:35 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Fri, 5 Dec 2008 15:35:35 +0800 Subject: [ofa-general] Automation Test Tool of OFED and opensm Message-ID: Hi all: Is there any test or diagnostic tool, especially automation test tool to check the functions of OFED and opensm? The only thing here I know is infiniband-diags and libibverbs-utils. Thanks. Wen Hao Wang Email: wangwhao at cn.ibm.com
From yossi.openib at gmail.com Fri Dec 5 03:06:22 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Fri, 05 Dec 2008 13:06:22 +0200 Subject: Re: [ofa-general] [PATCH] IB/IPoIB: Decrease the time that invalid paths stay useless In-Reply-To: <4938010A.40701@Voltaire.COM> References: <4938010A.40701@Voltaire.COM> Message-ID: <49390B2E.60203@gmail.com> > @@ -360,12 +360,15 @@ void ipoib_mark_paths_invalid(struct net_device *dev) > spin_lock_irq(&priv->lock); > > list_for_each_entry_safe(path, tp, &priv->path_list, list) { > - ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n", > + ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " stale\n", > be16_to_cpu(path->pathrec.dlid), > IPOIB_GID_ARG(path->pathrec.dgid)); > - path->valid = 0; > + path->stale = 1; > } > > + if (!list_empty(&priv->path_list)) > + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, > + round_jiffies_relative(HZ)); > spin_unlock_irq(&priv->lock); > } > What if there is already an outstanding path query on one of the paths you mark stale? ipoib_refresh_paths() will issue another query, making it two queries on the same path. Then, if you bring the device down (call ipoib_flush_paths()) it will wait for completion of one query, causing a crash.
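A note on the scenario described in the message above: one way to avoid a second path record query on a path that already has one in flight is to track the outstanding query per path and skip the refresh while it is set. The sketch below is a user-space model of that idea only. The names (`path_model`, `refresh_path`, `query_complete`) are hypothetical, this is not the IPoIB driver code, and it is not necessarily how the patch was reworked afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's per-path state. */
struct path_model {
    bool stale;            /* path was marked stale by a flush event */
    bool query_in_flight;  /* a query was issued and has not completed yet */
    int  queries_issued;   /* bookkeeping for the demonstration only */
};

/*
 * Start a refresh query for a stale path, but only if no query is
 * already outstanding. Returns true when a new query was started.
 */
static bool refresh_path(struct path_model *p)
{
    if (!p->stale)
        return false;          /* nothing to refresh */
    if (p->query_in_flight)
        return false;          /* let the pending query finish first */

    p->query_in_flight = true;
    p->queries_issued++;
    return true;
}

/*
 * Completion handler for the single outstanding query. Whether a reply
 * to a query issued before the stale mark can be trusted is a separate
 * question that this model does not address.
 */
static void query_complete(struct path_model *p)
{
    assert(p->query_in_flight); /* at most one query per path at a time */
    p->query_in_flight = false;
    p->stale = false;
}
```

With at most one query per path, a teardown routine that waits for a single completion per path has nothing left outstanding afterwards, which is the property the crash scenario above depends on.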
From vlad at lists.openfabrics.org Fri Dec 5 03:17:52 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 5 Dec 2008 03:17:52 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081205-0200 daily build status Message-ID: <20081205111752.6BACCE60C5C@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From perkinjo at cse.ohio-state.edu Fri Dec 5 06:33:15 2008 From: perkinjo at cse.ohio-state.edu (Jonathan Perkins) Date: Fri, 5 Dec 2008 09:33:15 -0500 Subject: [ofa-general] MPIR_Init_thread(310).......: Initialization failed In-Reply-To: References: Message-ID: <20081205143314.GE2900@cse.ohio-state.edu> Hello, At first glance it appears that this is an issue with an inability to pin the memory required. I'm cc'ing mvapich-discuss as well since this message is specific to the MVAPICH/MVAPICH2 packages. The following is a snippet from the MVAPICH2 User Guide... http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.2.html#x1-580009.3.4 A possible reason could be inability to pin the memory required. Make sure the following steps are taken. 1. In /etc/security/limits.conf add the following * soft memlock phys_mem_in_KB 2. After this, add the following to /etc/init.d/sshd ulimit -l phys_mem_in_KB 3. Restart sshd With some distros, we’ve found that adding the ulimit -l line to the sshd init script is no longer necessary. For instance, the following steps work for our rhel5 systems. 1. Add the following lines to /etc/security/limits.conf * soft memlock unlimited * hard memlock unlimited 2. Restart sshd On Wed, Dec 03, 2008 at 12:30:07PM +0530, अनुज wrote: > Hi > > I have compiled mvapich2-1.2p1 for gen2. > > I tried to run IMB ( Intel MPI Benchmark) over it.
> > But I'm getting the following error : > > Fatal error in MPI_Init_thread: > Other MPI error, error stack: > MPIR_Init_thread(310).......: Initialization failed > MPID_Init(113)..............: channel initialization failed > MPIDI_CH3_Init(168).........: > MPIDI_CH3I_RDMA_init(138)...: > rdma_setup_startup_ring(334): cannot create cq > MPI process terminated unexpectedly > Exit code -5 signaled from pnetib2 > # here pnetib2 is the host name assigned to ipoib interface > cleanupKilling remote processes...Signal 15 received. > DONE > > Please tell me where is the problem. Or how can i debug this. > > Thanks Alot > > Regards, > > -- > Anuj Aggarwal > > .''`. > : :Ⓐ : # apt-get install hakuna-matata > `. `'` > `- > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Jonathan Perkins http://www.cse.ohio-state.edu/~perkinjo From hal.rosenstock at gmail.com Fri Dec 5 06:37:36 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 5 Dec 2008 09:37:36 -0500 Subject: [ofa-general] Automation Test Tool of OFED and opensm In-Reply-To: References: Message-ID: Hi, On Fri, Dec 5, 2008 at 2:35 AM, Wen Hao Wang wrote: > Hi all: > > Is there any test or diagnostic tool, especially automation test tool to > check the functions of OFED and opensm? The only thing here I know is > infiniband-diags and libibverbs-utils. In terms of OpenSM, there is osmtest. Also, ibsim and ibmgtsim (in ibutils) might help simulations of various subnet topologies. In terms of additional diagnostics, there is ibutils (ibdiagnet). There are also a number of examples in the various OFED packages which can be used for regression of those components. 
As far as an automation harness/test tool goes, I believe some vendors have worked on this but to the best of my knowledge none of this has been open sourced. -- Hal > Thanks. > > Wen Hao Wang > Email: wangwhao at cn.ibm.com From jmulik at desu.edu Fri Dec 5 08:20:09 2008 From: jmulik at desu.edu (Jaiwant Mulik) Date: Fri, 5 Dec 2008 11:20:09 -0500 Subject: [ofa-general] libibverb-utils. Message-ID: <04CB02EA-2F7A-4240-B61D-AFF4DF310CDC@desu.edu> Hi all, Do the ibv_rc_pingpong, ibv_srq_pingpong, ibv_uc_pingpong and ibv_ud_pingpong utilities work only for IB HCAs and not for iWARP cards? I have a Chelsio s302X card and ibv_devices and ibv_devinfo correctly detect it. Also rping works fine so I know that the cards and RDMA are working.
[root at iwarp1 ~]# ibv_rc_pingpong Couldn't get local LID [root at iwarp1 ~]# ibv_srq_pingpong Couldn't create SRQ [root at iwarp1 ~]# ibv_srq_pingpong Couldn't create SRQ [root at iwarp1 ~]# ibv_uc_pingpong Couldn't create QP [root at iwarp1 ~]# ibv_ud_pingpong Couldn't create QP [root at iwarp1 ~]# ibv_devices device node GUID ------ ---------------- cxgb3_0 000743050b6b0000 [root at iwarp1 ~]# ibv_devinfo hca_id: cxgb3_0 fw_ver: 0.0.0 node_guid: 0007:4305:0b6b:0000 sys_image_guid: 0007:4305:0b6b:0000 vendor_id: 0x1425 vendor_part_id: 50 hw_ver: 0x0 board_id: 1425.32 phys_port_cnt: 2 port: 1 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: invalid MTU (160) sm_lid: 0 port_lid: 0 port_lmc: 0x00 port: 2 state: PORT_ACTIVE (4) max_mtu: 4096 (5) active_mtu: invalid MTU (160) sm_lid: 0 port_lid: 0 port_lmc: 0x00 [root at iwarp1 ~]# ------------------------------------------------------------------ Assistant Professor Computer and Information Sciences Department Delaware State University, Dover, DE (302) 857-7910/6640, http://netlab.cis.desu.edu ------------------------------------------------------------------ But what is a life that holds no impossible dream? From rdreier at cisco.com Fri Dec 5 10:04:54 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 Dec 2008 10:04:54 -0800 Subject: [ofa-general] Re: [PATCH] iser: avoid recv buf exhaustion v2 In-Reply-To: <20081205142844.00006b78@snort.melbourne.sgi.com> (David Disseldorp's message of "Fri, 5 Dec 2008 14:28:44 +1100") References: <20081126161213.000065c3@snort.melbourne.sgi.com> <1227676762-23505-1-git-send-email-ddiss@sgi.com> <20081205142844.00006b78@snort.melbourne.sgi.com> Message-ID: > Ping, anyone had a chance to look over this one? I never saw the original go by... I wonder if a spam filter somewhere ate it? I'll wait to see the patch and wait for the iSER guys to ACK it, but in principle it sounds fine to apply.
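On the QoS documentation patches earlier in this digest: the Service-ID pattern 0x000000000106PPPP places the 16-bit TCP/IP port number in the low four hex digits. A small sketch of that composition is given here to show how the corrected iSER default follows from port 3260 (0x0CBC). The macro and helper names are assumptions made for illustration and are not taken from OpenSM or the OFED documentation.

```c
#include <stdint.h>

/* Upper bits of the 0x000000000106PPPP pattern quoted in the patches. */
#define CMA_TCP_SERVICE_ID_PREFIX 0x0000000001060000ULL

/*
 * Hypothetical helper: combine the fixed prefix with a TCP/IP port
 * number, which occupies the low 16 bits (the PPPP digits).
 */
static uint64_t cma_tcp_service_id(uint16_t tcp_port)
{
    return CMA_TCP_SERVICE_ID_PREFIX | (uint64_t)tcp_port;
}
```

Under this reading, port 3260 yields 0x0000000001060CBC, the value the patches put in place of 0x000000000106035C, and the RDS default port 0x48CA would yield 0x00000000010648CA.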

From rdreier at cisco.com Fri Dec 5 11:05:18 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 05 Dec 2008 11:05:18 -0800
Subject: [ofa-general] Re: [PATCH 10/10] RDMA/nes: Add loopback check to make_cm_node()
In-Reply-To: <20081121205104.GA3720@ctung-MOBL> (Chien Tung's message of "Fri, 21 Nov 2008 14:51:04 -0600")
References: <20081121205104.GA3720@ctung-MOBL>
Message-ID:

Thanks, I applied the following patches:

Chien Tung (2):
      RDMA/nes: Add loopback check to make_cm_node()
      RDMA/nes: Cleanup warnings

Faisal Latif (6):
      RDMA/nes: Cleanup cqp_request list usage
      RDMA/nes: Lock down connected_nodes list while processing it
      RDMA/nes: Avoid race between MPA request and reset event to rdma_cm
      RDMA/nes: Forward packets for a new connection with stale APBVT entry
      RDMA/nes: Fix TCP compliance test failures
      RDMA/nes: Check cqp_avail_reqs is empty after locking the list

A couple of trivial things you could do that would make my life a little easier: the body of your patch email currently looks like:

 > From: Chien Tung
 >
 > RDMA/nes: Add loopback check to make_cm_node()
 >
 > Check for loopback connection in make_cm_node()
 >
 > Signed-off-by: Chien Tung
 > --
 > diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c

The "From:" line is perfect, it gets the author correct in git automatically. However the line after that with the changelog subject shouldn't be there -- git will automatically use the "Subject:" line of the email itself for the patch title, and so I just have to strip the duplicate subject by hand to avoid it showing up in git twice.

Also the "--" is not quite right -- it needs to be "---" (three '-'s, not two) for git to strip it automatically; as it stands I get a "--" in the changelog after your signoff unless I strip it by hand. And also including a diffstat after the "---" is helpful, so I can see what the patch touches.

Using "git format-patch" possibly combined with "git send-email" is an easy way to get things right.

 - R.
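
[Editor's sketch] The "git format-patch" suggestion above can be tried out in a throwaway repository. This is only an illustration under stated assumptions: the demo directory, the author identity, and the placeholder contents of nes_cm.c are invented for the example and are not taken from the real RDMA/nes patch; only the commit title and body text are borrowed from the message quoted above. The point is that git itself emits the "Subject:" header, the three-dash "---" separator and the diffstat, so none of them has to be typed by hand.

```shell
#!/bin/sh
# Sketch only: everything happens in a scratch repository.
set -e

demo="${TMPDIR:-/tmp}/format-patch-demo"   # hypothetical scratch location
rm -rf "$demo"
git init -q "$demo"
cd "$demo"
git config user.name "Example Author"       # made-up identity for the demo
git config user.email "author@example.com"

# A base commit, so that the next commit yields an ordinary one-file patch.
echo "/* placeholder file, not the real driver source */" > nes_cm.c
git add nes_cm.c
git commit -q -m "Initial import"

# The change itself. The first -m becomes the mail Subject, the second -m
# becomes the changelog body, and -s appends the Signed-off-by line.
echo "/* a loopback check would go here */" >> nes_cm.c
git commit -q -a -s \
    -m "RDMA/nes: Add loopback check to make_cm_node()" \
    -m "Check for loopback connection in make_cm_node()"

# Write the newest commit as a mail-ready patch file under outgoing/.
git format-patch -1 -o outgoing

# Show the part of the patch discussed above: Subject, changelog, signoff,
# the "---" separator and the diffstat that follows it.
sed -n '/^Subject:/,/^diff --git/p' outgoing/0001-*.patch

# Mailing is left commented out because it needs a configured SMTP setup:
# git send-email --to=<list address> outgoing/*.patch
```

Because the title travels in the "Subject:" header, the patch body starts directly with the changelog text, which avoids the duplicated subject line described above.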
From rdreier at cisco.com Fri Dec 5 11:15:46 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 Dec 2008 11:15:46 -0800 Subject: [ofa-general] Re: [PATCH 6/6] IB/ipath - Add locking for interrupt use of ipath_pd contexts vs free. In-Reply-To: <20081203183717.575.2020.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 03 Dec 2008 10:37:18 -0800") References: <20081203183645.575.74389.stgit@eng-46.mv.qlogic.com> <20081203183717.575.2020.stgit@eng-46.mv.qlogic.com> Message-ID: thanks, applied all 6. One quick request: for uniformity, instead of making your subjects start "IB/ipath -", could you use "IB/ipath:"? That matches what I use for the rest of the RDMA stack and also what the rest of the kernel uses. Thanks, Roland From rdreier at cisco.com Fri Dec 5 11:17:13 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 05 Dec 2008 11:17:13 -0800 Subject: [ofa-general] [PATCH] cma_zero_addr In-Reply-To: <1228222680.14862.13.camel@alst60.voltaire.com> (Aleksey Senin's message of "Tue, 02 Dec 2008 14:58:00 +0200") References: <1228222680.14862.13.camel@alst60.voltaire.com> Message-ID: OK, I'm officially confused by all the acks, comments and patch versions floating around (and stuff like this, which as far as I can tell is a change that should just be rolled up into the original IPv6 support patch). Can someone please send me the final, final versions of the 2 IPv6 support patches so I know exactly what to apply? - R. From wangwhao at cn.ibm.com Fri Dec 5 15:57:50 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Sat, 6 Dec 2008 07:57:50 +0800 Subject: [ofa-general] Automation Test Tool of OFED and opensm In-Reply-To: Message-ID: Tel: 86-10-82451055 Fax: 86-10-82782244 ext. 2312 Address: 1/F, IBM ZGC Campus. 
Ring Building 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District Beijing, 100193, P.R.China

"Hal Rosenstock" wrote on 2008-12-05 22:37:36:

> Hi,
>
> On Fri, Dec 5, 2008 at 2:35 AM, Wen Hao Wang wrote:
> > Hi all:
> >
> > Is there any test or diagnostic tool, especially automation test tool to
> > check the functions of OFED and opensm? The only thing here I know is
> > infiniband-diags and libibverbs-utils.
>
> In terms of OpenSM, there is osmtest. Also, ibsim and ibmgtsim (in
> ibutils) might help simulations of various subnet topologies. In terms
> of additional diagnostics, there is ibutils (ibdiagnet).
>

OK. I know osmtest, but not ibsim/ibmgtsim. Will check them.

> There are also a number of examples in the various OFED packages which
> can be used for regression of those components.
>

Would you point out one or two of these examples for my reference?

> As far as an automation harness/test tool goes, I believe some vendors
> have worked on this but to the best of my knowledge none of this has
> been open sourced.
>

Thanks a lot! Your feedback is really helpful.

Wen Hao Wang
wangwhao at cn.ibm.com

> -- Hal
>
> > Thanks.
> >
> > Wen Hao Wang
> > Email: wangwhao at cn.ibm.com

From ralph.campbell at qlogic.com Fri Dec 5 17:25:32 2008
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Fri, 05 Dec 2008 17:25:32 -0800
Subject: [ofa-general] [PATCH] IB/cm: change cma_modify_qp_err() to handle QP in RESET state
Message-ID: <20081206012532.17073.64349.stgit@eng-46.mv.qlogic.com>

Since the IBTA 1.2.1 spec.
clarified that the RESET to ERROR QP state transition is not valid but earlier the openfabrics code supported it, the code in cma_modify_qp_err() will now return an error if the QP is in the RESET state. This can cause RDS to go into a loop trying to call rdma_disconnect() continuously.

Signed-off-by: Ralph Campbell
---

 drivers/infiniband/core/cma.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d951896..a1f9781 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -545,6 +545,7 @@ out:
 
 static int cma_modify_qp_err(struct rdma_id_private *id_priv)
 {
+	struct ib_qp_init_attr qp_init_attr;
 	struct ib_qp_attr qp_attr;
 	int ret;
 
@@ -554,8 +555,12 @@ static int cma_modify_qp_err(struct rdma_id_private *id_priv)
 		goto out;
 	}
 
-	qp_attr.qp_state = IB_QPS_ERR;
-	ret = ib_modify_qp(id_priv->id.qp, &qp_attr, IB_QP_STATE);
+	ret = ib_query_qp(id_priv->id.qp, &qp_attr, IB_QP_STATE, &qp_init_attr);
+	if (!ret && qp_attr.qp_state != IB_QPS_RESET &&
+	    qp_attr.qp_state != IB_QPS_ERR) {
+		qp_attr.qp_state = IB_QPS_ERR;
+		ret = ib_modify_qp(id_priv->id.qp, &qp_attr, IB_QP_STATE);
+	}
 out:
 	mutex_unlock(&id_priv->qp_mutex);
 	return ret;

From Jesse.Butler at Sun.COM Fri Dec 5 19:02:32 2008
From: Jesse.Butler at Sun.COM (Jesse Butler)
Date: Fri, 05 Dec 2008 20:02:32 -0700
Subject: [ofa-general] Re: [ewg] rhel 5.2 iSER support?
In-Reply-To: <71DB1A3F-05CB-4A75-BEA6-D51ABD084595@sun.com>
References: <4935B02B.9020408@sun.com> <49363691.50609@voltaire.com> <71DB1A3F-05CB-4A75-BEA6-D51ABD084595@sun.com>
Message-ID: <351F29F0-39DA-4CF1-81A7-719043E34B34@sun.com>

We seem to be working with a given configuration just fine on RHEL 5.2 w/ OFED 1.3 and its bundled Open iSCSI. We are failing to log in with the same configuration parameters running OFED v1.4 RC6.

Is anyone having issues with iSER on this newer build on RHEL 5.2? Did the configuration options change?

Thanks
/jb

On Dec 3, 2008, at 9:33 PM, Jesse Butler wrote:

> For me, this ended up being that the CM service was not yet
> configured at the time that I was attempting to log in. So, it is
> possible that you need to set the port settling time attribute to
> ensure that the ports are configured properly. Sameer, ping me
> directly if you need further assistance.
> /jb
>
> On Dec 3, 2008, at 2:34 AM, Or Gerlitz wrote:
>
>> Sameer Mehta wrote:
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_connect:connecting to: 192.168.0.5, port 0xbc0c
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 0 conn ffff81015de00bc0 id ffff81017fc8e200
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 2 conn ffff81015de00bc0 id ffff81017fc8e200
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_create_ib_conn_res:setting conn ffff81015de00bc0 cma_id ffff81017fc8e200: fmr_pool ffff810140c9aec0 qp ffff810168974e00
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event 8 conn ffff81015de00bc0 id ffff81017fc8e200
>>> Dec 2 16:44:52 nws-bur-25-46 kernel: iser: iser_cma_handler:event: 8, error: 8
>>>
>>> Am I missing something here? Is iSER transport available in v1.4?
>> You are getting a REJECTED (8) event with the reject reason being
>> INVALID_SERVICE_ID (8), see include/rdma/ib_cm.h. This means
>> there's no one listening on the Service-ID you are attempting to
>> connect to, e.g. your target didn't issue a listen call on the SID
>> (service id) you are trying to connect to, or there's some mismatch
>> in the SID as constructed by the initiator, etc.
>>
>> A related inter-op issue was brought up by Jesse Butler from Sun
>> a couple of months ago, http://lists.openfabrics.org/pipermail/general/2008-October/054487.html
>> but I am not sure where it stands.
>> >> The code that builds the SID from the tcp port is >> cma_get_service_id (drivers/infiniband/core/cma.c, below) where in >> this case the resulted SID is 0x0000000001060cbc >> >> Or. >>> static __be64 cma_get_service_id(enum rdma_port_space ps, struct >>> sockaddr *addr) >>> { >>> return cpu_to_be64(((u64)ps << 16) + be16_to_cpu(cma_port(addr))); >>> } >> >> >> >> >> >> >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From vlad at lists.openfabrics.org Sat Dec 6 03:16:34 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 6 Dec 2008 03:16:34 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081206-0200 daily build status Message-ID: <20081206111634.2136AE60CF0@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with 
linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed:
From sashak at voltaire.com Sat Dec 6 14:28:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 7 Dec 2008 00:28:06 +0200 Subject: [ofa-general] [PATCH] opensm: add RN to distributed docs list Message-ID: <20081206222806.GA27505@sashak.voltaire.com> Add Release Notes to distributed documentation list.
Signed-off-by: Sasha Khapyorsky --- opensm/Makefile.am | 3 ++- opensm/opensm.spec.in | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/opensm/Makefile.am b/opensm/Makefile.am index 02c693d..2287edd 100644 --- a/opensm/Makefile.am +++ b/opensm/Makefile.am @@ -21,7 +21,8 @@ endif man_MANS = man/opensm.8 man/osmtest.8 various_scripts = $(wildcard scripts/*) -docs = doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt +docs = doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt \ + doc/opensm_release_notes-3.2.txt EXTRA_DIST = autogen.sh opensm.spec $(various_scripts) $(man_MANS) $(docs) diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index da07a73..9c23f47 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -124,7 +124,7 @@ fi %{_sbindir}/opensm %{_sbindir}/osmtest %{_mandir}/man8/* -%doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt +%doc AUTHORS COPYING README doc/performance-manager-HOWTO.txt doc/QoS_management_in_OpenSM.txt doc/opensm_release_notes-3.2.txt %{_sysconfdir}/init.d/opensmd %{_sbindir}/sldd.sh %config(noreplace) %{_sysconfdir}/logrotate.d/opensm -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 6 14:28:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 7 Dec 2008 00:28:40 +0200 Subject: [ofa-general] [PATCH] opensm: Release Notes update for next version (3.2.5) Message-ID: <20081206222840.GB27505@sashak.voltaire.com> Release Notes update for the next OpenSM version - 3.2.5. 
Signed-off-by: Sasha Khapyorsky --- opensm/doc/opensm_release_notes-3.2.txt | 22 ++++++++++++++++++++-- 1 files changed, 20 insertions(+), 2 deletions(-) diff --git a/opensm/doc/opensm_release_notes-3.2.txt b/opensm/doc/opensm_release_notes-3.2.txt index 4e60113..3356e95 100644 --- a/opensm/doc/opensm_release_notes-3.2.txt +++ b/opensm/doc/opensm_release_notes-3.2.txt @@ -1,7 +1,7 @@ OpenSM Release Notes 3.2 ============================= -Version: OpenSM 3.2.4 +Version: OpenSM 3.2.x Repo: git://git.openfabrics.org/~sashak/management.git Date: Dec 2008 @@ -10,7 +10,7 @@ Date: Dec 2008 This document describes the contents of the OpenSM 3.2 release. OpenSM is an InfiniBand compliant Subnet Manager and Administration, and runs on top of OpenIB. The OpenSM version for this release -is openib-3.2.4 +is opensm-3.2.5 This document includes the following sections: 1 This Overview section (describing new features and software @@ -202,6 +202,9 @@ This document includes the following sections: * support more PortInfo:CapabilityMask bits +* When babbling port policy is on disable the port with the least hop + count. + 1.3 Library API Changes None @@ -378,6 +381,21 @@ information regarding each compliance statement. * opensm/osm_ucast_lash: fix extra memory allocations +* opensm: fix race in main OpenSM flow + +* opensm/ftree: fix GUID check against cn_guid_file + +* opensm/ftree: save FLT buffers memory allocations + +* opensm/osm_sa_link_record.c: prevent potential endless recursion + +* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is + not set + +* opensm: fix QoS config bug + +* opensm: don't reassign zeroed params from config file + * Other less critical or visible bugs were also fixed. 
5 Main Verification Flows -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 6 18:35:31 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 7 Dec 2008 04:35:31 +0200 Subject: [ofa-general] [ANNOUNCE] management tarballs release Message-ID: <20081207023531.GC27505@sashak.voltaire.com> Hi, There is a new release of the management (OpenSM and infiniband diagnostics) tarballs available in: http://www.openfabrics.org/downloads/management/ md5sum: 0cbae1312313c46b9a7d6abae11d9e13 opensm-3.2.5.tar.gz 50b8b23a800e6a703fd8eddd274f94f0 infiniband-diags-1.4.4.tar.gz All component versions are from recent master branch. Full change log is below. Sasha Eli Dorfman (1): opensm/osm_trap_rcv.c disable the port with the least hop count Hal Rosenstock (4): opensm/osm_port_info_rcv.c: Remove SM from sm_guid_tbl when IsSM is not opensm.8.in: Update email address opensm/osm_prefix_route.h: prefix and guid are in network rather than host endian order opensm/osm_lid_mgr.c: Commentary fix Nicolas Morey Chaisemartin (1): OpenSM: Fixed GUID check against cn_guid_file usinf Ftree. 
Sasha Khapyorsky (23): opensm/osm_sa_sminfo_record: remove unused variable opensm/osm_subnet.c: fix high_limit sign when printing opensm/osm_subnet: don't reassign zeroed config params opensm: fix QoS config bug opensm/osm_trap_rcv.c: separate port disabling code opensm: disable switch ports only opensm: remove function names in OSM_LOG() string opensm/osm_sa_link_record: prevent potential endless recursion opensm/osm_sw_info_rcv: eliminate osm_node_get_any_physp_ptr() use opensm: remove osm_node_get_any_dr_part_ptr() function opensm: remove osm_node_get_any_physp_ptr() function opensm: free lft_buf right after use opensm: rename switch lft_buf to new_lft opensm: fix possible crash when disabling babbling port opensm/ftree: save lft_buf memory allocations opensm/man/opensm.8: add some missing stuff infiniband-diags/grouping: add 10G IP router devid opensm/osm_sa_path_record.c: minor indentation fix opensm: more 3.2.4 related things for RN opensm: fix race in main OpenSM flow. opensm: add RN to distributed docs list opensm: Release Notes update for next version (3.2.5) management: update package versions Yevgeny Kliteynik (3): opensm/osm_lid_mgr.c: cosmetics in log message opensm/osm_state_mgr.c: bug fix in unicast cache opensm: release notes for OSM 3.2.4 From sashak at voltaire.com Sat Dec 6 19:31:13 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 7 Dec 2008 05:31:13 +0200 Subject: [ofa-general] [ANNOUNCE] ibsim-0.5 tarball release Message-ID: <20081207033112.GD27505@sashak.voltaire.com> Hi, There is a new release of the ibsim tarball available in: http://www.openfabrics.org/downloads/management/ md5sum: d5383979b5728bb24b9fedc59dd2dc32 ibsim-0.5.tar.gz The version is from recent master branch. Full change log is below. 
Sasha Al Chu (3): ibsim: parse sim cmds via full name fix error message typo ibsim: add ReLink command Hal Rosenstock (15): ibsim/sim_net.c: Fix some typos ibsim/ibsim.c: Fix usage display ibsim/README: Clarify point of attachment/SIM_HOST use ibsim/README: Clarify point of attachment/SIM_HOST use ibsim/sim.h: Fix NodeDescription size ibsim/sim_cmd.c: Cosmetic changes to help ibsim/sim_mad.c: Cosmetic change to some debug messages ibsim/ibsim.c: Cosmetic change to some debug messages ibsim/README: Cosmetic commentary changes ibsim/TODO: Eliminate inet sockets from todo per previous README change ibsim: Remove some unused routines ibsim/sim.h: Allow max ports based on 36 port switches ibsim/sim_mad.c: Fix some typos ibsim/sim_mad.c: Cosmetic changes ibsim: Add support for vendor ID and system image GUID Ira Weiny (1): ibsim: Add per attribute drop error. Sasha Khapyorsky (50): defs.mk: append CFLAGS, LDFLAGS umad2sim: prevent recursive read()s ibsim.spec.in: export CFLAGS and LDFLAGS ibsim.spec.in: export CFLAGS and LDFLAGS with make install ibsim: prevent print buffer overflow ibsim: match client by nodeid and nodedesc ibsim: drop duplicated parameters from link_ports() tests/mcast_storm.c: test program example tests/mcast_storm.c: indentation fixes dist.sh: add tests directory Remove trailing whitespaces umad2sim/sim_client: fix debug prints format strings ibsim: fix do_cmd() prototype ibsim: remove out file from disconnect_client() parameter list ibsim: drop port capmask IsSM bit when SM client is disconnected ibsim: disconnect client when connection failed ibsim: reply timeout if peer client is unreachable umad2sim/sim_client: force port's capmask IsSM bit set option Update License: field in spec files tests/mcast_storm: fix length value passed to umad_send() tests/mcast_storm: check fopen() return status tests/mcast_storm: ib_gid_t to ibmad_gid_t rename ibsim: rename defport to default_port ibsim: fix nodeid size usages ibsim: don't fail on "unknown" options 
README: inet socket rediness support for short RMPP transactions umad2sim: fix snprintf() usage ibsim: fix snprintf() usage Merge branch 'master' of ssh://git.openfabrics.org/~sashak/scm/ibsim ibsim/sim_cmd: consolidate flows umad2sim: cosmetic update Voltaire copyright dates Encode agent id in request transaction id. ibsim: fix compilation warning ibsim: drop mad when attr method failed ibsim: make some stuff static tests/mcast_storm: fix uninitialized subnet prefix tests/mcast_storm: consolidate parameters tests/mcast_storm: various improvements tests/mcast_storm: add command line processor tests/mcast_storm: Add -G, -M and -I options tests/mcast_storm: add yet trivial single join test tests/mcast_storm: fix return value of make_gids_list() tests/mcast_storm: fix MGID command line value parsing tests/mcast_storm: use as default guid of self port tests/mcast_storm: 'joins' test sends a lot of requests tests/mcast_storm: make 'leave' test tests/mcast_storm: various improvements ibsim: version 0.5 Vincent Ficet (1): ibsim: handle EAGAIN error From sashak at voltaire.com Sat Dec 6 19:40:53 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 7 Dec 2008 05:40:53 +0200 Subject: [ofa-general] Re: [PATCH] opensm: fix race in main OpenSM flow. In-Reply-To: References: <20081204185209.GI6183@sashak.voltaire.com> Message-ID: <20081207034053.GE27505@sashak.voltaire.com> Hi Hal, On 16:03 Thu 04 Dec , Hal Rosenstock wrote: > > Looks to me like this has been there from around the following commit > or some related changes shortly thereafter: > > commit 1b2eb3daddbfa9fc555488cddbea12b01f6635a3 > Date: Mon Jan 28 03:10:18 2008 +0200 > > opensm: wait_for_pending_transaction() generalization > > Function wait_for_pending_transaction() is global now and moved from > PerfMgr to StateMgr, all related objects are generalized. > > If so, this is applicable to 3.1 and maybe also 3.0 based OpenSMs. 
No, it doesn't really affect main flow of 3.1 (and 3.0) - it was used only in discovery phase of PerfMgr (and when it is running in standby SM mode) which has experimental status in 3.1. But it does affect almost all OpenSM-3.2.x versions. Sasha From monis at Voltaire.COM Sat Dec 6 23:26:27 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Sun, 07 Dec 2008 09:26:27 +0200 Subject: ***SPAM*** Re: [ofa-general] [PATCH] IB/IPoIB: Decrease the time that invalid paths stay useless In-Reply-To: <49390B2E.60203@gmail.com> References: <4938010A.40701@Voltaire.COM> <49390B2E.60203@gmail.com> Message-ID: <493B7AA3.5000209@Voltaire.COM> Yossi Etigin wrote: > >> @@ -360,12 +360,15 @@ void ipoib_mark_paths_invalid(struct net_device >> *dev) >> spin_lock_irq(&priv->lock); >> >> list_for_each_entry_safe(path, tp, &priv->path_list, list) { >> - ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " >> invalid\n", >> + ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " >> stale\n", >> be16_to_cpu(path->pathrec.dlid), >> IPOIB_GID_ARG(path->pathrec.dgid)); >> - path->valid = 0; >> + path->stale = 1; >> } >> >> + if (!list_empty(&priv->path_list)) >> + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, >> + round_jiffies_relative(HZ)); >> spin_unlock_irq(&priv->lock); >> } >> > > What if there is already an outstanding path query on one > of the paths you mark stale? ipoib_refresh_paths() will issue another > query, making it two > queries on the same path. Then, if you bring the device > down (call ipoib_flush_paths()) it will wait for completion > of one query, causing a crash. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > Thanks. You're right. I think that the change to the patch below should fix what you pointed at Do you agree? 

@@ -551,9 +557,29 @@ static int path_rec_start(struct net_device *dev,
 		return path->query_id;
 	}
 
+	path->stale = 0;
 	return 0;
 }
 
+void ipoib_refresh_paths(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv =
+		container_of(work, struct ipoib_dev_priv, path_refresh_task.work);
+	struct net_device *dev = priv->dev;
+	struct ipoib_path *path, *tp;
+
+	spin_lock_irq(&priv->lock);
+	list_for_each_entry_safe(path, tp, &priv->path_list, list) {
+		ipoib_dbg(priv, "restart path LID 0x%04x GID " IPOIB_GID_FMT "\n",
+			  be16_to_cpu(path->pathrec.dlid),
+			  IPOIB_GID_ARG(path->pathrec.dgid));
+		if (path->stale && !path->query)   <<<<<<<<<<<<<<<< CHANGE IS HERE
+			path_rec_start(dev, path);
+	}
+
+	spin_unlock_irq(&priv->lock);
+}
+
 static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);

From dotanba at gmail.com Sat Dec 6 23:32:36 2008
From: dotanba at gmail.com (Dotan Barak)
Date: Sun, 7 Dec 2008 09:32:36 +0200
Subject: ***SPAM*** Re: [ofa-general] ***SPAM*** libibverb-utils.
In-Reply-To: <04CB02EA-2F7A-4240-B61D-AFF4DF310CDC@desu.edu>
References: <04CB02EA-2F7A-4240-B61D-AFF4DF310CDC@desu.edu>
Message-ID: <2f3bf9a60812062332h36275ecbndd2a4d5fd1597374@mail.gmail.com>

On Fri, Dec 5, 2008 at 6:20 PM, Jaiwant Mulik wrote:
> Hi all,
>
> Do the ibv_rc_pingpong, ibv_srq_pingpong, ibv_uc_pingpong and
> ibv_ud_pingpong utilities work only for IB HCAs and not for iWARP cards?
>

Since those examples do not connect using the CMA, they are not expected to work over an iWARP fabric.

Dotan

From alekseys at voltaire.com Sun Dec 7 00:47:49 2008
From: alekseys at voltaire.com (Aleksey Senin)
Date: Sun, 07 Dec 2008 10:47:49 +0200
Subject: [ofa-general] [PATCH] cma_zero_addr
In-Reply-To:
References: <1228222680.14862.13.camel@alst60.voltaire.com>
Message-ID: <1228639669.3833.10.camel@alst60.voltaire.com>

PATCHv6 is the latest version. Should we use it?
http://lists.openfabrics.org/pipermail/general/2008-December/055727.html And it could be nice if cma_zero_addr patch, will be accepted too. Ultimately, you word will be the final decision. On Fri, 2008-12-05 at 11:17 -0800, Roland Dreier wrote: > OK, I'm officially confused by all the acks, comments and patch versions > floating around (and stuff like this, which as far as I can tell is a > change that should just be rolled up into the original IPv6 support > patch). > > Can someone please send me the final, final versions of the 2 IPv6 > support patches so I know exactly what to apply? > > - R. From vlad at lists.openfabrics.org Sun Dec 7 03:20:58 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 7 Dec 2008 03:20:58 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081207-0200 daily build status Message-ID: <20081207112058.D5C51E60D12@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 
with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From ogerlitz at voltaire.com Sun Dec 7 05:18:43 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 07 Dec 2008 15:18:43 +0200 Subject: [ofa-general] Re: [PATCH] iser: avoid recv buf exhaustion v2 In-Reply-To: References: <20081126161213.000065c3@snort.melbourne.sgi.com> <1227676762-23505-1-git-send-email-ddiss@sgi.com> <20081205142844.00006b78@snort.melbourne.sgi.com> Message-ID: <493BCD33.6000805@voltaire.com> Roland Dreier wrote: > I'll wait to see the patch and wait for the iSER guys to ACK it, but in principle it sounds fine to apply. Yes, I'll get to review this and provide feedback tomorrow or the day after, sorry for the delay. Or. From ddiss at sgi.com Sun Dec 7 17:12:52 2008 From: ddiss at sgi.com (David Disseldorp) Date: Mon, 8 Dec 2008 12:12:52 +1100 Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion v2 (resend) In-Reply-To: References: Message-ID: <1228698773-26528-1-git-send-email-ddiss@sgi.com> iSCSI/iSER targets may send PDUs without a prior request from the initiator, RFC 5046 refers to these PDUs as "unexpected". 
NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category. The amount of active "unexpected" PDU's an iSER target may have at any time is governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported. Currently when an iSER target sends an "unexpected" PDU, the initiators recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests. This patch ensures recv buffers consumed by "unexpected" PDUs are replaced in the next iser_post_receive_control() call. Version 2: o replace unexpected recv bufs in iser_post_receive_control, transparent to iser_send_* functions. Signed-off-by: David Disseldorp Signed-off-by: Ken Sandars --- drivers/infiniband/ulp/iser/iscsi_iser.h | 3 + drivers/infiniband/ulp/iser/iser_initiator.c | 134 ++++++++++++++++++-------- drivers/infiniband/ulp/iser/iser_verbs.c | 1 + 3 files changed, 97 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 81a8262..8611195 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -252,6 +252,9 @@ struct iser_conn { wait_queue_head_t wait; /* waitq for conn/disconn */ atomic_t post_recv_buf_count; /* posted rx count */ atomic_t post_send_buf_count; /* posted tx count */ + atomic_t unexpected_pdu_count;/* count of received * + * unexpected pdus * + * not yet retired */ char name[ISER_OBJECT_NAME_SIZE]; struct iser_page_vec *page_vec; /* represents SG to fmr maps* * maps serialized as tx is*/ diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index cdd2831..a0c56a4 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -183,14 +183,8 @@ static int iser_post_receive_control(struct iscsi_conn *conn) struct iser_regd_buf *regd_data; 
struct iser_dto *recv_dto = NULL; struct iser_device *device = iser_conn->ib_conn->device; - int rx_data_size, err = 0; - - rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); - if (rx_desc == NULL) { - iser_err("Failed to alloc desc for post recv\n"); - return -ENOMEM; - } - rx_desc->type = ISCSI_RX; + int rx_data_size, err; + int posts, outstanding_unexp_pdus; /* for the login sequence we must support rx of upto 8K; login is done * after conn create/bind (connect) and conn stop/bind (reconnect), @@ -201,46 +195,80 @@ static int iser_post_receive_control(struct iscsi_conn *conn) else /* FIXME till user space sets conn->max_recv_dlength correctly */ rx_data_size = 128; - rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); - if (rx_desc->data == NULL) { - iser_err("Failed to alloc data buf for post recv\n"); - err = -ENOMEM; - goto post_rx_kmalloc_failure; - } + outstanding_unexp_pdus = + atomic_xchg(&iser_conn->ib_conn->unexpected_pdu_count, 0); - recv_dto = &rx_desc->dto; - recv_dto->ib_conn = iser_conn->ib_conn; - recv_dto->regd_vector_len = 0; + /* + * in addition to the response buffer, replace those consumed by + * unexpected pdus. 
+ */ + for (posts = 0; posts < 1 + outstanding_unexp_pdus; posts++) { + rx_desc = kmem_cache_alloc(ig.desc_cache, GFP_NOIO); + if (rx_desc == NULL) { + iser_err("Failed to alloc desc for post recv %d\n", + posts); + err = -ENOMEM; + goto post_rx_cache_alloc_failure; + } + rx_desc->type = ISCSI_RX; + rx_desc->data = kmalloc(rx_data_size, GFP_NOIO); + if (rx_desc->data == NULL) { + iser_err("Failed to alloc data buf for post recv %d\n", + posts); + err = -ENOMEM; + goto post_rx_kmalloc_failure; + } - regd_hdr = &rx_desc->hdr_regd_buf; - memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); - regd_hdr->device = device; - regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ - regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; + recv_dto = &rx_desc->dto; + recv_dto->ib_conn = iser_conn->ib_conn; + recv_dto->regd_vector_len = 0; - iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); + regd_hdr = &rx_desc->hdr_regd_buf; + memset(regd_hdr, 0, sizeof(struct iser_regd_buf)); + regd_hdr->device = device; + regd_hdr->virt_addr = rx_desc; /* == &rx_desc->iser_header */ + regd_hdr->data_size = ISER_TOTAL_HEADERS_LEN; - iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); + iser_reg_single(device, regd_hdr, DMA_FROM_DEVICE); - regd_data = &rx_desc->data_regd_buf; - memset(regd_data, 0, sizeof(struct iser_regd_buf)); - regd_data->device = device; - regd_data->virt_addr = rx_desc->data; - regd_data->data_size = rx_data_size; + iser_dto_add_regd_buff(recv_dto, regd_hdr, 0, 0); - iser_reg_single(device, regd_data, DMA_FROM_DEVICE); + regd_data = &rx_desc->data_regd_buf; + memset(regd_data, 0, sizeof(struct iser_regd_buf)); + regd_data->device = device; + regd_data->virt_addr = rx_desc->data; + regd_data->data_size = rx_data_size; - iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); + iser_reg_single(device, regd_data, DMA_FROM_DEVICE); - err = iser_post_recv(rx_desc); - if (!err) - return 0; + iser_dto_add_regd_buff(recv_dto, regd_data, 0, 0); - /* iser_post_recv failed */ + err = 
iser_post_recv(rx_desc); + if (err) { + iser_err("Failed iser_post_recv for post %d\n", posts); + goto post_rx_post_recv_failure; + } + } + /* all posts successful */ + return 0; + +post_rx_post_recv_failure: iser_dto_buffs_release(recv_dto); kfree(rx_desc->data); post_rx_kmalloc_failure: kmem_cache_free(ig.desc_cache, rx_desc); +post_rx_cache_alloc_failure: + if (posts > 0) { + /* + * response buffer posted, but did not replace all unexpected + * pdu recv bufs. Ignore error, retry occurs next send + */ + outstanding_unexp_pdus -= (posts - 1); + err = 0; + } + atomic_add(outstanding_unexp_pdus, + &iser_conn->ib_conn->unexpected_pdu_count); + return err; } @@ -274,8 +302,10 @@ int iser_conn_set_full_featured_mode(struct iscsi_conn *conn) struct iscsi_iser_conn *iser_conn = conn->dd_data; int i; - /* no need to keep it in a var, we are after login so if this should - * be negotiated, by now the result should be available here */ + /* + * FIXME this value should be declared to the target during login with + * the MaxOutstandingUnexpectedPDUs key when supported + */ int initial_post_recv_bufs_num = ISER_MAX_RX_MISC_PDUS; iser_dbg("Initially post: %d\n", initial_post_recv_bufs_num); @@ -478,6 +508,7 @@ int iser_send_control(struct iscsi_conn *conn, int err = 0; struct iser_regd_buf *regd_buf; struct iser_device *device; + unsigned char opcode; if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); @@ -512,10 +543,16 @@ int iser_send_control(struct iscsi_conn *conn, data_seg_len); } - if (iser_post_receive_control(conn) != 0) { - iser_err("post_rcv_buff failed!\n"); - err = -ENOMEM; - goto send_control_error; + opcode = task->hdr->opcode & ISCSI_OPCODE_MASK; + + /* post recv buffer for response if one is expected */ + if (!((opcode == ISCSI_OP_NOOP_OUT) + && (task->hdr->itt == RESERVED_ITT))) { + if (iser_post_receive_control(conn) != 0) { + iser_err("post_rcv_buff failed!\n"); + err = 
-ENOMEM; + goto send_control_error; + } } err = iser_post_send(mdesc); @@ -586,6 +623,21 @@ void iser_rcv_completion(struct iser_desc *rx_desc, * parallel to the execution of iser_conn_term. So the code that waits * * for the posted rx bufs refcount to become zero handles everything */ atomic_dec(&conn->ib_conn->post_recv_buf_count); + + /* + * if an unexpected PDU was received then the recv wr consumed must + * be replaced, this is done in the next send of a control-type PDU + */ + if ((opcode == ISCSI_OP_NOOP_IN) + && (hdr->itt == RESERVED_ITT)) { + /* nop-in with itt = 0xffffffff */ + atomic_inc(&conn->ib_conn->unexpected_pdu_count); + } + else if (opcode == ISCSI_OP_ASYNC_EVENT) { + /* asyncronous message */ + atomic_inc(&conn->ib_conn->unexpected_pdu_count); + } + /* a reject PDU consumes the recv buf posted for the response */ } void iser_snd_completion(struct iser_desc *tx_desc) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 26ff621..6dc6b17 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -498,6 +498,7 @@ void iser_conn_init(struct iser_conn *ib_conn) init_waitqueue_head(&ib_conn->wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); + atomic_set(&ib_conn->unexpected_pdu_count, 0); atomic_set(&ib_conn->refcount, 1); INIT_LIST_HEAD(&ib_conn->conn_list); spin_lock_init(&ib_conn->lock); -- 1.5.4.5 From nicolas.morey-chaisemartin at ext.bull.net Mon Dec 8 00:24:21 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Mon, 08 Dec 2008 09:24:21 +0100 Subject: [ofa-general] Multipath and IB Bonding Message-ID: <493CD9B5.60108@ext.bull.net> Hello everyone! In our current project, we are working with nodes containing 2 IB QDR HCA (2 cards with 1 port each). Both of these ports will be connected on the same interconnect/subnet. Currently, most of our applications (Lustre, NFS...) 
only use one of the ports, as both are on the same subnet. This works, but half of the available bandwidth goes unused, and when a link is lost the applications are not able to fail over to the second port. To provide a single solution to this problem, we are exploring the possibility of "IB Bonding". More precisely, virtualizing the libverbs (kernel mode only) so the applications (kernel modules in fact) see only one interface and QP (virtual) while there may be many underneath. As we only work with RC QPs, we will only try to virtualize this protocol. We don't have any need for virtual RD, UC or UD yet. We have some ideas about the way to implement it, but before starting we'd like to have your opinions about what might go wrong, what might be impossible and so on... Eventually, if people are interested in this, we'd be glad to share our results and get some help on this. Best Regards Nicolas Morey-Chaisemartin From tziporet at mellanox.co.il Mon Dec 8 01:30:08 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 8 Dec 2008 11:30:08 +0200 Subject: [ofa-general] RE: [PATCH] OFED docs/QoS_management_in_OpenSM.txt: fixes for OFED 1.4 In-Reply-To: <49384B0D.6000301@dev.mellanox.co.il> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD01256AFB@mtlexch01.mtl.com> applied Tziporet From dorfman.eli at gmail.com Mon Dec 8 01:41:18 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 08 Dec 2008 11:41:18 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] opensm/osm_inform.c report IB traps to plugin Message-ID: <493CEBBE.2020407@gmail.com> report IB traps to plugin Signed-off-by: Eli Dorfman --- opensm/opensm/osm_inform.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index f3c8ed7..bb16e3a 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -565,7 +565,8 @@ osm_report_notice(IN osm_log_t * const p_log, } /* an official Event information log */ - if
(ib_notice_is_generic(p_ntc)) + if (ib_notice_is_generic(p_ntc)) { + osm_opensm_report_event(p_subn->p_osm, OSM_EVENT_ID_TRAP, p_ntc); OSM_LOG(p_log, OSM_LOG_INFO, "Reporting Generic Notice type:%u num:%u (%s)" " from LID:%u GID:%s\n", @@ -575,6 +576,7 @@ osm_report_notice(IN osm_log_t * const p_log, cl_ntoh16(p_ntc->issuer_lid), inet_ntop(AF_INET6, p_ntc->issuer_gid.raw, gid_str, sizeof gid_str)); + } else OSM_LOG(p_log, OSM_LOG_INFO, "Reporting Vendor Notice type:%u vend:%u dev:%u" -- 1.5.5 From PHF at zurich.ibm.com Mon Dec 8 02:26:37 2008 From: PHF at zurich.ibm.com (Philip Frey1) Date: Mon, 8 Dec 2008 11:26:37 +0100 Subject: [ofa-general] ibv_create_cq: what does comp_vector argument stand for? Message-ID: Hi, what is the semantical meaning of the last argument (int comp_vector) with regard to iWARP? How is the context->num_comp_vectors specified? What would be a use case for anything other than 0? In all the example code I have read it was always set to 0. Many thanks for your help, Philip
From vlad at lists.openfabrics.org Mon Dec 8 03:22:38 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 8 Dec 2008 03:22:38 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081208-0200 daily build status Message-ID: <20081208112238.C5FB3E60870@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From kliteyn at dev.mellanox.co.il Mon Dec 8 03:27:04 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 08 Dec 2008 13:27:04 +0200 Subject: [ofa-general] [PATCH v2] OFED docs/QoS_architecture.txt: fixes for OFED 1.4 Message-ID: <493D0488.4090206@dev.mellanox.co.il> Tziporet, Some fixes for OFED 1.4 release in QoS_architecture.txt. Signed-off-by: Yevgeny Kliteynik --- QoS_architecture.txt | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/QoS_architecture.txt b/QoS_architecture.txt index 5dffff4..1c19a98 100644 --- a/QoS_architecture.txt +++ b/QoS_architecture.txt @@ -114,14 +114,14 @@ I) Port Group: a set of CAs, Routers or Switches that share the same settings. list of GUIDs, or list of port names based on NodeDescription. II) Fabric Setup: Defines how the SL2VL and VLArb tables should be setup. - NOTE: In OFED 1.3 this part of the policy is ignored. SL2VL and VLArb + NOTE: Currently this part of the policy is ignored. SL2VL and VLArb tables should be configured in the OpenSM options file (opensm.opts). III) QoS-Levels Definition: This section defines the possible sets of parameters for QoS that a client might be mapped to. Each set holds SL and optionally: Max MTU, Max Rate, Packet Lifetime and Path Bits. - NOTE: Path Bits are not implemented in OFED 1.3 + NOTE: Currently, Path Bits are not implemented. IV) Matching Rules: A list of rules that match an incoming PR/MPR request to a QoS-Level. The rules are processed in order such as the first match @@ -191,8 +191,8 @@ SA reports its ability to handle QoS PR/MPRs. ============================================================================== Similar to RDS, iSER also uses CMA. 
The Service-ID for iSER is similar to RDS -(0x000000000106PPPP), with default port number 0x035C, which makes a default -Service-ID 0x000000000106035C. +(0x000000000106PPPP), with default port number 0x0CBC, which makes a default +Service-ID 0x0000000001060CBC. ============================================================================== -- 1.5.1.4 From kliteyn at dev.mellanox.co.il Mon Dec 8 03:31:04 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 08 Dec 2008 13:31:04 +0200 Subject: [ofa-general] [PATCH] OFED docs/QoS_architecture.txt: remove trailing white space Message-ID: <493D0578.2040904@dev.mellanox.co.il> Signed-off-by: Yevgeny Kliteynik --- QoS_architecture.txt | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/QoS_architecture.txt b/QoS_architecture.txt index 72f9e83..5dffff4 100644 --- a/QoS_architecture.txt +++ b/QoS_architecture.txt @@ -166,7 +166,7 @@ holding the remote TCP/IP Port Number to connect to. 7. RDS ============================================================================== -RDS uses CMA and thus it is very close to SDP. The Service-ID for RDS is +RDS uses CMA and thus it is very close to SDP. The Service-ID for RDS is 0x000000000106PPPP, where PPPP are 4 hex digits holding the TCP/IP Port Number that the protocol connects to. Default port number for RDS is 0x48CA, which makes a default Service-ID -- 1.5.1.4 From tziporet at dev.mellanox.co.il Mon Dec 8 06:02:05 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 08 Dec 2008 16:02:05 +0200 Subject: [ofa-general] Re: [PATCH v2] OFED docs/QoS_architecture.txt: fixes for OFED 1.4 In-Reply-To: <493D0488.4090206@dev.mellanox.co.il> References: <493D0488.4090206@dev.mellanox.co.il> Message-ID: <493D28DD.2020303@mellanox.co.il> Yevgeny Kliteynik wrote: > Tziporet, > > Some fixes for OFED 1.4 release in QoS_architecture.txt. 
> > > thanks Applied Tziporet From yossi.openib at gmail.com Mon Dec 8 06:38:38 2008 From: yossi.openib at gmail.com (Yossi Etigin) Date: Mon, 08 Dec 2008 16:38:38 +0200 Subject: ***SPAM*** Re: [ofa-general] [PATCH] IB/IPoIB: Decrease the time that invalid paths stay useless In-Reply-To: <493B7AA3.5000209@Voltaire.COM> References: <4938010A.40701@Voltaire.COM> <49390B2E.60203@gmail.com> <493B7AA3.5000209@Voltaire.COM> Message-ID: <493D316E.6010203@gmail.com> Looks OK to me Moni Shoua wrote: > Yossi Etigin wrote: >>> @@ -360,12 +360,15 @@ void ipoib_mark_paths_invalid(struct net_device >>> *dev) >>> spin_lock_irq(&priv->lock); >>> >>> list_for_each_entry_safe(path, tp, &priv->path_list, list) { >>> - ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " >>> invalid\n", >>> + ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " >>> stale\n", >>> be16_to_cpu(path->pathrec.dlid), >>> IPOIB_GID_ARG(path->pathrec.dgid)); >>> - path->valid = 0; >>> + path->stale = 1; >>> } >>> >>> + if (!list_empty(&priv->path_list)) >>> + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, >>> + round_jiffies_relative(HZ)); >>> spin_unlock_irq(&priv->lock); >>> } >>> >> What if there is already an outstanding path query on one >> of the paths you mark stale? ipoib_refresh_paths() will issue another >> query, making it two >> queries on the same path. Then, if you bring the device >> down (call ipoib_flush_paths()) it will wait for completion >> of one query, causing a crash. >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > Thanks. You're right. I think that the change to the patch below should fix what you pointed at > Do you agree? 
> > @@ -551,9 +557,29 @@ static int path_rec_start(struct net_device *dev, > return path->query_id; > } > > + path->stale = 0; > return 0; > } > > +void ipoib_refresh_paths(struct work_struct *work) > +{ > + struct ipoib_dev_priv *priv = > + container_of(work, struct ipoib_dev_priv, path_refresh_task.work); > + struct net_device *dev = priv->dev; > + struct ipoib_path *path, *tp; > + > + spin_lock_irq(&priv->lock); > + list_for_each_entry_safe(path, tp, &priv->path_list, list) { > + ipoib_dbg(priv, "restart path LID 0x%04x GID " IPOIB_GID_FMT "\n", > + be16_to_cpu(path->pathrec.dlid), > + IPOIB_GID_ARG(path->pathrec.dgid)); > + if (path->stale && !path->query) <<<<<<<<<<<<<<<< CHANGE IS HERE > + path_rec_start(dev, path); > + } > + > + spin_unlock_irq(&priv->lock); > +} > + > static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > > From tziporet at mellanox.co.il Mon Dec 8 07:23:30 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 8 Dec 2008 17:23:30 +0200 Subject: [ofa-general] Agenda for OFED meeting today - Dec 8, 08 Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0129F140@mtlexch01.mtl.com> Hi, The agenda for the OFED meeting today: 1. Release date: Due to 2 more fixes that are under verification I wish to have the release on Wed - Dec 10 and not tomorrow as we planned. Major bugs status: 1383 blo jackm at mellanox.co.il Local protection error on transmit from ipoib datagram to... - we have a fix - need more testing 1395 maj vu at mellanox.com kernel panic during SRP HA test - we have a fix - patch to be sent by Vu today 1434 maj andy.grover at oracle.com RDS RDMA mode does not work on QLogic HCAs - ?? 2. Logo program report status - Rupert 3. Documents update - all the following docs must be updated: iser_release_notes.txt: - Doron mlx4_release_notes.txt: - Jack & Yevgeny P MPI_README.txt: - Pasha and Jeff S. 
mvapich_release_notes.txt: - Pasha ib-bonding.txt - Moni Shoua open_mpi_release_notes.txt: - Jeff S. opensm_release_notes.txt:Date: - Sasha PERF_TEST_README.txt: - Oren M. rdma_cm_release_notes.txt: - Sean srp_release_notes.txt: - Vu Pham diags_release_notes.txt: - Oren K. MSTFLINT_README.txt - Oren K. mpi-selector_release_notes.txt: - Jeff S. mthca_release_notes.txt: - Jack qperf_release_notes.txt: - Johann sdp_release_notes.txt: - Amir - OK NFS-RDMA - a new document - Jeff Backer iSER target - - a new document - Voltaire (Doron) 4. Open discussion Tziporet From ogerlitz at voltaire.com Mon Dec 8 07:39:17 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 08 Dec 2008 17:39:17 +0200 Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion v2 (resend) In-Reply-To: <1228698773-26528-1-git-send-email-ddiss@sgi.com> References: <1228698773-26528-1-git-send-email-ddiss@sgi.com> Message-ID: <493D3FA5.8070406@voltaire.com> David Disseldorp wrote: > --- a/drivers/infiniband/ulp/iser/iser_initiator.c > +++ b/drivers/infiniband/ulp/iser/iser_initiator.c > @@ -478,6 +508,7 @@ int iser_send_control(struct iscsi_conn *conn, > int err = 0; > struct iser_regd_buf *regd_buf; > struct iser_device *device; > + unsigned char opcode; > > if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { > iser_err("Failed to send, conn: 0x%p is not up\n", iser_conn->ib_conn); > @@ -512,10 +543,16 @@ int iser_send_control(struct iscsi_conn *conn, > data_seg_len); > } > > - if (iser_post_receive_control(conn) != 0) { > - iser_err("post_rcv_buff failed!\n"); > - err = -ENOMEM; > - goto send_control_error; > + opcode = task->hdr->opcode & ISCSI_OPCODE_MASK; > + > + /* post recv buffer for response if one is expected */ > + if (!((opcode == ISCSI_OP_NOOP_OUT) > + && (task->hdr->itt == RESERVED_ITT))) { > + if (iser_post_receive_control(conn) != 0) { > + iser_err("post_rcv_buff failed!\n"); > + err =
-ENOMEM; > + goto send_control_error; > + } > } This logic will not let us refill the receive buffers consumed by unexpected PDUs when a Nop-Out with reserved ITT is sent. So the next refill will take place when the next SCSI command is issued or when a Nop-Out is sent with a non-reserved ITT. I am fine with this. Acked-by: Or Gerlitz From kliteyn at dev.mellanox.co.il Mon Dec 8 08:06:09 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Mon, 08 Dec 2008 18:06:09 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] OFED docs/opensm_release_notes.txt: OpenSM 3.2.5 RN Message-ID: <493D45F1.3040904@dev.mellanox.co.il> OpenSM 3.2.5 Release Notes for OFED 1.4 docs. Signed-off-by: Yevgeny Kliteynik --- opensm_release_notes.txt | 402 ++++++++++++++++++++++++++++++---------------- 1 files changed, 264 insertions(+), 138 deletions(-) diff --git a/opensm_release_notes.txt b/opensm_release_notes.txt index 007c85a..3356e95 100644 --- a/opensm_release_notes.txt +++ b/opensm_release_notes.txt @@ -1,121 +1,209 @@ - OpenSM Release Notes 3.1.11 + OpenSM Release Notes 3.2 ============================= -Version: OpenFabrics Enterprise Distribution (OFED) 1.3 -Repo: git://git.openfabrics.org/~ofed_1_3/management.git (release) - git://git.openfabrics.org/~sashak/management.git (development) -Date: June 2008 +Version: OpenSM 3.2.x +Repo: git://git.openfabrics.org/~sashak/management.git +Date: Dec 2008 1 Overview ---------- -This document describes the contents of the OpenSM OFED 1.3 release. +This document describes the contents of the OpenSM 3.2 release. OpenSM is an InfiniBand compliant Subnet Manager and Administration, and runs on top of OpenIB.
The OpenSM version for this release -is openib-3.1.11 +is opensm-3.2.5 This document includes the following sections: 1 This Overview section (describing new features and software dependencies) 2 Known Issues And Limitations 3 Unsupported IB compliance statements -4 Major Bug Fixes +4 Bug Fixes 5 Main Verification Flows -6 Qualified software stacks and devices +6 Qualified Software Stacks and Devices 1.1 Major New Features -* QoS manager (experimental) - This QoS manager implementation is in accordance with IBA QoS Annex. - Highly configurable QoS Policy is parsed from OpenSM QoS policy file. - Valid QoS parameters will be reported in SA PathRecord and - MultiPathRecord. In addition simple QoS levels per ULPs configuration - is supported too. - -* Performance Manager - When enabled it collects a fabric port counters and able to log it or - to pass to external program via event plugin interface. It handles - counters overflow, supports LID/QP redirection and is able to work - when OpenSM is in master, standby, and inactive states. - -* Dimension Order routing (DOR) algorithm - DOR Unicast routing algorithm - based on the Min Hop algorithm, but - avoids port equalization except for redundant links between the - same two switches. This provides deadlock free routes for hypercubes - when the fabric is cabled as a hypercube and for meshes when cabled - as a mesh (see details in OpenSM man page). - -* Routing improvements - Speedup the current routing algorithms default MinHops, Up/Down and - LASH and lid matrix generation. Fat Tree routing engine is able to work - with not pure fat free topology. - -* Multiple IB routers support - OpenSM now able to keep configurable subnet prefix to router table. - SA will report path to this routers when SA PathRecord was issued with - non-local DGID. - -* Node map - This is possible to name nodes in this config file. Those names will be - used for logging and by QoS configuration. 
- -* PKey index support - Proper support for PKey index in GSI queries. - -* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates - OpenSM will only fetch those tables in first heavy sweep and then - will maintain this internally. - -* Fast port and switch detector - When port and/or switch was externally reset and it was fast so sweep - doesn't find this device as disconnected OpenSM will detect this by - changed port states and handle accordingly. - -* Duplicated GUIDs/port moving detector - OpenSM will be able to detect port moving during a fabric discovery - and will not report duplicated GUIDs in this case. - -* Multicast rerouting speedup - Now OpenSM will calculate and setup multicast forwarding tables for - all altered multicast groups and not for each one. - -* Event plugin API - OpenSM allows to load dynamically various plugin modules. - -* Many generic improvements +* Cached Routing + OpenSM provides an optional unicast routing cache (enabled by '-A' or + '--ucast_cache' options). When enabled, unicast routing cache prevents + routing recalculation (which is a heavy task in a large cluster) when + there was no topology change detected during the heavy sweep, or when + the topology change does not require new routing calculation, e.g. when + one or more CAs/RTRs/leaf switches going down, or one or more of these + nodes coming back after being down. + +* Routing Chaining + Routing chaining is the ability to configure the order in which routing + algorithms are applied in opensm, i.e. '-R ftree,updn,minhop' - try + using ftree routing. If ftree fails, try updn. If updn fails, try + minhop. + +* IPv6 Solicited Node Multicast addresses consolidation + When this mode is used (enabled with --consolidate_ipv6_snm_req option) + OpenSM will map all IPv6 Solicited Node Multicast address join requests + into a single Multicast group with address ff10:601b::1:ff00:0. In this + way limited MLID space is saved. 
This IBA noncompliant feature is very + useful with large (~> 1024 nodes) clusters. + +* OpenSM sweep state machine rework + Huge and buggy OpenSM sweep state machine was fully rewritten in safer + and more effective synchronous manner. + +* Multi lid routing balancing for updn/minhop routing algorithms + When LMC > 0 is used OpenSM will ensure to generate routing paths via + different switches and when possible chassis. + +* Preserve base lid routes when LMC > 0 + When LMC > 0 is used OpenSM will preserve routing paths for base lids + as it would be with LMC = 0. In this way traffic on each LID level is + not affected by LMC changes. + +* Ordered routing paths balancing + This adds ability to predefine the port order in which routing paths + balancing is performed by OpenSM. Helps to improve performance + dramatically (40-50%) for applications with known communication + pattern. Activated with --guid_routing_order_file command line option. + +* Unified OpenSM configuration + Now there is "conventional" config file instead of hidden option cache + file (opensm.opts). OpenSM will find this in a default place (consult + man page for exact value) or the file name can be specified with '-F' + command line option. Also there is an option ('-c') to generate config + file template. + +* Query remote SMs during light sweep + Master OpenSM will query remote standby SMs periodically to catch its + possible state changes and react accordingly (as required by IBA spec). + +* Predefined port ids for Up/Down algorithm + This is useful as Up/Down fine tuning tool - the algorithm will use + predefined port IDs instead of GUIDs for its decision about direction. + Activated with --ids_guid_file command line option. + +* Improved plugin API version 2. + Now OpenSM will provide to plugins the access to all data structures. + This make it possible to implement powerful multi purpose plugins. 
All + OpenSM header files are installed now and specific configuration/build + options are exported via generated osm_config.h header file. + +* Many code improvements, optimizations and cleanups + +* Automatic daily snapshots generation. + This is is not a "feature", but simplifies the access to recent OpenSM + bits. 1.2 Minor New Features: -* Daemon mode can be activated with -B option. +* Cleanup cl_qlock_pool memory allocator - speedup memory allocations -* Support multiple scopes for IPoIB multicast groups in partition config. +* Support for configurable (via OSM_UMAD_MAX_PENDING environment variable) + size of pending MADs pool. -* Loopback connection handling - Loopback connection is not interpreted as duplicated GUID anymore. +* Set packet life time to subnet timeout option rather than default -* Connect root nodes option for Up/Down routing engine. - When this option is specified Up/Down will create routing paths between - its root nodes. +* Enforce routing paths rebalancing on switch reconnection -* Dump and log filenames changed from osm* to opensm*. +* In Up/Down routing algorithm compare GUID values in host byte order -* Support loopback console - Socket console with only local access. +* Add 'switchbalance' and 'lidbalance' commands for OpenSM console -* Configurable config directory (the default value is /etc/opensm) and - configurable default values of OpenSM config filenames. +* Respond to new trap 144 node description update flag -* Add option for force SDR link speed - Add option to opensm.opts to force link speed. Currently, only forcing - to SDR link speed is supported. This option is not supported as a - command line option. +* Add '--connect_roots' command line options. This preserves connectivity + between root nodes in Up/Down routing algorithm -* Better packaging - Building and RPM packaging were improved and simplified. 
+* Setting SL in the IPoIB MCast groups in accordance with QoS policy -* Handle "babbling" ports - When a babbling port (port which causes a frequent trap generation) is - detected, OpenSM will disable the port which should terminate the trap - storm. +* Dump auto detected root node guids in Up/Down routing algorithm + +* Unify OpenSM dumpers code + +* Unify various guid files parsers - add generic nodenamemap style parser + +* When root node guids were provided in file update the list on each + Up/Down run + +* During ./configure show values of configuration dirs and files + +* Make prefix routes config file name configurable + +* Add a Performance Manager HOWTO to the docs and the dist + +* Support separate SA and SM keys as clarified in IBA 1.2.1 + +* Remove AM_MAINTAINER_MODE in ./configure + +* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) to be default + +* Build osm_perfmgr_db.* content only when PerfMgr is enabled. + +* Move PerfMgr event_db_dump_file to common OpenSM dump dir + +* Allow space separated strings as values in OpenSM config + +* Support for multiple event plugins + +* Add '--version' command line option + +* Add '--create-config ' command line option + +* Speedup and simplify logging code + +* Speedup multicast processing in SA DB + +* In log messages convert unicast LIDs from hex to decimal format and + GIDs from hex to IPv6 address format + +* Handle all possible ports in "ignore-guids" file + +* Add 'reroute' console command + +* Remove many install-exec-hook from Makefiles + +* Some cleanups in LASH routing algorithm code + +* In Makefiles remove -rpath and explicit -lpthread, -ldl from LDFLAGS + (move to configurator) + +* Install all OpenSM header files + +* Improve locking in SM Info receiver + +* Add new OSM_EVENT_ID_SUBNET_UP event for plugins + +* Redo lex and yacc files generation in conventional way + +* Add a missing Node Description check on light sweep. 
+ +* Move vendor specific compilation defines from command to generated + config.h file + +* Provide useful error message when log file opening fails + +* Add generated osm_config.h file with OpenSM specific defines + +* Display port number in decimal in log messages + +* Replace osm_vendor_select.h by generated osm_config.h + +* Unify options listing in OpenSM usage message + +* LFT buffers handling simplification + +* Add 'dump_conf' console command + +* OpenSM performs sweep on SIGCONT (coming out of suspend). + +* When our SM is in Standby state and its priority is increased + (via console command), notify master SM by sending Trap 144. + +* When entering standby state (after discovery) notify master SM + with Trap 144. + +* support more PortInfo:CapabilityMask bits + +* When babbling port policy is on disable the port with the least hop + count. 1.3 Library API Changes @@ -123,12 +211,12 @@ This document includes the following sections: 1.4 Software Dependencies -OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1, -OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD -distribution), or Mellanox VAPI stacks. The qualified driver versions -are provided in Table 2, "Qualified IB Stacks". +OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g. +IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox +VAPI stacks. The qualified driver versions are provided in Table 2, +"Qualified IB Stacks". -Also building of QoS manager policy file parser requires flex, and either +Also, building of QoS manager policy file parser requires flex, and either bison or byacc installed. 1.5 Supported Devices Firmware @@ -210,76 +298,105 @@ information regarding each compliance statement. * C15-0.1.14 (Services): Provide means to associate service name and ServiceKeys. -4 Major Bug Fixes ------------------ +4 Bug Fixes +----------- + +4.1 Major Bug Fixes -The following is a list of bugs that were fixed. 
Note that other less critical -or visible bugs were also fixed. +* Set SA attribute offset to 0 when no records are returned -* osm_ucast_ftree.c: do load-leveling of non-CN routes +* Send trap 64 only after new ports are in ACTIVE state. -* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches +* Fix in sending client reregistration bit -* lash: fix possible segfault in osm_get_lash_sl() +* Fix default OpenSM SM (and SA) Key byte order -* osm_ucast_ftree.c: fixing coredump in fat-tree routing +* Fix in sending Multicast groups creation/deletion notification (Traps + 66,67) -* osm_sa_slvl_record: fix overflow crash +* Don't startup automatically on SuSE based systems -* Break multicast rerouting requests processing when heavy sweep is - scheduled. +4.2 Other Bug Fixes -* updn: report fallback properly +* opensm/osm_console.c: fix seg fault when running "portstatus ca" in + the console -* Fix incorrect identification of routing engine used +* opensm: fix potential core dumps where osm_node_get_physp_ptr can + return NULL -* Don't zero base LID when invalid value is received +* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value + of max hops (64) -* lash: fix wrong allocation size +* opensm: switch LFTs incremental update fix -* Fixing broken logic in 'process world' part of LinkRecord processing +* opensm/osm_state_mgr.c: fix segmentation fault -* Fix lmc_mask bit order in osm_sa_link_record.c +* opensm: eliminate some potential NULL pointer dereferences -* Adding missing comparison by to_lid/from_lid in LinkRecord processing +* opensm/osm_console.c: fix guid parsing -* Broken logic when scanning subnet for PIR request +* opensm: fix off by 1 issue with max_lid and max_multicat_lid_ho -* No interactive games in daemon mode +* opensm: fix potentially wrong port_guid initialization -* Fixing memory leak in node description +* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define + generation -* Fix PortInfo update issues for 
switch port 0 +* opensm: fix snprintf() usage -* Changed method_mask type in user_mad interface in accordance with - kernel ABI +* opensm/osm_sa_lft_record: validate LFT block number -* Use umad_get_issm_path() in osm_vendor_set_sm() +* opensm/osm_sa_lft_record: pass block parameter in host byte order -* Report message fix +* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST -* Uninitialized variables usage fix +* opensm/osm_sa_class_port_info.c: fix over bound array access -* osm_ucast_ftree.c: Possible NULL ptr seg fault +* osmtest/osmt_service.c: fix over bound array access -* osm_mcast_mgr.c: Possible NULL ptr seg fault +* osmtest: fix qpn encoding in osmtest_informinfo_request() -* TrapRepress was failing for mkey != 0 +* opensm/osm_vendor_mlx_sa.c: handling attribute offset of 0 -* IB_PR_COMPMASK was used in MPR +* opensm: fix segfault corner case when osm_console_init fails -* Set hop limit when creating ipoib multicast groups +* opensm/console: close console socket on cleanup path -* Fix outstanding mad counters tracking on the error paths. +* opensm/osm_ucast_lash: fix buffer overflow -* Report new ports before handover mastership +* opensm: fix broken IPv6 SNM consolidation code -* Fix opvls and neighbormtu when remote port invalid. +* opensm/osm_sa_lft_record.c: fix block number encoding byte order -* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid - was still zero. +* opensm/osm_sa: fix memory leak in SA responder -* Protect SMInfo response against port moving issue. 
+* opensm/osm_mcast_mgr: fix memory leak + +* opensm: fix qos config parsing bugs + +* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid + overflow + +* opensm: log_max_size config parameter in MB + +* opensm/osm_ucast_lash: fix extra memory allocations + +* opensm: fix race in main OpenSM flow + +* opensm/ftree: fix GUID check against cn_guid_file + +* opensm/ftree: save FLT buffers memory allocations + +* opensm/osm_sa_link_record.c: prevent potential endless recursion + +* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is + not set + +* opensm: fix QoS config bug + +* opensm: don't reassign zeroed params from config file + +* Other less critical or visible bugs were also fixed. 5 Main Verification Flows ------------------------- @@ -439,14 +556,24 @@ interface. The test procedure includes: * Trap injection and recovery -6 Qualification ----------------- +6 Qualified Software Stacks and Devices +--------------------------------------- + +OpenSM Compatibility +-------------------- +Note that OpenSM version 3.2.1 and earlier used a value of 1 in host +byte order for the default SM_Key, so there is a compatibility issue +with these earlier versions of OpenSM when the 3.2.2 or later version +is running on a little endian machine. This affects SM handover as well +as SA queries (saquery tool in infiniband-diags). 
+ Table 2 - Qualified IB Stacks ============================= Stack | Version -----------------------------------------|-------------------------- +OFED | 1.4 OFED | 1.3 OFED | 1.2 OFED | 1.1 @@ -463,6 +590,7 @@ Device | FW versions ------------------------------------|------------------------------- InfiniScale | fw-43132 5.2.000 (and later) InfiniScale III | fw-47396 0.5.000 (and later) +InfiniScale IV | fw-48436 7.1.000 (and later) InfiniHost | fw-23108 3.5.000 (and later) InfiniHost III Lx | fw-25204 1.2.000 (and later) InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later) @@ -483,10 +611,8 @@ QP0 and QP1. However, it does support it as a device on the subnet. Note 2: QoS firmware and Mellanox devices -HCAs: QoS supported by ConnectX. The current FW release -doesn't support QoS. QoS-enabled FW release (2_5_000) is -planned for May. If someone wishes to get QoS-enabled FW -before the official release, they should contact Mellanox FAE. +HCAs: QoS supported by ConnectX. QoS-enabled FW release is 2_5_000 and +later. Switches: QoS supported by InfiniScale III Any InfiniScale III FW that is supported by OpenSM supports QoS. -- 1.5.1.4 From sean.hefty at intel.com Mon Dec 8 09:58:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 8 Dec 2008 09:58:36 -0800 Subject: [ofa-general] [PATCH] IB/cm: change cma_modify_qp_err() to handle QP in RESET state In-Reply-To: <20081206012532.17073.64349.stgit@eng-46.mv.qlogic.com> References: <20081206012532.17073.64349.stgit@eng-46.mv.qlogic.com> Message-ID: <000101c9595e$97a3ebb0$3758180a@amr.corp.intel.com> >Since the IBTA 1.2.1 spec. clarified that the RESET to ERROR QP state >transition is not valid but earlier the openfabrics code supported it, >the code in cma_modify_qp_err() will now return an error if the QP >is in the RESET state. This can cause RDS to go into a loop trying to >call rdma_disconnect() continuously. Can you explain when the QP is in the RESET state when cma_modify_qp_err() is called? 
- Sean From amitoj at cs.uh.edu Mon Dec 8 10:27:17 2008 From: amitoj at cs.uh.edu (Amitoj G Singh) Date: Mon, 8 Dec 2008 12:27:17 -0600 (CST) Subject: [ofa-general] Need help creating a network topology file for a 288-port Flextronics. Message-ID: <49651.131.225.84.224.1228760837.squirrel@mail.cs.uh.edu> Spine switch is a 288-port Flextronics switch. Several Flextronics 24-port leaf switches. 6 uplinks from each leaf switch to the spine switch. OFED 1.2 kernel 2.6.21 Redhat 4 I have successfully defined almost the entire network topology but need help defining the 288-port switch. The leaf switches are defined as follows: MTS2400 leaf-switch-1 P19 -5G-> MT25204 node0101 P1 P15 -5G-> MT25204 node0102 P1 ..... and the spine switch had to be split into two 144-port spines and is defined as follows (the only way I could get it to work) MTS14400-DDR S144-1 L1/P1 -5G-> MTS2400 leaf-switch-1 P1 L2/P1 -5G-> MTS2400 leaf-switch-2 P1 ...... MTS14400-DDR S144-2 L1/P1 -5G-> MTS2400 leaf-switch-30 P1 L2/P1 -5G-> MTS2400 leaf-switch-31 P1 ..... If I define the spine as "MTS28800" I get "switch undefined" error, but if I split the 288 switch into two 144-port switches (S144-1 and S144-2) it at least discovers the network but then I get the following errors .. Missing System Board:S144-2/spine1 Missing System Board:S144-2/spine2 BTW, I am running the following commands .. /usr/local/ofed/bin/ibdiagnet -pc -s sm-node -t /etc/ibadm.topo > /dev/null /usr/local/ofed/bin/ibdiagnet -P all=1 -s sm-node -t /etc/ibadm.topo Any help in this regard will be much appreciated. 
From ralph.campbell at qlogic.com Mon Dec 8 10:50:47 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 08 Dec 2008 10:50:47 -0800 Subject: [ofa-general] [PATCH] IB/cm: change cma_modify_qp_err() to handle QP in RESET state In-Reply-To: <000101c9595e$97a3ebb0$3758180a@amr.corp.intel.com> References: <20081206012532.17073.64349.stgit@eng-46.mv.qlogic.com> <000101c9595e$97a3ebb0$3758180a@amr.corp.intel.com> Message-ID: <1228762247.4232.220.camel@chromite.mv.qlogic.com> On Mon, 2008-12-08 at 09:58 -0800, Sean Hefty wrote: > >Since the IBTA 1.2.1 spec. clarified that the RESET to ERROR QP state > >transition is not valid but earlier the openfabrics code supported it, > >the code in cma_modify_qp_err() will now return an error if the QP > >is in the RESET state. This can cause RDS to go into a loop trying to > >call rdma_disconnect() continuously. > > Can you explain when the QP is in the RESET state when cma_modify_qp_err() is > called? > > - Sean On closer inspection, I think this patch isn't needed. It looks like ib_modify_qp() to INIT is always called right after creating the QP and if there is an error, the QP is destroyed so it shouldn't be possible for the QP to be left in the RESET state. I was debugging an RDS problem and seeing rdma_disconnect() returning errors but now I'm not sure why I thought it was due to the ib_modify_qp() to ERR that was failing. From hal.rosenstock at gmail.com Mon Dec 8 11:32:04 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Dec 2008 14:32:04 -0500 Subject: [ofa-general] Re: [PATCH] opensm: fix race in main OpenSM flow. 
In-Reply-To: <20081207034053.GE27505@sashak.voltaire.com> References: <20081204185209.GI6183@sashak.voltaire.com> <20081207034053.GE27505@sashak.voltaire.com> Message-ID: Sasha, On Sat, Dec 6, 2008 at 10:40 PM, Sasha Khapyorsky wrote: > Hi Hal, > > On 16:03 Thu 04 Dec , Hal Rosenstock wrote: >> >> Looks to me like this has been there from around the following commit >> or some related changes shortly thereafter: >> >> commit 1b2eb3daddbfa9fc555488cddbea12b01f6635a3 >> Date: Mon Jan 28 03:10:18 2008 +0200 >> >> opensm: wait_for_pending_transaction() generalization >> >> Function wait_for_pending_transaction() is global now and moved from >> PerfMgr to StateMgr, all related objects are generalized. >> >> If so, this is applicable to 3.1 and maybe also 3.0 based OpenSMs. > > No, it doesn't really affect main flow of 3.1 (and 3.0) - it was used > only in discovery phase of PerfMgr (and when it is running in standby SM > mode) which has experimental status in 3.1. But it does affect almost > all OpenSM-3.2.x versions. What's the difference between 3.2 and 3.1 in this regard ? Is it just the experimental status of PerfMgr ? As PerfMgr is part of 3.1, isn't this an important issue for 3.1 when PerfMgr is enabled ? -- Hal > > Sasha > From hal.rosenstock at gmail.com Mon Dec 8 11:32:54 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 8 Dec 2008 14:32:54 -0500 Subject: Re: [ofa-general] Automation Test Tool of OFED and opensm In-Reply-To: References: Message-ID: 2008/12/5 Wen Hao Wang : > > > Tel: 86-10-82451055 > Fax: 86-10-82782244 ext. 2312 > Address: 1/F, IBM ZGC Campus. Ring Building 28,ZhongGuanCun Software > Park,No.8 Dong Bei Wang West Road, Haidian District Beijing, 100193, > P.R.China > > > "Hal Rosenstock" wrote on 2008-12-05 22:37:36: > OK. I know osmtest, but not ibsim/ibmgtsim. Will check them. > >> There are also a number of examples in the various OFED packages which >> can be used for regression of those components.
>> > Would you point our one or two these examples for my reference? What OFED components are of interest ? kenel modules don't typically have examples but can be tested via user space APIs. Most user space components have examples. -- Hal >> As far as an automation harness/test tool goes, I believe some vendors >> have worked on this but to the best of my knowledge none of this has >> been open sourced. >> > > Thanks a lot! You feedback is really helpful. > > Wen Hao Wang > wangwhao at cn.ibm.com > >> -- Hal >> >> > Thanks. >> > >> > Wen Hao Wang >> > Email: wangwhao at cn.ibm.com >> > >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit >> > http://openib.org/mailman/listinfo/openib-general >> > > > >> Hi, >> >> On Fri, Dec 5, 2008 at 2:35 AM, Wen Hao Wang wrote: >> > Hi all: >> > >> > Is there any test or diagnostic tool, especially automation test tool to >> > check the functions of OFED and opensm? The only thing here I know is >> > infiniband-diags and libibverbs-utils. >> >> In terms of OpenSM, there is osmtest. Also, ibsim and ibmgtsim (in >> ibutils) might help simulations of various subnet topologies. In terms >> of additional diagnostics, there is ibutils (ibdiagnet). >> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Mon Dec 8 11:58:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 8 Dec 2008 21:58:26 +0200 Subject: [ofa-general] Re: [PATCH] opensm: fix race in main OpenSM flow. 
In-Reply-To: References: <20081204185209.GI6183@sashak.voltaire.com> <20081207034053.GE27505@sashak.voltaire.com> Message-ID: <20081208195826.GA13924@sashak.voltaire.com> Hi Hal, On 14:32 Mon 08 Dec , Hal Rosenstock wrote: > >> > >> If so, this is applicable to 3.1 and maybe also 3.0 based OpenSMs. > > > > No, it doesn't really affect main flow of 3.1 (and 3.0) - it was used > > only in discovery phase of PerfMgr (and when it is running in standby SM > > mode) which has experimental status in 3.1. But it does affect almost > > all OpenSM-3.2.x versions. > > What's the difference between 3.2 and 3.1 in this regard ? Is it just > the experimental status of PerfMgr ? At least. Also 3.1 is old and outdated version of OpenSM. > As PerfMgr is part of 3.1, isn't this an important issue for 3.1 when > perf mgr enabled ? I'm not sure. AFAIK all PerfMgr users are running 3.2.x, not 3.1. Sasha From sashak at voltaire.com Mon Dec 8 12:02:17 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 8 Dec 2008 22:02:17 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <493CEBBE.2020407@gmail.com> References: <493CEBBE.2020407@gmail.com> Message-ID: <20081208200217.GB13924@sashak.voltaire.com> Hi Eli, On 11:41 Mon 08 Dec , Eli Dorfman wrote: > report IB traps to plugin > > Signed-off-by: Eli Dorfman > --- > opensm/opensm/osm_inform.c | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c > index f3c8ed7..bb16e3a 100644 > --- a/opensm/opensm/osm_inform.c > +++ b/opensm/opensm/osm_inform.c > @@ -565,7 +565,8 @@ osm_report_notice(IN osm_log_t * const p_log, > } > > /* an official Event information log */ > - if (ib_notice_is_generic(p_ntc)) > + if (ib_notice_is_generic(p_ntc)) { > + osm_opensm_report_event(p_subn->p_osm, OSM_EVENT_ID_TRAP, p_ntc); > OSM_LOG(p_log, OSM_LOG_INFO, > "Reporting Generic Notice type:%u num:%u (%s)" > " from LID:%u 
GID:%s\n", > @@ -575,6 +576,7 @@ osm_report_notice(IN osm_log_t * const p_log, > cl_ntoh16(p_ntc->issuer_lid), > inet_ntop(AF_INET6, p_ntc->issuer_gid.raw, gid_str, > sizeof gid_str)); > + } Did you mean to have it osm_report_notice()? Actually it is where OpenSM sends notices, not where OpenSM gets traps. Trap receiver processor is located in osm_trap_rcv.c. Sasha > else > OSM_LOG(p_log, OSM_LOG_INFO, > "Reporting Vendor Notice type:%u vend:%u dev:%u" > -- > 1.5.5 > From tziporet at mellanox.co.il Mon Dec 8 13:20:03 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 8 Dec 2008 23:20:03 +0200 Subject: [ofa-general] OFED Dec 8, 2008 meeting minutes on OFED 1.4 release status In-Reply-To: <458BC6B0F287034F92FE78908BD01CE84EF37AEB@mtlexch01.mtl.com> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0129F38A@mtlexch01.mtl.com> OFED Dec 8, 2008 meeting minutes on OFED 1.4 release status =============================================== Meeting minutes on the web: http://www.openfabrics.org/txt/documentation/linux/EWG_meeting_minutes/ Meeting Summary: ============== - OFED 1.4 release: GA on Dec 10 (one day delay due to few critical bug fixes under testing) - UNH Logo testing: From SW perspective everything is fine. - All docs must be updated - need to send update to Tziporet today Details: ======= > 1. Release date: Due to 2 more fixes that are under verification the > release will be done on Wed - Dec 10 > Major bugs status: > 1383 blo jackm at mellanox.co.il Local protection error > on transmit from ipoib datagram to... - we have a fix in mlx4 driver - > need more testing > 1395 maj vu at mellanox.com kernel panic during SRP HA > test - we have a fix - patch to be sent by Vu today > 1434 maj andy.grover at oracle.com RDS RDMA mode does not work on > QLogic HCAs - not a blocker > Once the SRP fixes are committed need to update to Qlogic so they can test SRP after the fix. > 2. 
Logo program report status - Rupert > UNH has done most of the test suite > not all MPI scenarios covered > IPoIB, SRP, link init and fabric init are done SM: only OSM was tested till now There are some issues related to interoperability between vendor HW products - will be sent to the vendors > A report about the SW only part will be sent to the ewg list > > > 3. Documents update - all the following docs must be updated: > iser_release_notes.txt: - Doron > mlx4_release_notes.txt: - Jack & Yevgeny P > MPI_README.txt: - Pasha and Jeff S. > mvapich_release_notes.txt: - Pasha > ib-bonding.txt - Moni Shoua > open_mpi_release_notes.txt: - Jeff S. > opensm_release_notes.txt:Date: - Sasha > PERF_TEST_README.txt: - Oren M. > rdma_cm_release_notes.txt: - Sean > srp_release_notes.txt: - Vu Pham > diags_release_notes.txt: - Oren K. > MSTFLINT_README.txt - Oren K. > mpi-selector_release_notes.txt: - Jeff S. > mthca_release_notes.txt: - Jack > qperf_release_notes.txt: - Johann > NFS-RDMA - a new document - Jeff Backer > iSER target - a new document - Voltaire (Doron) > > > Tziporet > From pradeeps at linux.vnet.ibm.com Mon Dec 8 14:51:56 2008 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Mon, 08 Dec 2008 14:51:56 -0800 Subject: [ofa-general] IB Bonding errors with recent kernel Message-ID: <493DA50C.8050306@linux.vnet.ibm.com> I was attempting to execute a few bonding tests with the 2.6.28-rc7 kernel and see the following error when I try to add the first slave: ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22 Here were the steps that I followed: 1. modprobe bonding 2.
ib-bond --bond-ip 192.168.2.132 --add-slave ib0 --miimon 100 I have been able to debug this to a certain degree and traced this to the case wherein an Ethernet mapped address of the IPv6 multicast address seems to be added to the dev->mc_list (I see an entry for 333300:0100:0000:0000:0000 on the mc_list). This subsequently gets transformed to the 0001:0000:0000:0000:0000:0000:0000:0000 address in ipoib_mcast_restart_task(), which is what is seen in the errors. Can others recreate this? Any clues as to why an Ethernet specific mapped address is being added to IB? Pradeep From wangwhao at cn.ibm.com Mon Dec 8 16:29:30 2008 From: wangwhao at cn.ibm.com (Wen Hao Wang) Date: Tue, 9 Dec 2008 08:29:30 +0800 Subject: [ofa-general] Automation Test Tool of OFED and opensm In-Reply-To: Message-ID: > 2008/12/5 Wen Hao Wang : > > > > > > Tel: 86-10-82451055 > > Fax: 86-10-82782244 ext. 2312 > > Address: 1/F, IBM ZGC Campus. Ring Building 28,ZhongGuanCun Software > > Park,No.8 Dong Bei Wang West Road, Haidian District Beijing, 100193, > > P.R.China > > > > > > "Hal Rosenstock" wrote on 2008-12-05 22:37:36: > > OK. I know osmtest, but not ibsim/ibmgtsim. Will check them. > > > >> There are also a number of examples in the various OFED packages which > >> can be used for regression of those components. > >> > > Would you point our one or two these examples for my reference? > > What OFED components are of interest ? kenel modules don't typically > have examples but can be tested via user space APIs. Most user space > components have examples. > > -- Hal Hi Hal. Many thanks for your answer! If you can list one user space API to test a kernel module, and one example for user space components, for example, opensm/ibutils/IPoIB, that would be a great help. Wen Hao Wang wangwhao at cn.ibm.com > > >> As far as an automation harness/test tool goes, I believe some vendors > >> have worked on this but to the best of my knowledge none of this has > >> been open sourced. > >> > > > > Thanks a lot!
You feedback is really helpful. > > > > Wen Hao Wang > > wangwhao at cn.ibm.com > > > >> -- Hal > >> > >> > Thanks. > >> > > >> > Wen Hao Wang > >> > Email: wangwhao at cn.ibm.com > >> > > > > > > >> Hi, > >> > >> On Fri, Dec 5, 2008 at 2:35 AM, Wen Hao Wang wrote: > >> > Hi all: > >> > > >> > Is there any test or diagnostic tool, especially automation test tool to > >> > check the functions of OFED and opensm? The only thing here I know is > >> > infiniband-diags and libibverbs-utils. > >> > >> In terms of OpenSM, there is osmtest. Also, ibsim and ibmgtsim (in > >> ibutils) might help simulations of various subnet topologies. In terms > >> of additional diagnostics, there is ibutils (ibdiagnet). > >> > > From liammcquil at gmail.com Mon Dec 8 22:57:46 2008 From: liammcquil at gmail.com (liam mcquillan) Date: Mon, 8 Dec 2008 22:57:46 -0800 Subject: [ofa-general] uDAPL on GigE ? Message-ID: <180f2ef70812082257qd919c91l1382fc8292247288@mail.gmail.com> Hello, Is it possible to run OFED uDAPL over plain old GigE? - If so, how do you configure /etc/dat.conf as dtest complains about the provider - If so, will uDAPL give less latency than TCP on GigE Liam
From dorfman.eli at gmail.com Tue Dec 9 01:31:17 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Tue, 09 Dec 2008 11:31:17 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <20081208200217.GB13924@sashak.voltaire.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> Message-ID: <493E3AE5.5000604@gmail.com> Sasha Khapyorsky wrote: > Hi Eli, > > On 11:41 Mon 08 Dec , Eli Dorfman wrote: >> report IB traps to plugin >> >> Signed-off-by: Eli Dorfman >> --- >> opensm/opensm/osm_inform.c | 4 +++- >> 1 files changed, 3 insertions(+), 1 deletions(-) >> >> diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c >> index f3c8ed7..bb16e3a 100644 >> --- a/opensm/opensm/osm_inform.c >> +++ b/opensm/opensm/osm_inform.c >> @@ -565,7 +565,8 @@ osm_report_notice(IN osm_log_t * const p_log, >> } >> >> /* an official Event information log */ >> - if (ib_notice_is_generic(p_ntc)) >> + if (ib_notice_is_generic(p_ntc)) { >> + osm_opensm_report_event(p_subn->p_osm, OSM_EVENT_ID_TRAP, p_ntc); >> OSM_LOG(p_log, OSM_LOG_INFO, >> "Reporting Generic Notice type:%u num:%u (%s)" >> " from LID:%u GID:%s\n", >> @@ -575,6 +576,7 @@ osm_report_notice(IN osm_log_t * const p_log, >> cl_ntoh16(p_ntc->issuer_lid), >> inet_ntop(AF_INET6, p_ntc->issuer_gid.raw, gid_str, >> sizeof gid_str)); >> + } > > Did you mean to have it osm_report_notice()? Actually it is where OpenSM > sends notices, not where OpenSM gets traps. Trap receiver processor is > located in osm_trap_rcv.c. Yes, that's what I meant. When OpenSM receives traps it calls osm_report_notice(). It is also called for OpenSM initiated traps (e.g. GID IN/OUT and MC CREATE/DELETE).
> > Sasha > >> else >> OSM_LOG(p_log, OSM_LOG_INFO, >> "Reporting Vendor Notice type:%u vend:%u dev:%u" >> -- >> 1.5.5 >> From kliteyn at dev.mellanox.co.il Tue Dec 9 01:47:02 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 09 Dec 2008 11:47:02 +0200 Subject: [ofa-general] [PATCH v2] OFED docs/opensm_release_notes.txt: OpenSM 3.2.5 RN Message-ID: <493E3E96.5050402@dev.mellanox.co.il> OpenSM 3.2.5 Release Notes for OFED 1.4 docs. V2: add "opensm -Q" error messages as known issues. Signed-off-by: Yevgeny Kliteynik --- opensm_release_notes.txt | 406 ++++++++++++++++++++++++++++++---------------- 1 files changed, 268 insertions(+), 138 deletions(-) diff --git a/opensm_release_notes.txt b/opensm_release_notes.txt index 007c85a..11223de 100644 --- a/opensm_release_notes.txt +++ b/opensm_release_notes.txt @@ -1,121 +1,209 @@ - OpenSM Release Notes 3.1.11 + OpenSM Release Notes 3.2 ============================= -Version: OpenFabrics Enterprise Distribution (OFED) 1.3 -Repo: git://git.openfabrics.org/~ofed_1_3/management.git (release) - git://git.openfabrics.org/~sashak/management.git (development) -Date: June 2008 +Version: OpenSM 3.2.x +Repo: git://git.openfabrics.org/~sashak/management.git +Date: Dec 2008 1 Overview ---------- -This document describes the contents of the OpenSM OFED 1.3 release. +This document describes the contents of the OpenSM 3.2 release. OpenSM is an InfiniBand compliant Subnet Manager and Administration, and runs on top of OpenIB. 
The OpenSM version for this release -is openib-3.1.11 +is opensm-3.2.5 This document includes the following sections: 1 This Overview section (describing new features and software dependencies) 2 Known Issues And Limitations 3 Unsupported IB compliance statements -4 Major Bug Fixes +4 Bug Fixes 5 Main Verification Flows -6 Qualified software stacks and devices +6 Qualified Software Stacks and Devices 1.1 Major New Features -* QoS manager (experimental) - This QoS manager implementation is in accordance with IBA QoS Annex. - Highly configurable QoS Policy is parsed from OpenSM QoS policy file. - Valid QoS parameters will be reported in SA PathRecord and - MultiPathRecord. In addition simple QoS levels per ULPs configuration - is supported too. - -* Performance Manager - When enabled it collects a fabric port counters and able to log it or - to pass to external program via event plugin interface. It handles - counters overflow, supports LID/QP redirection and is able to work - when OpenSM is in master, standby, and inactive states. - -* Dimension Order routing (DOR) algorithm - DOR Unicast routing algorithm - based on the Min Hop algorithm, but - avoids port equalization except for redundant links between the - same two switches. This provides deadlock free routes for hypercubes - when the fabric is cabled as a hypercube and for meshes when cabled - as a mesh (see details in OpenSM man page). - -* Routing improvements - Speedup the current routing algorithms default MinHops, Up/Down and - LASH and lid matrix generation. Fat Tree routing engine is able to work - with not pure fat free topology. - -* Multiple IB routers support - OpenSM now able to keep configurable subnet prefix to router table. - SA will report path to this routers when SA PathRecord was issued with - non-local DGID. - -* Node map - This is possible to name nodes in this config file. Those names will be - used for logging and by QoS configuration. 
- -* PKey index support - Proper support for PKey index in GSI queries. - -* Incremental LFTs, PKey, SL2VL, and VLarbitration table updates - OpenSM will only fetch those tables in first heavy sweep and then - will maintain this internally. - -* Fast port and switch detector - When port and/or switch was externally reset and it was fast so sweep - doesn't find this device as disconnected OpenSM will detect this by - changed port states and handle accordingly. - -* Duplicated GUIDs/port moving detector - OpenSM will be able to detect port moving during a fabric discovery - and will not report duplicated GUIDs in this case. - -* Multicast rerouting speedup - Now OpenSM will calculate and setup multicast forwarding tables for - all altered multicast groups and not for each one. - -* Event plugin API - OpenSM allows to load dynamically various plugin modules. - -* Many generic improvements +* Cached Routing + OpenSM provides an optional unicast routing cache (enabled by '-A' or + '--ucast_cache' options). When enabled, unicast routing cache prevents + routing recalculation (which is a heavy task in a large cluster) when + there was no topology change detected during the heavy sweep, or when + the topology change does not require new routing calculation, e.g. when + one or more CAs/RTRs/leaf switches going down, or one or more of these + nodes coming back after being down. + +* Routing Chaining + Routing chaining is the ability to configure the order in which routing + algorithms are applied in opensm, i.e. '-R ftree,updn,minhop' - try + using ftree routing. If ftree fails, try updn. If updn fails, try + minhop. + +* IPv6 Solicited Node Multicast addresses consolidation + When this mode is used (enabled with --consolidate_ipv6_snm_req option) + OpenSM will map all IPv6 Solicited Node Multicast address join requests + into a single Multicast group with address ff10:601b::1:ff00:0. In this + way limited MLID space is saved. 
This IBA noncompliant feature is very + useful with large (~> 1024 nodes) clusters. + +* OpenSM sweep state machine rework + Huge and buggy OpenSM sweep state machine was fully rewritten in safer + and more effective synchronous manner. + +* Multi lid routing balancing for updn/minhop routing algorithms + When LMC > 0 is used OpenSM will ensure to generate routing paths via + different switches and when possible chassis. + +* Preserve base lid routes when LMC > 0 + When LMC > 0 is used OpenSM will preserve routing paths for base lids + as it would be with LMC = 0. In this way traffic on each LID level is + not affected by LMC changes. + +* Ordered routing paths balancing + This adds ability to predefine the port order in which routing paths + balancing is performed by OpenSM. Helps to improve performance + dramatically (40-50%) for applications with known communication + pattern. Activated with --guid_routing_order_file command line option. + +* Unified OpenSM configuration + Now there is "conventional" config file instead of hidden option cache + file (opensm.opts). OpenSM will find this in a default place (consult + man page for exact value) or the file name can be specified with '-F' + command line option. Also there is an option ('-c') to generate config + file template. + +* Query remote SMs during light sweep + Master OpenSM will query remote standby SMs periodically to catch its + possible state changes and react accordingly (as required by IBA spec). + +* Predefined port ids for Up/Down algorithm + This is useful as Up/Down fine tuning tool - the algorithm will use + predefined port IDs instead of GUIDs for its decision about direction. + Activated with --ids_guid_file command line option. + +* Improved plugin API version 2. + Now OpenSM will provide to plugins the access to all data structures. + This make it possible to implement powerful multi purpose plugins. 
All + OpenSM header files are installed now and specific configuration/build + options are exported via generated osm_config.h header file. + +* Many code improvements, optimizations and cleanups + +* Automatic daily snapshots generation. + This is not a "feature", but simplifies the access to recent OpenSM + bits. 1.2 Minor New Features: -* Daemon mode can be activated with -B option. +* Cleanup cl_qlock_pool memory allocator - speedup memory allocations -* Support multiple scopes for IPoIB multicast groups in partition config. +* Support for configurable (via OSM_UMAD_MAX_PENDING environment variable) + size of pending MADs pool. -* Loopback connection handling - Loopback connection is not interpreted as duplicated GUID anymore. +* Set packet life time to subnet timeout option rather than default -* Connect root nodes option for Up/Down routing engine. - When this option is specified Up/Down will create routing paths between - its root nodes. +* Enforce routing paths rebalancing on switch reconnection -* Dump and log filenames changed from osm* to opensm*. +* In Up/Down routing algorithm compare GUID values in host byte order -* Support loopback console - Socket console with only local access. +* Add 'switchbalance' and 'lidbalance' commands for OpenSM console -* Configurable config directory (the default value is /etc/opensm) and - configurable default values of OpenSM config filenames. +* Respond to new trap 144 node description update flag -* Add option for force SDR link speed - Add option to opensm.opts to force link speed. Currently, only forcing - to SDR link speed is supported. This option is not supported as a - command line option. +* Add '--connect_roots' command line options. This preserves connectivity + between root nodes in Up/Down routing algorithm -* Better packaging - Building and RPM packaging were improved and simplified.
+* Setting SL in the IPoIB MCast groups in accordance with QoS policy -* Handle "babbling" ports - When a babbling port (port which causes a frequent trap generation) is - detected, OpenSM will disable the port which should terminate the trap - storm. +* Dump auto detected root node guids in Up/Down routing algorithm + +* Unify OpenSM dumpers code + +* Unify various guid files parsers - add generic nodenamemap style parser + +* When root node guids were provided in file update the list on each + Up/Down run + +* During ./configure show values of configuration dirs and files + +* Make prefix routes config file name configurable + +* Add a Performance Manager HOWTO to the docs and the dist + +* Support separate SA and SM keys as clarified in IBA 1.2.1 + +* Remove AM_MAINTAINER_MODE in ./configure + +* Make vendor type OSM_VENDOR_INTF_OPENIB (libibumad) to be default + +* Build osm_perfmgr_db.* content only when PerfMgr is enabled. + +* Move PerfMgr event_db_dump_file to common OpenSM dump dir + +* Allow space separated strings as values in OpenSM config + +* Support for multiple event plugins + +* Add '--version' command line option + +* Add '--create-config ' command line option + +* Speedup and simplify logging code + +* Speedup multicast processing in SA DB + +* In log messages convert unicast LIDs from hex to decimal format and + GIDs from hex to IPv6 address format + +* Handle all possible ports in "ignore-guids" file + +* Add 'reroute' console command + +* Remove many install-exec-hook from Makefiles + +* Some cleanups in LASH routing algorithm code + +* In Makefiles remove -rpath and explicit -lpthread, -ldl from LDFLAGS + (move to configurator) + +* Install all OpenSM header files + +* Improve locking in SM Info receiver + +* Add new OSM_EVENT_ID_SUBNET_UP event for plugins + +* Redo lex and yacc files generation in conventional way + +* Add a missing Node Description check on light sweep. 
+ +* Move vendor specific compilation defines from command to generated + config.h file + +* Provide useful error message when log file opening fails + +* Add generated osm_config.h file with OpenSM specific defines + +* Display port number in decimal in log messages + +* Replace osm_vendor_select.h by generated osm_config.h + +* Unify options listing in OpenSM usage message + +* LFT buffers handling simplification + +* Add 'dump_conf' console command + +* OpenSM performs sweep on SIGCONT (coming out of suspend). + +* When our SM is in Standby state and its priority is increased + (via console command), notify master SM by sending Trap 144. + +* When entering standby state (after discovery) notify master SM + with Trap 144. + +* support more PortInfo:CapabilityMask bits + +* When babbling port policy is on disable the port with the least hop + count. 1.3 Library API Changes @@ -123,12 +211,12 @@ This document includes the following sections: 1.4 Software Dependencies -OpenSM depends on the installation of either OFED 1.3, OFED 1.2, OFED 1.1, -OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD -distribution), or Mellanox VAPI stacks. The qualified driver versions -are provided in Table 2, "Qualified IB Stacks". +OpenSM depends on the installation of either OFED 1.x, OpenIB gen2 (e.g. +IBG2 distribution), OpenIB gen1 (e.g. IBGD distribution), or Mellanox +VAPI stacks. The qualified driver versions are provided in Table 2, +"Qualified IB Stacks". -Also building of QoS manager policy file parser requires flex, and either +Also, building of QoS manager policy file parser requires flex, and either bison or byacc installed. 1.5 Supported Devices Firmware @@ -147,6 +235,10 @@ are listed in Table 3. Puts the burden of re-registering services, multicast groups, and inform-info on the client application (or IB access layer core). +* When running with QoS with default configuration (opensm -Q), + OpenSM prints list of "Invalid Cached Option" error messages. 
+ This does not affect OpenSM functionality. + 3 Unsupported IB Compliance Statements -------------------------------------- The following section lists all the IB compliance statements which @@ -210,76 +302,105 @@ information regarding each compliance statement. * C15-0.1.14 (Services): Provide means to associate service name and ServiceKeys. -4 Major Bug Fixes ------------------ +4 Bug Fixes +----------- + +4.1 Major Bug Fixes -The following is a list of bugs that were fixed. Note that other less critical -or visible bugs were also fixed. +* Set SA attribute offset to 0 when no records are returned -* osm_ucast_ftree.c: do load-leveling of non-CN routes +* Send trap 64 only after new ports are in ACTIVE state. -* osm_ucast_ftree.c: ignore port 0 and loopbacks on switches +* Fix in sending client reregistration bit -* lash: fix possible segfault in osm_get_lash_sl() +* Fix default OpenSM SM (and SA) Key byte order -* osm_ucast_ftree.c: fixing coredump in fat-tree routing +* Fix in sending Multicast groups creation/deletion notification (Traps + 66,67) -* osm_sa_slvl_record: fix overflow crash +* Don't startup automatically on SuSE based systems -* Break multicast rerouting requests processing when heavy sweep is - scheduled. 
+4.2 Other Bug Fixes -* updn: report fallback properly +* opensm/osm_console.c: fix seg fault when running "portstatus ca" in + the console -* Fix incorrect identification of routing engine used +* opensm: fix potential core dumps where osm_node_get_physp_ptr can + return NULL -* Don't zero base LID when invalid value is received +* opensm/osm_mcast_mgr: limit spanning tree creation recursion to value + of max hops (64) -* lash: fix wrong allocation size +* opensm: switch LFTs incremental update fix -* Fixing broken logic in 'process world' part of LinkRecord processing +* opensm/osm_state_mgr.c: fix segmentation fault -* Fix lmc_mask bit order in osm_sa_link_record.c +* opensm: eliminate some potential NULL pointer dereferences -* Adding missing comparison by to_lid/from_lid in LinkRecord processing +* opensm/osm_console.c: fix guid parsing -* Broken logic when scanning subnet for PIR request +* opensm: fix off by 1 issue with max_lid and max_multicat_lid_ho -* No interactive games in daemon mode +* opensm: fix potentially wrong port_guid initialization -* Fixing memory leak in node description +* opensm/configure.in: fix wrong HAVE_DEFAULT_OPENSM_CONFIG_FILE define + generation -* Fix PortInfo update issues for switch port 0 +* opensm: fix snprintf() usage -* Changed method_mask type in user_mad interface in accordance with - kernel ABI +* opensm/osm_sa_lft_record: validate LFT block number -* Use umad_get_issm_path() in osm_vendor_set_sm() +* opensm/osm_sa_lft_record: pass block parameter in host byte order -* Report message fix +* opensm/include/Makefile.am: don't duplicate header files in EXTRA_DIST -* Uninitialized variables usage fix +* opensm/osm_sa_class_port_info.c: fix over bound array access -* osm_ucast_ftree.c: Possible NULL ptr seg fault +* osmtest/osmt_service.c: fix over bound array access -* osm_mcast_mgr.c: Possible NULL ptr seg fault +* osmtest: fix qpn encoding in osmtest_informinfo_request() -* TrapRepress was failing for mkey != 0 +* 
opensm/osm_vendor_mlx_sa.c: handling attribute offset of 0 -* IB_PR_COMPMASK was used in MPR +* opensm: fix segfault corner case when osm_console_init fails -* Set hop limit when creating ipoib multicast groups +* opensm/console: close console socket on cleanup path -* Fix outstanding mad counters tracking on the error paths. +* opensm/osm_ucast_lash: fix buffer overflow -* Report new ports before handover mastership +* opensm: fix broken IPv6 SNM consolidation code -* Fix opvls and neighbormtu when remote port invalid. +* opensm/osm_sa_lft_record.c: fix block number encoding byte order -* Bug in coding trying to set vl_arb_high_limit when PortInfo.base_lid - was still zero. +* opensm/osm_sa: fix memory leak in SA responder -* Protect SMInfo response against port moving issue. +* opensm/osm_mcast_mgr: fix memory leak + +* opensm: fix qos config parsing bugs + +* opensm/osm_mcast_tbl.c: fix sending invalid MF block due to max mlid + overflow + +* opensm: log_max_size config parameter in MB + +* opensm/osm_ucast_lash: fix extra memory allocations + +* opensm: fix race in main OpenSM flow + +* opensm/ftree: fix GUID check against cn_guid_file + +* opensm/ftree: save LFT buffers memory allocations + +* opensm/osm_sa_link_record.c: prevent potential endless recursion + +* opensm: remove SM from sm_guid_tbl when IsSM port capability flag is + not set + +* opensm: fix QoS config bug + +* opensm: don't reassign zeroed params from config file + +* Other less critical or visible bugs were also fixed. 5 Main Verification Flows ------------------------- @@ -439,14 +560,24 @@ interface.
The test procedure includes: * Trap injection and recovery -6 Qualification ----------------- +6 Qualified Software Stacks and Devices +--------------------------------------- + +OpenSM Compatibility +-------------------- +Note that OpenSM version 3.2.1 and earlier used a value of 1 in host +byte order for the default SM_Key, so there is a compatibility issue +with these earlier versions of OpenSM when the 3.2.2 or later version +is running on a little endian machine. This affects SM handover as well +as SA queries (saquery tool in infiniband-diags). + Table 2 - Qualified IB Stacks ============================= Stack | Version -----------------------------------------|-------------------------- +OFED | 1.4 OFED | 1.3 OFED | 1.2 OFED | 1.1 @@ -463,6 +594,7 @@ Device | FW versions ------------------------------------|------------------------------- InfiniScale | fw-43132 5.2.000 (and later) InfiniScale III | fw-47396 0.5.000 (and later) +InfiniScale IV | fw-48436 7.1.000 (and later) InfiniHost | fw-23108 3.5.000 (and later) InfiniHost III Lx | fw-25204 1.2.000 (and later) InfiniHost III Ex (InfiniHost Mode) | fw-25208 4.8.200 (and later) @@ -483,10 +615,8 @@ QP0 and QP1. However, it does support it as a device on the subnet. Note 2: QoS firmware and Mellanox devices -HCAs: QoS supported by ConnectX. The current FW release -doesn't support QoS. QoS-enabled FW release (2_5_000) is -planned for May. If someone wishes to get QoS-enabled FW -before the official release, they should contact Mellanox FAE. +HCAs: QoS supported by ConnectX. QoS-enabled FW release is 2_5_000 and +later. Switches: QoS supported by InfiniScale III Any InfiniScale III FW that is supported by OpenSM supports QoS. 
-- 1.5.1.4 From vlad at lists.openfabrics.org Tue Dec 9 03:17:18 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 9 Dec 2008 03:17:18 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081209-0200 daily build status Message-ID: <20081209111719.1BEA6E60F5F@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.27 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with 
linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: Build failed on x86_64 with linux-2.6.9-42.ELsmp Log: /usr/bin/quilt --quiltrc /home/vlad/tmp/ofa_1_4_kernel-20081209-0200_linux-2.6.9-42.ELsmp_x86_64_check/patches/quiltrc push patches/srp_scsi_scan_target_7242_to_2_6_11.patch Applying patch srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 FAILED at 1104. Hunk #2 succeeded at 1787 (offset 29 lines). 1 out of 2 hunks FAILED -- rejects in file drivers/infiniband/ulp/srp/ib_srp.c Patch srp_scsi_scan_target_7242_to_2_6_11.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-55.ELsmp Log: /usr/bin/quilt --quiltrc /home/vlad/tmp/ofa_1_4_kernel-20081209-0200_linux-2.6.9-55.ELsmp_x86_64_check/patches/quiltrc push patches/srp_scsi_scan_target_7242_to_2_6_11.patch Applying patch srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 FAILED at 1104. Hunk #2 succeeded at 1787 (offset 29 lines). 1 out of 2 hunks FAILED -- rejects in file drivers/infiniband/ulp/srp/ib_srp.c Patch srp_scsi_scan_target_7242_to_2_6_11.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-67.ELsmp Log: /usr/bin/quilt --quiltrc /home/vlad/tmp/ofa_1_4_kernel-20081209-0200_linux-2.6.9-67.ELsmp_x86_64_check/patches/quiltrc push patches/srp_scsi_scan_target_7242_to_2_6_11.patch Applying patch srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 FAILED at 1104. Hunk #2 succeeded at 1787 (offset 29 lines). 
1 out of 2 hunks FAILED -- rejects in file drivers/infiniband/ulp/srp/ib_srp.c Patch srp_scsi_scan_target_7242_to_2_6_11.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt ---------------------------------------------------------------------------------- Build failed on x86_64 with linux-2.6.9-78.ELsmp Log: /usr/bin/quilt --quiltrc /home/vlad/tmp/ofa_1_4_kernel-20081209-0200_linux-2.6.9-78.ELsmp_x86_64_check/patches/quiltrc push patches/srp_scsi_scan_target_7242_to_2_6_11.patch Applying patch srp_scsi_scan_target_7242_to_2_6_11.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 FAILED at 1104. Hunk #2 succeeded at 1787 (offset 29 lines). 1 out of 2 hunks FAILED -- rejects in file drivers/infiniband/ulp/srp/ib_srp.c Patch srp_scsi_scan_target_7242_to_2_6_11.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt ---------------------------------------------------------------------------------- From tziporet at dev.mellanox.co.il Tue Dec 9 03:23:56 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 09 Dec 2008 13:23:56 +0200 Subject: [ofa-general] Re: [PATCH v2] OFED docs/opensm_release_notes.txt: OpenSM 3.2.5 RN In-Reply-To: <493E3E96.5050402@dev.mellanox.co.il> References: <493E3E96.5050402@dev.mellanox.co.il> Message-ID: <493E554C.7010602@mellanox.co.il> Yevgeny Kliteynik wrote: > OpenSM 3.2.5 Release Notes for OFED 1.4 docs. > V2: add "opensm -Q" error messages as known issues. > > thanks - applied Tziporet From hal.rosenstock at gmail.com Tue Dec 9 03:51:34 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 9 Dec 2008 06:51:34 -0500 Subject: [ofa-general] Automation Test Tool of OFED and opensm In-Reply-To: References: Message-ID: 2008/12/8 Wen Hao Wang : >> 2008/12/5 Wen Hao Wang : >> > >> > >> > Tel: 86-10-82451055 >> > Fax: 86-10-82782244 ext. 2312 >> > Address: 1/F, IBM ZGC Campus. 
Ring Building 28,ZhongGuanCun Software
>> > Park,No.8 Dong Bei Wang West Road, Haidian District Beijing, 100193,
>> > P.R.China
>> >
>> >
>> > "Hal Rosenstock" wrote on 2008-12-05 22:37:36:
>> > OK. I know osmtest, but not ibsim/ibmgtsim. Will check them.
>> >
>> >> There are also a number of examples in the various OFED packages which
>> >> can be used for regression of those components.
>> >>
>> > Would you point out one or two of these examples for my reference?
>>
>> What OFED components are of interest? Kernel modules don't typically
>> have examples but can be tested via user space APIs. Most user space
>> components have examples.
>>
>> -- Hal
>
> Hi Hal.
>
> Many thanks for your answer!
>
> If you can list one user space API to test a kernel module, and one
> example for user space components, for example, opensm/ibutils/IPoIB,
> that would be a great help.

For IPoIB, any socket-based program will work, depending on what you
want to test. Some userspace examples are
libibverbs/examples/ibv_devices, ibv_rc/uc/ud_pingpong and
librdmacm/examples/ucmatose, rping, ...

-- Hal

> Wen Hao Wang
> wangwhao at cn.ibm.com
>
>>
>> >> As far as an automation harness/test tool goes, I believe some vendors
>> >> have worked on this but to the best of my knowledge none of this has
>> >> been open sourced.
>> >>
>> >
>> > Thanks a lot! Your feedback is really helpful.
>> >
>> > Wen Hao Wang
>> > wangwhao at cn.ibm.com
>> >
>> >> -- Hal
>> >>
>> >> > Thanks.
>> >> > >> >> > Wen Hao Wang >> >> > Email: wangwhao at cn.ibm.com >> >> > >> >> > _______________________________________________ >> >> > general mailing list >> >> > general at lists.openfabrics.org >> >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> > >> >> > To unsubscribe, please visit >> >> > http://openib.org/mailman/listinfo/openib-general >> >> > >> > >> > >> >> Hi, >> >> >> >> On Fri, Dec 5, 2008 at 2:35 AM, Wen Hao Wang >> >> wrote: >> >> > Hi all: >> >> > >> >> > Is there any test or diagnostic tool, especially automation test tool >> >> > to >> >> > check the functions of OFED and opensm? The only thing here I know is >> >> > infiniband-diags and libibverbs-utils. >> >> >> >> In terms of OpenSM, there is osmtest. Also, ibsim and ibmgtsim (in >> >> ibutils) might help simulations of various subnet topologies. In terms >> >> of additional diagnostics, there is ibutils (ibdiagnet). >> >> >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit >> > http://openib.org/mailman/listinfo/openib-general >> > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From ogerlitz at voltaire.com Tue Dec 9 06:41:09 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 09 Dec 2008 16:41:09 +0200 Subject: [ofa-general] Re: IB Bonding errors with recent kernel In-Reply-To: <493DA50C.8050306@linux.vnet.ibm.com> References: <493DA50C.8050306@linux.vnet.ibm.com> Message-ID: <493E8385.9080507@voltaire.com> Pradeep Satyanarayana wrote: > I was attempting to execute a few bonding tests with the 2.6.28-rc7 kernel and see the following error when I try to add the first slave: > ib0: 
multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22

Hi Pradeep,

This (junk MGID) issue has been well known for quite a long time and is
unrelated to bonding; we see it with non-bonded IPoIB devices as well. I
added Yossi Etigin from Voltaire to the distribution list as he might
have more details on the issue.

Or.

>
> Here were the steps that I followed:
>
> 1. modprobe bonding
> 2. ib-bond --bond-ip 192.168.2.132 --add-slave ib0 --miimon 100
>
> I have been able to debug this to a certain degree and traced this to the case wherein an Ethernet mapped address of the IPv6 multicast address
> seems to be added to the dev->mc_list (I see an entry for 333300:0100:0000:0000:0000 on the mc_list). This subsequently gets transformed to the
> 0001:0000:0000:0000:0000:0000:0000:0000 address in ipoib_mcast_restart_task(), which is what is seen in the errors.
>
> Can others recreate this? Any clues as to why an Ethernet specific mapped address is being added to IB?
>
> Pradeep
>

From ronli.voltaire at gmail.com Tue Dec 9 08:34:31 2008
From: ronli.voltaire at gmail.com (Ron Livne)
Date: Tue, 9 Dec 2008 18:34:31 +0200
Subject: [ofa-general] Re: [ewg] [PATCH 1/2 v2]libibvers: add create_qp_expanded
In-Reply-To:
References:
Message-ID: <3b5e77ad0812090834t20036f35v901e885e16d8acc2@mail.gmail.com>

Roland,

Wouldn't adding a new field to struct ibv_qp_init_attr break the ABI?
Anyway, I prefer your first approach: just change the function's name
from create_qp_expanded, to avoid compatibility issues.
But if you prefer the ext_mask approach and adding creation flags to the
qp_init_attr struct, that's fine by me.
Please tell me which one you prefer.

> Also, I wonder if it's worth a new verb in the kernel ABI for this.
> Maybe we should add a new command in the ABI where libibverbs can pass
> in a bitmask of supported extensions, and the kernel can respond with
> which extensions it supports.
And then we can just continue to use the
> reserved field in the existing create_qp command if both kernel and
> userspace agree that they support create flags there.

There are only 8 reserved bits. I think there is a good chance they will
run out quickly.

Ron

On Tue, Aug 12, 2008 at 9:37 PM, Roland Dreier wrote:
> Sorry for jumping in so late in the process, but a few big concerns:
>
> > struct ibv_qp *ibv_create_qp_expanded(struct ibv_pd *pd,
> > struct ibv_qp_init_attr *qp_init_attr,
> > uint32_t create_flags);
>
> I don't like the name "_expanded" when all we are doing is adding a
> flags parameter. The next time we need to tweak this API, then we end
> up with _extra_super_expanded or something like that.
>
> I see two better options: keep the same prototype but call it something
> like ibv_create_qp_with_flags (or maybe ibv_create_qp_flags), or keep
> the name ibv_create_qp_expanded but instead of create_flags, have the
> new parameter be ext_mask, have one bit in ext_mask indicate create
> flags, and add create_flags to struct ibv_qp_init_attr -- then we can
> add more extra stuff by using more bits in ext_mask.
>
> Also, I wonder if it's worth a new verb in the kernel ABI for this.
> Maybe we should add a new command in the ABI where libibverbs can pass
> in a bitmask of supported extensions, and the kernel can respond with
> which extensions it supports. And then we can just continue to use the
> reserved field in the existing create_qp command if both kernel and
> userspace agree that they support create flags there.
>
> - R.
>

From arlin.r.davis at intel.com Tue Dec 9 09:05:04 2008
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Tue, 9 Dec 2008 09:05:04 -0800
Subject: [ofa-general] ***SPAM*** uDAPL on GigE ?
In-Reply-To: <180f2ef70812082257qd919c91l1382fc8292247288@mail.gmail.com>
References: <180f2ef70812082257qd919c91l1382fc8292247288@mail.gmail.com>
Message-ID:

Liam,

You can get some early experimental direct Ethernet transport code from
the Intel whatif site. See the following for details:

http://software.intel.com/en-us/articles/intel-direct-ethernet-transport

-arlin

________________________________
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of liam mcquillan
Sent: Monday, December 08, 2008 10:58 PM
To: general at lists.openfabrics.org
Subject: [ofa-general] ***SPAM*** uDAPL on GigE ?

Hello,

Is it possible to run OFED uDAPL over plain old GigE?
- If so, how do you configure /etc/dat.conf as dtest complains about the provider
- If so, will uDAPL give less latency than TCP on GigE

Liam

From pradeeps at linux.vnet.ibm.com Tue Dec 9 11:27:42 2008
From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana)
Date: Tue, 09 Dec 2008 11:27:42 -0800
Subject: [ofa-general] Re: IB Bonding errors with recent kernel
In-Reply-To: <493E8385.9080507@voltaire.com>
References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com>
Message-ID: <493EC6AE.3050401@linux.vnet.ibm.com>

Or Gerlitz wrote:
> Pradeep Satyanarayana wrote:
>> I was attempting to execute a few bonding tests with the 2.6.28-rc7
>> kernel and see the following error when I try to add the first slave:
>> ib0: multicast join failed for
>> 0001:0000:0000:0000:0000:0000:0000:0000, status -22
> Hi Pradeep,
>
> This (junk MGID) issue has been well known for quite a long time and is
> unrelated to bonding; we see it with non-bonded IPoIB devices as well.
> I added Yossi Etigin from Voltaire to the distribution list as he might
> have more details on the issue.
>

Or,

If I am not mistaken the issue you mention is a little different from the one I pointed out.
Without bonding I see the following:

kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11

However, with bonding what I see is:

ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22

The latter is seen only when IPoIB bonding is activated (i.e. when a
slave is added). The junk MGID appears because an Ethernet mapped
address is being added to the dev->mc_list.

Subsequently, ib-bond --status does not show any slave as active, as
shown below:

ib-bond --status
bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9
slave0: ib0
slave1: ib1

Pradeep

From or.gerlitz at gmail.com Tue Dec 9 12:21:52 2008
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Tue, 9 Dec 2008 22:21:52 +0200
Subject: [ofa-general] Re: IB Bonding errors with recent kernel
In-Reply-To: <493EC6AE.3050401@linux.vnet.ibm.com>
References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com> <493EC6AE.3050401@linux.vnet.ibm.com>
Message-ID: <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com>

> If I am not mistaken the issue you mention is a little different from the one I pointed out.
> Without bonding I see the following:
> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
> However, with bonding what I see is :
> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22

Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid
argument). So you can get EAGAIN when the underlying core SA agent is
not ready to send SA queries, while you get EINVAL when attempting to
join on a junk MGID. I am confident that we have seen joins on junk
MGIDs for a long time, and it has been reported on this list (google...)
in the past, with no resolution yet.
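
As a rough illustration of the two status codes discussed above, here is a minimal Python sketch that pulls the MGID and status out of a "multicast join failed" log line and names the errno. It is not part of the kernel or of any OFED tool: the function name decode_join_failure, the regular expression, and the small lookup table are assumptions made for this example. Only the two Linux errno values mentioned in this thread (11 and 22) are included.

```python
import re

# Linux errno values named in this thread, hard-coded so the sketch
# gives the same answer on any platform (assumption: Linux numbering).
LINUX_ERRNO = {11: "EAGAIN", 22: "EINVAL"}

# Hypothetical pattern matching the log lines quoted in the thread.
JOIN_FAILED = re.compile(
    r"multicast join failed for (?P<mgid>[0-9a-f:]+), status (?P<status>-?\d+)"
)


def decode_join_failure(line):
    """Return (mgid, errno_name) for a join-failure log line, or None."""
    match = JOIN_FAILED.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    # The kernel reports failures as negative errno values.
    name = LINUX_ERRNO.get(-status, "unknown")
    return match.group("mgid"), name
```

Feeding it the two lines quoted in this thread separates the "SA agent not ready yet" case (EAGAIN) from the "junk MGID" case (EINVAL).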
Under bonding there might be a window in time where, from the kernel network stack's perspective, the bonding device ether-type is Ethernet and not InfiniBand, and hence the wrong function (ip_eth_mc_map instead of ip_ib_mc_map) would be called to map the IP multicast address to the HW multicast address. > Subsequently an ib-bond status does not reveal any slave as active as shown below: > ib-bond --status > bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9 > slave0: ib0 > slave1: ib1 As this script is non-standard and deprecated, I would recommend not using it, and instead using the classic /proc/net/bonding/bond0 entry along with ip addr show on bond0, ib0 and ib1. Or. From pradeeps at linux.vnet.ibm.com Tue Dec 9 17:27:23 2008 From: pradeeps at linux.vnet.ibm.com (Pradeep Satyanarayana) Date: Tue, 09 Dec 2008 17:27:23 -0800 Subject: [ofa-general] Re: IB Bonding errors with recent kernel In-Reply-To: <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com> References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com> <493EC6AE.3050401@linux.vnet.ibm.com> <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com> Message-ID: <493F1AFB.6050401@linux.vnet.ibm.com> Or Gerlitz wrote: >> If I am not mistaken the issue you mention is a little different from the one I pointed out. >> Without bonding I see the following: >> kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 >> However, with bonding what I see is : >> ib0: multicast join failed for 0001:0000:0000:0000:0000:0000:0000:0000, status -22 > > Please note that -11 is EAGAIN (try again) and -22 is EINVAL (invalid > argument). So you can get EAGAIN when the underlying core sa agent is > not ready to send SA queries, while you get EINVAL when attempting to > join on a junk MGID. I am confident that for long time we see joins on > junk MGIDs and it has been reported on this list (google...) in the > past, no resolution yet.
Or, I looked through the mailing list going back more than a year. The closest I can find to this issue (-EINVAL) was when you reported problems with junk MGID on a child interface (and that works properly now). I agree that the -EAGAIN problem has been known for some time now. However, this issue with IPoIB bonding is new. My recollection is that it all worked properly around the end of October. I have not tested since then, so this is something that must have crept in during the interim. > > Under bonding there might be a window in time where from the kernel > network stack perspective the bonding device ether-type is ethernet > and not infiniband and hence the wrong (ip_eth_mc_map instead of > ip_ib_mc_map) function would be called to do the mapping from the IP > multicast address to the HW multicast address > > >> Subsequently an ib-bond status does not reveal any slave as active as shown below: >> ib-bond --status >> bond0: 80:00:04:04:fe:80:00:00:00:00:00:00:00:05:ad:00:00:03:05:b9 >> slave0: ib0 >> slave1: ib1 > > As this script is not standard and deprecated, I would recommend not > to use it but rather the classic /proc/net/bonding/bond0 entry, along > with ip addr show on bond0, ib0, ib1 Thanks for alerting me to the fact that the ib-bond script was deprecated. Again, this all seemed to work about 6 weeks ago. Is that (ib-bond is deprecated) documented somewhere? Pradeep From monis at Voltaire.COM Wed Dec 10 01:05:16 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 10 Dec 2008 11:05:16 +0200 Subject: [ofa-general] Re: IB Bonding errors with recent kernel In-Reply-To: <493F1AFB.6050401@linux.vnet.ibm.com> References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com> <493EC6AE.3050401@linux.vnet.ibm.com> <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com> <493F1AFB.6050401@linux.vnet.ibm.com> Message-ID: <493F864C.7010104@Voltaire.COM> Hi, You can read more about this in a discussion from a few months ago.
https://kerneltrap.org/mailarchive/linux-netdev/2008/4/17/1456344/thread From ogerlitz at voltaire.com Wed Dec 10 01:18:44 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 10 Dec 2008 11:18:44 +0200 Subject: [ofa-general] Re: IB Bonding errors with recent kernel In-Reply-To: <493F1AFB.6050401@linux.vnet.ibm.com> References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com> <493EC6AE.3050401@linux.vnet.ibm.com> <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com> <493F1AFB.6050401@linux.vnet.ibm.com> Message-ID: <493F8974.5040808@voltaire.com> Pradeep Satyanarayana wrote: > I looked through the mailing list going back more than a year. The closest I can find to this issue (-EINVAL) was when you reported problems with junk MGID on a child interface (and that works properly now). Yes, this was my observation way back; since then, I have realized that this also happens regardless of VLANs, bonding, or anything else. > I agree that the -EAGAIN problem has been known for some time now. However, this issue with IPoIB bonding is new. My recollection is that it all worked properly around the end of October. I have not tested since then, so this is something that must have crept in during the interim. Moni just sent you a note on the matter. > Thanks for alerting me to the fact that the ib-bond script was > deprecated. Again this seemed to all work about 6 weeks ago. Is that > (ib-bond is deprecated) documented somewhere? I don't know whether this is documented (Moni?). Anyway, in all aspects (design, implementation, mainline kernel maintainer and distro deployment), bonding is bonding is bonding; you just happen to use it over IPoIB NICs and not Eth NICs. The fact that you are using a short-term solution provided by the OFED ib-bonding package should not lead you to rely on a proprietary helper script being well maintained, etc. Or.
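The advice earlier in this thread to read /proc/net/bonding/bond0 rather than rely on ib-bond can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an OFED tool: the function name parse_bonding_status is hypothetical, SAMPLE is made-up text that only imitates the "Currently Active Slave", "Slave Interface" and "MII Status" fields printed by the Linux bonding driver, and the exact set of fields varies between kernel versions.

```python
def parse_bonding_status(text):
    """Summarise the text of a /proc/net/bonding/<bond> file."""
    status = {"active_slave": None, "slaves": {}}
    current = None  # slave whose section is currently being read
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            status["active_slave"] = None if value == "None" else value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe the bond itself.
            status["slaves"][current] = value
    return status


# Illustrative sample only; not captured from the systems in this thread.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: ib0
MII Status: up

Slave Interface: ib0
MII Status: up

Slave Interface: ib1
MII Status: down
"""
```

On a live system one would pass the contents of /proc/net/bonding/bond0 instead of SAMPLE; an active_slave of None would correspond to the situation reported at the start of this thread, where no slave shows as active.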
From monis at Voltaire.COM Wed Dec 10 01:45:20 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 10 Dec 2008 11:45:20 +0200 Subject: [ofa-general] Re: IB Bonding errors with recent kernel In-Reply-To: <493F8974.5040808@voltaire.com> References: <493DA50C.8050306@linux.vnet.ibm.com> <493E8385.9080507@voltaire.com> <493EC6AE.3050401@linux.vnet.ibm.com> <15ddcffd0812091221l323c2e3v2c035daaf4bdbe07@mail.gmail.com> <493F1AFB.6050401@linux.vnet.ibm.com> <493F8974.5040808@voltaire.com> Message-ID: <493F8FB0.1020604@Voltaire.COM> > I don't know about whether this is documented (Moni?). Anyway, under all > aspects: design, implementation, main line kernel maintainer and distro > deployment, bonding is bonding is bonding, where you just happen to use > it over IPoIB NICs and not Eth NICs. In OFED documentation we recommend only OS standard tools to configure bonding. However, this matter is irrelevant to the question here. From vlad at lists.openfabrics.org Wed Dec 10 03:18:23 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 10 Dec 2008 03:18:23 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081210-0200 daily build status Message-ID: <20081210111823.67EAAE28006@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 
Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From aostvold at platform.com Wed Dec 10 07:04:23 2008 From: aostvold at platform.com (Asmund Ostvold) Date: Wed, 10 Dec 2008 16:04:23 +0100 Subject: [ofa-general] ibv_post_send fails when using malloc in a special way In-Reply-To: References: <49368267.40007@platform.com> Message-ID: <493FDA77.8030802@platform.com> Roland Dreier wrote: > The most obvious explanation is that the physical pages underlying your > allocation are different after the free/re-valloc. This could happen > without a system call I guess if a page is faulted in. This is true in the general sense; that is why we pin down the memory by calling ibv_reg_mr(). Having done this, we expect the virtual-to-physical relationship to remain constant.
However, we are concerned that ibv_reg_mr() does not call madvise() as appropriate, as mentioned in our posting. Thanks, Åsmund From rdreier at cisco.com Wed Dec 10 07:26:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 10 Dec 2008 07:26:27 -0800 Subject: [ofa-general] ibv_post_send fails when using malloc in a special way In-Reply-To: <493FDA77.8030802@platform.com> (Asmund Ostvold's message of "Wed, 10 Dec 2008 16:04:23 +0100") References: <49368267.40007@platform.com> <493FDA77.8030802@platform.com> Message-ID: > This is in the general sense true; that is way we pin down the memory > by calling ibv_reg_mr(). Having done this we expect the virtual to > physical relationship to remain constant. However, we are concerned > that ibv_reg_mr() does not call madvise() as appropriate, as mentioned > in our posting. I'm not sure how madvise() would have any relevance to your problem, since as far as I can see you are not using fork(). In any case, libibverbs will only call madvise() if you call ibv_fork_init() or set the IBV_FORK_SAFE environment variable. - R. From rpearson at systemfabricworks.com Wed Dec 10 09:49:34 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 11:49:34 -0600 Subject: [ofa-general] [PATCH] [1 of 10] [REVISED] mesh analysis - skeleton Message-ID: <005d01c95aef$aa4fcd50$feef67f0$@com> Sasha, Here is a revised mesh patch #1 that incorporates changes based on your comments. I've attached the patch file to avoid the problems you were having with line breakage. The purpose of this patch is to create a skeleton for the remaining code. This patch - creates a new command line flag --do_mesh_analysis and a new Boolean that is set if the flag is used. - adds a description to opensm/doc/current-routing.txt - adds a description to opensm/man/opensm.8.in - adds code to main to implement the flag and option. 
- adds text to usage() to support the new flag - creates a new file osm_mesh.c to hold the algorithm code - moves declarations from osm_ucast_lash.c and osm_mesh.c into header files - adds these files to Makefile.am - adds a stub do_mesh_analysis() that is called from lash_core. -------------- next part -------------- A non-text attachment was scrubbed... Name: p1 Type: application/octet-stream Size: 15110 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:00:21 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:00:21 -0600 Subject: [ofa-general] [PATCH] [2 of 10] [REVISED] mesh analysis - mesh_t data structure Message-ID: <006701c95af1$2b7cff00$8276fd00$@com> Sasha, Here is a revised mesh patch #2 that incorporates changes based on your comments. The purpose of this patch is to create a per-fabric data structure and methods. This patch: - creates a data structure, mesh_t, that holds per-mesh information - adds a pointer to this structure in lash_t - creates methods to create, clean up and destroy the object. - adds calls in osm_ucast_lash.c to call these. Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p2 Type: application/octet-stream Size: 3353 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:12:13 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:12:13 -0600 Subject: [ofa-general] [PATCH] [3 of 10] [REVISED] mesh analysis - node and link structures Message-ID: <007301c95af2$d4fff720$7effe560$@com> Sasha, Here is a revised mesh patch #3 that incorporates changes based on your comments. This patch - creates a per logical switch-to-switch link structure, link_t - creates per mesh node (e.g.
switch) data structure mesh_node_t - adds a pointer to mesh_node_t in the switch_t structure in *lash.h - implements create and delete methods for mesh_node_t - calls these in switch_create and switch_delete in *lash.c Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p3 Type: application/octet-stream Size: 4257 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:18:26 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:18:26 -0600 Subject: [ofa-general] [PATCH] [4 of 10] [REVISED] mesh analysis - matrix/polynomial routines Message-ID: <008101c95af3$b20cb5e0$162621a0$@com> Sasha, Here is a revised mesh patch #4 that incorporates changes based on your comments. This patch implements - create and cleanup methods for polynomials with integer coefficients - create and cleanup methods for square matrices with integer coefficients - create and cleanup methods for square matrices with polynomial coefficients - a routine to compute the determinant of a matrix with polynomial coefficients (Note the determinant is restricted to computing the characteristic polynomial.) Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p4 Type: application/octet-stream Size: 5093 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:28:00 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:28:00 -0600 Subject: [ofa-general] [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Message-ID: <008d01c95af5$0870d730$19528590$@com> Sasha, Here is a revised mesh patch #5 that incorporates changes based on your comments.
This patch implements - a routine to compute the characteristic polynomial of a matrix - a routine to compute the local 'metric' around each switch - a routine to classify switches into a histogram of local geometry classes Regards, Bob Pearson From jlentini at netapp.com Wed Dec 10 10:29:43 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 10 Dec 2008 13:29:43 -0500 (EST) Subject: [ofa-general] mlx4 support for fast_reg_mr Message-ID: The mlx4 code to support the fast_reg_mr API is upstream, but the last post I found reported that the firmware was not ready: http://lists.openfabrics.org/pipermail/general/2008-July/052931.html What is the status of mlx4 firmware support for the fast_reg_mr API? -james From michael.heinz at qlogic.com Wed Dec 10 10:31:33 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed, 10 Dec 2008 12:31:33 -0600 Subject: [ofa-general] Bugs in opensm/libvendor Message-ID: While experimenting with the APIs in opensm/libvendor, I was unable to get the path record queries to work. Reviewing the error logs from the SM, I discovered that the APIs were not setting the required num_path field.
Here's the fix: --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 @@ -615,7 +615,7 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -625,6 +625,7 @@ ((osmv_guid_pair_t *) (p_query_req-> p_query_input))-> src_guid); + path_rec.num_path = 1; break; case OSMV_QUERY_PATH_REC_BY_GIDS: @@ -634,7 +635,7 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ memcpy(&path_rec.sgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> src_gid, sizeof(ib_gid_t)); + path_rec.num_path = 1; break; case OSMV_QUERY_PATH_REC_BY_LIDS: @@ -652,13 +654,14 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | IB_PR_COMPMASK_NUMBPATH); sa_mad_data.p_attr = &path_rec; path_rec.dlid = ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> dest_lid; path_rec.slid = ((osmv_lid_pair_t *) (p_query_req->p_query_input))->src_lid; + path_rec.num_path = 1; break; case OSMV_QUERY_UD_MULTICAST_SET: --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 @@ -743,7 +743,7 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | 
IB_PR_COMPMASK_NUMBPATH); sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -753,6 +753,7 @@ ((osmv_guid_pair_t *) (p_query_req-> p_query_input))-> src_guid); + path_rec.num_path = 1; break; case OSMV_QUERY_PATH_REC_BY_GIDS: @@ -763,7 +764,7 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ memcpy(&path_rec.sgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> src_gid, sizeof(ib_gid_t)); + path_rec.num_path = 1; break; case OSMV_QUERY_PATH_REC_BY_LIDS: @@ -789,6 +791,7 @@ dest_lid; path_rec.slid = ((osmv_lid_pair_t *) (p_query_req->p_query_input))->src_lid; + path_rec.num_path = 1; break; case OSMV_QUERY_UD_MULTICAST_SET: -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpearson at systemfabricworks.com Wed Dec 10 10:36:20 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:36:20 -0600 Subject: [ofa-general] [PATCH] [6 of 10] [REVISED] mesh analysis - mesh info table Message-ID: <00a101c95af6$324001c0$96c00540$@com> Sasha, Here is a revised mesh patch #6 that incorporates changes based on your comments. This patch implements - a table of polynomials for selected regular Cartesian meshes up to dimension 8 - a routine to classify each switch based on the table (Note all dimensions of length 4 are converted to 2x2.) Regards, Bob Pearson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: p6 Type: application/octet-stream Size: 8447 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:40:33 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:40:33 -0600 Subject: [ofa-general] [PATCH] [7 of 10] [REVISED] mesh analysis - induce global geometry Message-ID: <00ad01c95af6$c93c59c0$5bb50d40$@com> Sasha, Here is a revised mesh patch #7 that incorporates changes based on your comments. This patch implements - routine to induce axes on mesh starting from seed node Regards, Bob Pearson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: p7 Type: application/octet-stream Size: 7845 bytes Desc: not available URL: From jlentini at netapp.com Wed Dec 10 10:43:13 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 10 Dec 2008 13:43:13 -0500 (EST) Subject: [ofa-general] mlx4 support for reg_phys_mr Message-ID: The upstream mlx4 driver does not support reg_phys_mr. Are there any plans to add support for this verb? The reg_phys_mr verb was made optional a while back: http://lists.openfabrics.org/pipermail/general/2008-April/048715.html making dma_mr the only required kernel memory registration mode. As a result, kernel ULPs don't have a guaranteed mechanism for creating MRs of limited size and fine-grained permissions since both reg_phys_mr and fast_reg_mr are optional. Since all HW providers except mlx4 support reg_phys_mr, providing a reg_phys_mr implementation would fix this. -james From rpearson at systemfabricworks.com Wed Dec 10 10:44:41 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:44:41 -0600 Subject: [ofa-general] [PATCH] [8 of 10] [REVISED] mesh analysis - reorder links Message-ID: <00b901c95af7$5cea7120$16bf5360$@com> Sasha, Here is a revised mesh patch #8 that incorporates changes based on your comments. 
This patch implements - a routine to reorder links and measure the size of the mesh Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p8 Type: application/octet-stream Size: 4555 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:49:45 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:49:45 -0600 Subject: [ofa-general] [PATCH] [9 of 10] [REVISED] Message-ID: <00c501c95af8$12434ec0$36c9ec40$@com> Sasha, Here is a revised mesh patch #9 that incorporates changes based on your comments. This patch makes some minor cleanups in osm_ucast_lash.c in preparation for the next steps. The main change is to minimize the occurrences of phys_connections. These changes are mainly cosmetic and independent of the mesh analysis code, but will make patch 10 easier to understand. Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p9 Type: application/octet-stream Size: 6126 bytes Desc: not available URL: From rpearson at systemfabricworks.com Wed Dec 10 10:55:37 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Wed, 10 Dec 2008 12:55:37 -0600 Subject: [ofa-general] [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core Message-ID: <00d101c95af8$e459cf10$ad0d6d30$@com> Sasha, Here is a revised mesh patch #10 that incorporates changes based on your comments.
This patch - hooks mesh code into lash - replaces sw->phys_connections by the equivalent switch->node->links - replaces sw->num_connections by the equivalent switch->node->num_links - replaces sw->virtual_physical_port_table by switch->node->links[]->ports. When the do_mesh_analysis flag is not set, there is no change in function except that the variables are replaced with variables in node that have the same size. In this case the port table in link_t will always have just one port. When the do_mesh_analysis flag is set, multiple physical links will collapse to a single logical link with a port list with more than one element. - rewrote connect switches to use variables in node - in the log message "Lane requirements (%d) exceed available lanes (%d)" the arguments were reversed; fixed - compute the physical egress port in routine get_next_port, which will use round robin if there is more than one physical link between switches Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed... Name: p10 Type: application/octet-stream Size: 11619 bytes Desc: not available URL: From tziporet at mellanox.co.il Wed Dec 10 13:12:28 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 10 Dec 2008 23:12:28 +0200 Subject: [ofa-general] OFED 1.4 GA release Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0FE7D2@mtlexch01.mtl.com> I am happy to announce the OFED 1.4 GA release. The release can be found under: http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4.tgz The release documents are under: http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-docs/ This release was produced by a joint effort of all the companies in the EWG group. I wish to thank all who contributed to its success. Tziporet ======================================================================== ======= OFED 1.4 GA Release summary: ============================ The OpenFabrics Enterprise Distribution (OFED) version 1.4 software package supports InfiniBand and iWARP fabrics.
It is composed of several software modules intended for use on a computer cluster constructed as an InfiniBand subnet or an iWARP network. OFED package contains the following components: =============================================== The OFED Distribution package generates RPMs for installing the following: 1.1 OFED 1.4 Contents ----------------------- The OFED package contains the following components: - OpenFabrics core and ULPs: - IB HCA drivers (mthca, mlx4, ipath, ehca) - iWARP RNIC driver (cxgb3, nes) - core - Upper Layer Protocols: IPoIB, SDP, SRP Initiator and target, iSER Initiator and target, RDS, uDAPL, qlgc_vnic and NFS-RDMA. - OpenFabrics utilities: - OpenSM (OSM): InfiniBand Subnet Manager - Diagnostic tools - Performance tests - MPI: - OSU MPI stack supporting the InfiniBand and iWARP interface - Open MPI stack supporting the InfiniBand and iWARP interface - OSU MVAPICH2 stack supporting the InfiniBand and iWARP interface - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta) - Extra packages: - open-iscsi: open-iscsi initiator with iSER support - ib-bonding: Bonding driver for IPoIB interface - Sources of all software modules (under conditions mentioned in the modules' LICENSE files) - Documentation Third Party Packages -------------------- The following third party packages have been tested with OFED 1.3: 1. Intel MPI 2. HP MPI 3. Open MPI 1.3 Main Changes from OFED 1.3 ========================== 1.1 General changes ------------------- o Kernel code based on 2.6.27 o Added iSER target package o Added NFS-RDMA support - in technology preview state o New verbs to support BMME: - Fast memory thru send queue - Local invalidate send work requests - Read with invalidate o Virtual Protocol Interconnect (Multi-Protocol: Eth and IB) support for ConnectX. See mlx4 release note for the usage model. 
o New MPI versions: - OSU MVAPICH 1.1.0 - Open MPI 1.2.8 - OSU MVAPICH2 1.2p1 o New management package: 3.2 o New uDAPL libraries: compat-dapl-1.2.12-1, dapl-2.0.15-1 See each component release notes for details on the changes. OFED release notes are attached here. <> 1.2 Supported Platforms and Operating Systems --------------------------------------------- o CPU architectures: - x86_64 - x86 - ppc64 - ia64 o Linux Operating Systems: - RedHat EL4 up4: 2.6.9-42.ELsmp * - RedHat EL4 up5: 2.6.9-55.ELsmp - RedHat EL4 up6: 2.6.9-67.ELsmp - RedHat EL4 up7: 2.6.9-78.ELsmp - RedHat EL5: 2.6.18-8.el5 - RedHat EL5 up1: 2.6.18-53.el5 - RedHat EL5 up2: 2.6.18-92.el5 - Fedora C9: 2.6.25-14.fc9 * - SLES10: 2.6.16.21-0.8-smp - SLES10 SP1: 2.6.16.46-0.12-smp - SLES10 SP2: 2.6.16.60-0.21-smp - OpenSuSE 10.3: 2.6.22.5-31 * - OEL 4 up7 2.6.9-78.ELsmp - OEL 5 up2 2.6.18-92.el5 - CentOS5.2 2.6.18-92.el5 - kernel.org: 2.6.26 and 2.6.27 * Minimal QA for these versions 1.3 HCAs and RNICs Supported ---------------------------- This release supports IB HCAs by Mellanox Technologies, Qlogic and IBM as well as iWARP RNICs by Chelsio Communications and Intel. 
o Mellanox Technologies HCAs (SDR, DDR and QDR Modes are Supported): - InfiniHost (fw-23108 Rev 3.5.000) - InfiniHost III Ex (MemFree: fw-25218 Rev 5.3.000 with memory: fw-25208 Rev 4.8.200) - InfiniHost III Lx (fw-25204 Rev 1.2.000) - ConnectX IB (fw-25408 Rev 2.5.0 and 2.6.000) o Qlogic HCAs: - QHT6040 (PathScale InfiniPath HT-460) - QHT6140 (PathScale InfiniPath HT-465) - QLE6140 (PathScale InfiniPath PE-880) o IBM HCAs: - GX Dual-port SDR 4x IB HCA - GX Dual-port SDR 12x IB HCA - GX Dual-port DDR 4x IB HCA - GX Dual-port DDR 12x IB HCA o Chelsio RNICs: - S310/S320 10GbE Storage Accelerators - R310/R320 10GbE iWARP Adapters o Intel RNICs: - NE020 10Gb iWARP Adapter 1.4 Switches Supported ---------------------- This release was tested with switches and gateways provided by the following companies: - Cisco - Voltaire - Qlogic - Flextronics - Sun -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OFED_release_notes.txt URL: From tziporet at dev.mellanox.co.il Thu Dec 11 00:47:10 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 11 Dec 2008 10:47:10 +0200 Subject: [ofa-general] mlx4 support for fast_reg_mr In-Reply-To: References: Message-ID: <4940D38E.2000700@mellanox.co.il> James Lentini wrote: > The mlx4 code to support the fast_reg_mr API is upstream, but the last > post I found reported that the firmware was not ready: > > http://lists.openfabrics.org/pipermail/general/2008-July/052931.html > > What is the status of mlx4 firmware support for the fast_reg_mr API? 
> > FMRs work perfectly with mlx4 and FW 2.5.0. It is best to use OFED 1.4, which was just released, but 1.3.1 will also do the job. Tziporet From nicolas.morey-chaisemartin at ext.bull.net Thu Dec 11 01:03:35 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Thu, 11 Dec 2008 10:03:35 +0100 Subject: [ofa-general] Test modules for ib_core Message-ID: <4940D767.8090200@ext.bull.net> Hello, I was wondering if there is any program/module available to test the kernel verbs directly. I am looking for a simple test, like ib_write_bw (but in kernel mode), to do some tests on the ib_core/rdma_cm modules. I could write it, but it takes some time to get a clean interface between user and kernel mode to send parameters (like server/host, message size and so on). So if any of you has such a module and can share it, I'd be really thankful. Best Regards Nicolas Morey-Chaisemartin From ruffing at motama.com Thu Dec 11 01:50:38 2008 From: ruffing at motama.com (Jan Ruffing) Date: Thu, 11 Dec 2008 10:50:38 +0100 Subject: [ofa-general] ibutils patch Message-ID: <4940E26E.9080502@motama.com> Hello, Last Friday, I installed the OFED 1.4 beta on a 32-bit system running the OpenSuse 11 distribution (kernel 2.6.25.16). The install script worked fine up until ibutils.
Ibutils had to be patched with additional includes to compile: diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibdm/datamodel/Fabric.h /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibdm/datamodel/Fabric.h 57a58 > #include 58a60 > #include diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibdm/datamodel/LinkCover.cpp /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibdm/datamodel/LinkCover.cpp 41a42,43 > #include > diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/dispatcher.h /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/dispatcher.h 55a56 > #include diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/msgmgr.cpp /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/msgmgr.cpp 36a37 > #include diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/node.cpp /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/node.cpp 52a53 > #include I don't know if this has been corrected in the OFED 1.4 release. If not, including these changes in the next version of the OFED might be helpful. Thanks, Jan Ruffing -- Jan Ruffing Software Developer Motama GmbH Lortzingstraße 10 · 66111 Saarbrücken · Germany tel +49 681 940 85 50 · fax +49 681 940 85 49 ruffing at motama.com · www.motama.com Companies register · district council Saarbrücken · HRB 15249 CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. From ruffing at motama.com Thu Dec 11 01:55:38 2008 From: ruffing at motama.com (Jan Ruffing) Date: Thu, 11 Dec 2008 10:55:38 +0100 Subject: [ofa-general] Infiniband performance Message-ID: <4940E39A.5090802@motama.com> Hello, I'm new to Infiniband and still trying to get a grasp on what performance it can realistically deliver. 
The two directly connected test machines have Mellanox InfiniHost III Lx DDR HCA cards installed and run OpenSuse 11 with a 2.6.25.16 kernel.


1) Maximum Bandwidth?

Infiniband (Double Data Rate, 4x lane) is advertised with a bandwidth of 20 Gbit/s. If my understanding is correct, this is only the signal rate, which would translate to a 16 Gbit/s data rate due to 8b/10b encoding?

The maximum speed I measured so far was 12 Gbit/s on the low-level protocols:

tamara /home/ruffing> ibv_rc_pingpong -m 2048 -s 1048576 -n 10000
  local address:  LID 0x0001, QPN 0x3b0405, PSN 0x13a302
  remote address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba
20971520000 bytes in 13.63 seconds = 12313.27 Mbit/sec
10000 iters in 13.63 seconds = 1362.53 usec/iter

melissa Dokumente/Infiniband> ibv_rc_pingpong 192.168.2.1 -m 2048 -s 1048576 -n 10000
  local address:  LID 0x0002, QPN 0x380405, PSN 0x9a46ba
  remote address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302
20971520000 bytes in 13.63 seconds = 12313.38 Mbit/sec
10000 iters in 13.63 seconds = 1362.52 usec/iter

Maximal user-level bandwidth was 11.5 GBit/s using RDMA:

ruffing@melissa:~/Dokumente/Infiniband/NetPIPE-3.7.1> ./NPibv -m 2048 -t rdma_write -c local_poll -h 192.168.2.1 -n 100
Using RDMA Write communications
Using local polling completion
Preposting asynchronous receives (required for Infiniband)
Now starting the main loop
[...]
121: 8388605 bytes 100 times --> 11851.72 Mbps in 5400.06 usec
122: 8388608 bytes 100 times --> 11851.66 Mbps in 5400.09 usec
123: 8388611 bytes 100 times --> 11850.62 Mbps in 5400.57 usec

That's actually 4 Gbit/s short of what I was hoping for. Yet I couldn't find any test results on the net that yielded more than 12 GBit/s on 4x DDR HCAs. Where does this performance loss stem from? At first glance, 4 GBit/s (25% of the data rate) looks like quite a lot to be only protocol overhead...
Is 12 GBit/s the current maximum bandwidth, or is it possible for Infiniband users to improve performance beyond that?


2) TCP (over IPoIB) vs. RDMA/SDP/uverbs?

On the first Infiniband installation, using the packages of the OpenSuse 11 distribution, I got a TCP bandwidth of 10 GBit/s. (Which actually isn't that bad when compared to a measured maximal bandwidth of 12 GBit/s.) This installation supported neither RDMA nor SDP, though.

tamara iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 3M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 515 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.1 port 47730 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.6 GBytes 10.0 Gbits/sec

After I installed the OFED 1.4 beta to be able to use SDP, RDMA and uverbs, I could use them to get 12 GBit/s. Yet the TCP rate dropped by 2-3 GBit/s to 7-8 GBit/s.

ruffing@tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 10M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 193 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.2.1 port 51988 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 8.16 GBytes 7.00 Gbits/sec

What could have caused this loss of bandwidth? Is there a way to avoid it? Obviously, this could be a show stopper (for me) as far as native Infiniband protocols are concerned: gaining 2 GBit/s under special circumstances probably won't outweigh losing 3 GBit/s during normal use.
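
For reference, the arithmetic behind the figures in 1) and 2) can be written down as a small C sketch. The function names below are made up for illustration and do not come from any OFED tool; the only assumptions are the 8b/10b line coding mentioned above (8 payload bits per 10 signalled bits) and Mbit/s meaning 10^6 bits per second, which seems to match the ibv_rc_pingpong output.

```c
#include <stdint.h>

/* Usable data rate of a link with 8b/10b line coding:
 * every 10 bits on the wire carry 8 bits of payload. */
double data_rate_gbps(double signal_rate_gbps)
{
	return signal_rate_gbps * 8.0 / 10.0;
}

/* Throughput in Mbit/s (10^6 bits per second) from a byte count
 * and the time it took to transfer those bytes. */
double throughput_mbps(uint64_t bytes, double seconds)
{
	return (double)bytes * 8.0 / seconds / 1e6;
}

/* Fraction of the post-encoding data rate that a measurement reaches,
 * e.g. 12 Gbit/s measured on a 20 Gbit/s signal rate link. */
double link_efficiency(double measured_gbps, double signal_rate_gbps)
{
	return measured_gbps / data_rate_gbps(signal_rate_gbps);
}
```

With these helpers, a 20 Gbit/s signal rate gives a 16 Gbit/s data rate, and a measured 12 Gbit/s corresponds to 75% of that, which is the 4 Gbit/s gap asked about above. The sketch says nothing about where the remaining 25% goes.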


3) SDP performance

The SDP performance (using preloading of libsdp.so) measured only 6.2 GBit/s, even underperforming TCP:

ruffing@tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> LD_PRELOAD=/usr/lib/libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./iperf -c 192.168.2.2 -l 10M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 4] local 192.168.2.1 port 36832 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 7.22 GBytes 6.20 Gbits/sec

/etc/libsdp.conf consists of the following two lines:
use both server * *:*
use both client * *:*

I have a hard time believing that's the maximum rate of SDP. (Even if Cisco measured a similar 6.6 GBit/s: https://www.cisco.com/en/US/docs/server_nw_virtual/commercial_host_driver/host_driver_linux/user/guide/sdp.html#wp948100)
Did I mess up my Infiniband installation, or is SDP really slower than TCP over IPoIB?


Sorry if my mail sounds somewhat negative, but I'm still trying to get past the marketing buzz and figure out what to realistically expect of Infiniband. Currently, I'm still hoping that I messed up my installation somewhere, and that a few pointers in the right direction might resolve most of the issues... :)

Thanks in advance,
Jan Ruffing



Devices:

tamara /dev/infiniband> ls -la
total 0
drwxr-xr-x 2 root root 140 2008-12-02 16:20 .
drwxr-xr-x 13 root root 4580 2008-12-09 14:59 ..
crw-rw---- 1 root root 231, 64 2008-12-02 16:20 issm0 crw-rw-rw- 1 root users 10, 59 2008-11-27 10:24 rdma_cm crw-rw---- 1 root root 231, 0 2008-12-02 16:20 umad0 crw-rw-rw- 1 root users 231, 192 2008-11-27 10:15 uverbs0 crw-rw---- 1 root users 231, 193 2008-11-27 10:15 uverbs1 Installed Packages: Build ofa_kernel RPM Install kernel-ib RPM: Build ofed-scripts RPM Install ofed-scripts RPM: Install libibverbs RPM: Install libibverbs-devel RPM: Install libibverbs-devel-static RPM: Install libibverbs-utils RPM: Install libmthca RPM: Install libmthca-devel-static RPM: Install libmlx4 RPM: Install libmlx4-devel RPM: Install libcxgb3 RPM: Install libcxgb3-devel RPM: Install libnes RPM: Install libnes-devel-static RPM: Install libibcm RPM: Install libibcm-devel RPM: Install libibcommon RPM: Install libibcommon-devel RPM: Install libibcommon-static RPM: Install libibumad RPM: Install libibumad-devel RPM: Install libibumad-static RPM: Build libibmad RPM Install libibmad RPM: Install libibmad-devel RPM: Install libibmad-static RPM: Install ibsim RPM: Install librdmacm RPM: Install librdmacm-utils RPM: Install librdmacm-devel RPM: Install libsdp RPM: Install libsdp-devel RPM: Install opensm-libs RPM: Install opensm RPM: Install opensm-devel RPM: Install opensm-static RPM: Install compat-dapl RPM: Install compat-dapl-devel RPM: Install dapl RPM: Install dapl-devel RPM: Install dapl-devel-static RPM: Install dapl-utils RPM: Install perftest RPM: Install mstflint RPM: Install sdpnetstat RPM: Install srptools RPM: Install rds-tools RPM: (installed ibutils manually) Loaded Modules: (libsdp currently unloaded) Directory: /home/ruffing tamara /home/ruffing> lsmod | grep ib ib_addr 24580 1 rdma_cm ib_ipoib 97576 0 ib_cm 53584 2 rdma_cm,ib_ipoib ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm ib_uverbs 56884 1 rdma_ucm ib_umad 32016 4 mlx4_ib 79884 0 mlx4_core 114924 1 mlx4_ib ib_mthca 148924 0 ib_mad 53400 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca ib_core 81152 12 
rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad ipv6 281064 23 ib_ipoib rtc_lib 19328 1 rtc_core libata 176604 2 ata_piix,pata_it8213 scsi_mod 168436 4 sr_mod,sg,sd_mod,libata dock 27536 1 libata tamara /home/ruffing> lsmod | grep rdma rdma_ucm 30248 0 rdma_cm 49544 1 rdma_ucm iw_cm 25988 1 rdma_cm ib_addr 24580 1 rdma_cm ib_cm 53584 2 rdma_cm,ib_ipoib ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm ib_uverbs 56884 1 rdma_ucm ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad -- Jan Ruffing Software Developer Motama GmbH Lortzingstraße 10 · 66111 Saarbrücken · Germany tel +49 681 940 85 50 · fax +49 681 940 85 49 ruffing at motama.com · www.motama.com Companies register · district council Saarbrücken · HRB 15249 CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. From tziporet at dev.mellanox.co.il Thu Dec 11 02:30:13 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 11 Dec 2008 12:30:13 +0200 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <4940D767.8090200@ext.bull.net> References: <4940D767.8090200@ext.bull.net> Message-ID: <4940EBB5.6030702@mellanox.co.il> Nicolas Morey Chaisemartin wrote: > > I was wondering if there was any program/module available to test > directly the kernel verbs. > I am looking for a simple test, like ib_write_bw (but in kernel mode) > to do some tests on the ib_core/rdma_cm modules. > > I could write it but it takes some time to got a clean interface > between user and kernel mode to send parameters (like server/host, > message size and so on). 
> > So if anyone of you has such a module and can share it, I'd be really > thankful. > There is krping test: http://www.openfabrics.org/git/?p=~swise/krping.git;a=summary Steve Wise developed it Tziporet From orenk at dev.mellanox.co.il Thu Dec 11 03:08:35 2008 From: orenk at dev.mellanox.co.il (Oren Kladnitsky) Date: Thu, 11 Dec 2008 13:08:35 +0200 Subject: [ofa-general] ibutils patch In-Reply-To: <4940E26E.9080502@motama.com> References: <4940E26E.9080502@motama.com> Message-ID: <4940F4B3.90900@dev.mellanox.co.il> Jan Ruffing wrote: > Hello, > > Last friday, I installed the OFED 1.4 beta on a 32 bit system with the > OpenSuse 11 distribution installed (Kernel 2.6.25.16). > > The install script worked fine up until ibutils. > > Ibutils had to be patched with additional includes to compile: > > diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibdm/datamodel/Fabric.h > /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibdm/datamodel/Fabric.h > 57a58 > >> #include >> > 58a60 > >> #include >> > diff -r > /tmp/infiniband/ibutils/ibutils-1.2//ibdm/datamodel/LinkCover.cpp > /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibdm/datamodel/LinkCover.cpp > 41a42,43 > >> #include >> >> > diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/dispatcher.h > /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/dispatcher.h > 55a56 > >> #include >> > diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/msgmgr.cpp > /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/msgmgr.cpp > 36a37 > >> #include >> > diff -r /tmp/infiniband/ibutils/ibutils-1.2//ibmgtsim/src/node.cpp > /var/tmp/OFED_topdir/BUILD/ibutils-1.2//ibmgtsim/src/node.cpp > 52a53 > >> #include >> > > > I don't know if this has been corrected in the OFED 1.4 release. If not, > including these changes in the next version of the OFED might be helpful. > > Thanks, > Jan Ruffing > > Hi, This patch is already in OFED 1.4 release. Regards, Oren. 
From vlad at lists.openfabrics.org Thu Dec 11 03:23:12 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 11 Dec 2008 03:23:12 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081211-0200 daily build status Message-ID: <20081211112312.7F59FE28006@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.23
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From nicolas.morey-chaisemartin at ext.bull.net  Thu Dec 11 03:36:02 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Thu, 11 Dec 2008 12:36:02 +0100
Subject: [ofa-general] Test modules for ib_core
In-Reply-To: <4940EBB5.6030702@mellanox.co.il>
References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il>
Message-ID: <4940FB22.9090809@ext.bull.net>

Tziporet Koren wrote:
> Nicolas Morey Chaisemartin wrote:
>>
>> I was wondering if there was any program/module available to test
>> directly the kernel verbs.
>> I am looking for a simple test, like ib_write_bw (but in kernel mode)
>> to do some tests on the ib_core/rdma_cm modules.
>>
>> I could write it but it takes some time to got a clean interface
>> between user and kernel mode to send parameters (like server/host,
>> message size and so on).
>>
>> So if anyone of you has such a module and can share it, I'd be really
>> thankful.
>>
>
> There is krping test:
> http://www.openfabrics.org/git/?p=~swise/krping.git;a=summary
> Steve Wise developed it
>
> Tziporet
>
>

Great! That's exactly what I needed!

Best Regards

Nicolas

From ronli at voltaire.com  Thu Dec 11 00:59:09 2008
From: ronli at voltaire.com (Ron Livne)
Date: Thu, 11 Dec 2008 10:59:09 +0200 (IST)
Subject: [ofa-general] [PATCH 0/4] Adding new verbs: create_qp_flags
Message-ID:

This series of patches adds a new user space verb:
ibv_create_qp_flags
The new verb works like ibv_create_qp, except that it takes the additional parameter
uint32_t create_flags
in order to create the QP with the specified creation flags.

I already sent a similar series of patches in July, but it wasn't merged.
The older patches were based on Jack Morgenstein's XRC patches.
These patches are not based on them.
The uverbs patch was written against the for-next branch.
These patches don't break the ABI and are compatible with older kernel/libibverbs versions.

The reason I added another verb in the kernel is that I don't think 8 bits (the reserved bits in struct ib_uverbs_create_qp) will be enough in the future.

From ronli at voltaire.com  Thu Dec 11 01:02:24 2008
From: ronli at voltaire.com (Ron Livne)
Date: Thu, 11 Dec 2008 11:02:24 +0200 (IST)
Subject: [ofa-general] [PATCH 2/4]libmlx4: add support for new verb create_qp_flags
Message-ID:

Adds support for the new verb create_qp_flags in libmlx4.

Signed-off-by: Ron Livne

diff --git a/configure.in b/configure.in
index 25f27f7..10f61eb 100644
--- a/configure.in
+++ b/configure.in
@@ -42,7 +42,9 @@ AC_CHECK_HEADER(valgrind/memcheck.h,
 dnl Checks for typedefs, structures, and compiler characteristics.
 AC_C_CONST
 AC_CHECK_SIZEOF(long)
-
+AC_CHECK_MEMBER(struct ibv_context.more_ops,
+    [AC_DEFINE([HAVE_IBV_MORE_OPS], 1, [Define to 1 if more_ops is a member of ibv_context])],,
+    [#include ])
 dnl Checks for library functions
 AC_CHECK_FUNC(ibv_read_sysfs_file, [],
     AC_MSG_ERROR([ibv_read_sysfs_file() not found.
libmlx4 requires libibverbs >= 1.0.3.])) diff --git a/src/mlx4-abi.h b/src/mlx4-abi.h index 20a40c9..4f0ad13 100644 --- a/src/mlx4-abi.h +++ b/src/mlx4-abi.h @@ -90,4 +90,14 @@ struct mlx4_create_qp { __u8 reserved[5]; }; +struct mlx4_create_qp_flags { + struct ibv_create_qp_flags ibv_cmd; + __u64 buf_addr; + __u64 db_addr; + __u8 log_sq_bb_count; + __u8 log_sq_stride; + __u8 sq_no_prefetch; + __u8 reserved[5]; +}; + #endif /* MLX4_ABI_H */ diff --git a/src/mlx4.c b/src/mlx4.c index 34ece39..04f453f 100644 --- a/src/mlx4.c +++ b/src/mlx4.c @@ -68,6 +68,12 @@ struct { HCA(MELLANOX, 0x673c), /* MT25408 "Hermon" QDR PCIe gen2 */ }; +#ifdef HAVE_IBV_MORE_OPS +static struct ibv_more_ops mlx4_more_ops = { + .create_qp_flags = mlx4_create_qp_flags, +}; +#endif + static struct ibv_context_ops mlx4_ctx_ops = { .query_device = mlx4_query_device, .query_port = mlx4_query_port, @@ -156,6 +162,10 @@ static struct ibv_context *mlx4_alloc_context(struct ibv_device *ibdev, int cmd_ context->ibv_ctx.ops = mlx4_ctx_ops; +#ifdef HAVE_IBV_MORE_OPS + context->ibv_ctx.more_ops = &mlx4_more_ops; +#endif + return &context->ibv_ctx; err_free: diff --git a/src/mlx4.h b/src/mlx4.h index 827a201..29c46a5 100644 --- a/src/mlx4.h +++ b/src/mlx4.h @@ -335,6 +335,11 @@ int mlx4_post_srq_recv(struct ibv_srq *ibsrq, struct ibv_recv_wr **bad_wr); struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr); +#ifdef HAVE_IBV_MORE_OPS +struct ibv_qp *mlx4_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp_init_attr *attr, + uint32_t create_flags); +#endif int mlx4_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr); diff --git a/src/verbs.c b/src/verbs.c index cc179a0..b67ea79 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -384,13 +384,10 @@ int mlx4_destroy_srq(struct ibv_srq *srq) return 0; } -struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) +static struct mlx4_qp 
*mlx4_create_qp_common(struct ibv_pd *pd, + struct ibv_qp_init_attr *attr, + struct mlx4_qp *qp) { - struct mlx4_create_qp cmd; - struct ibv_create_qp_resp resp; - struct mlx4_qp *qp; - int ret; - /* Sanity check QP size before proceeding */ if (attr->cap.max_send_wr > 65536 || attr->cap.max_recv_wr > 65536 || @@ -439,6 +436,31 @@ struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) *qp->db = 0; } + return qp; + +err_free: + free(qp->sq.wrid); + if (qp->rq.wqe_cnt) + free(qp->rq.wrid); + mlx4_free_buf(&qp->buf); + +err: + free(qp); + + return NULL; +} + +struct ibv_qp *mlx4_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) +{ + struct mlx4_create_qp cmd; + struct ibv_create_qp_resp resp; + struct mlx4_qp *qp; + int ret; + + qp = mlx4_create_qp_common(pd, attr, qp); + if (qp == NULL) + return NULL; + cmd.buf_addr = (uintptr_t) qp->buf.buf; if (attr->srq) cmd.db_addr = 0; @@ -484,13 +506,78 @@ err_rq_db: if (!attr->srq) mlx4_free_db(to_mctx(pd->context), MLX4_DB_TYPE_RQ, qp->db); -err_free: free(qp->sq.wrid); if (qp->rq.wqe_cnt) free(qp->rq.wrid); mlx4_free_buf(&qp->buf); -err: + free(qp); + + return NULL; +} + +struct ibv_qp *mlx4_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp_init_attr *attr, + uint32_t create_flags) +{ + struct mlx4_create_qp_flags cmd; + struct ibv_create_qp_resp resp; + struct mlx4_qp *qp; + int ret; + + qp = mlx4_create_qp_common(pd, attr, qp); + + cmd.buf_addr = (uintptr_t) qp->buf.buf; + if (attr->srq) + cmd.db_addr = 0; + else + cmd.db_addr = (uintptr_t) qp->db; + cmd.log_sq_stride = qp->sq.wqe_shift; + for (cmd.log_sq_bb_count = 0; + qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count; + ++cmd.log_sq_bb_count) + ; /* nothing */ + cmd.sq_no_prefetch = 0; /* OK for ABI 2: just a reserved field */ + memset(cmd.reserved, 0, sizeof cmd.reserved); + + pthread_mutex_lock(&to_mctx(pd->context)->qp_table_mutex); + + ret = ibv_cmd_create_qp_flags(pd, &qp->ibv_qp, attr, create_flags, + &cmd.ibv_cmd, sizeof cmd, + &resp, 
sizeof resp); + if (ret) + goto err_rq_db; + + ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp); + if (ret) + goto err_destroy; + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); + + qp->rq.wqe_cnt = qp->rq.max_post = attr->cap.max_recv_wr; + qp->rq.max_gs = attr->cap.max_recv_sge; + mlx4_set_sq_sizes(qp, &attr->cap, attr->qp_type); + + qp->doorbell_qpn = htonl(qp->ibv_qp.qp_num << 8); + if (attr->sq_sig_all) + qp->sq_signal_bits = htonl(MLX4_WQE_CTRL_CQ_UPDATE); + else + qp->sq_signal_bits = 0; + + return &qp->ibv_qp; + +err_destroy: + ibv_cmd_destroy_qp(&qp->ibv_qp); + +err_rq_db: + pthread_mutex_unlock(&to_mctx(pd->context)->qp_table_mutex); + if (!attr->srq) + mlx4_free_db(to_mctx(pd->context), MLX4_DB_TYPE_RQ, qp->db); + + free(qp->sq.wrid); + if (qp->rq.wqe_cnt) + free(qp->rq.wrid); + mlx4_free_buf(&qp->buf); + free(qp); return NULL; From ronli at voltaire.com Thu Dec 11 01:03:22 2008 From: ronli at voltaire.com (Ron Livne) Date: Thu, 11 Dec 2008 11:03:22 +0200 (IST) Subject: [ofa-general] [PATCH 3/4]uverbs: add support for new verb create_qp_flags Message-ID: This patch adds support for create_qp_flags to uverbs. 
Signed-off-by: Ron Livne diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index b3ea958..5871fd0 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -194,5 +194,6 @@ IB_UVERBS_DECLARE_CMD(create_srq); IB_UVERBS_DECLARE_CMD(modify_srq); IB_UVERBS_DECLARE_CMD(query_srq); IB_UVERBS_DECLARE_CMD(destroy_srq); +IB_UVERBS_DECLARE_CMD(create_qp_flags); #endif /* UVERBS_H */ diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 56feab6..58128a8 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1153,6 +1153,146 @@ err_put: return ret; } +ssize_t ib_uverbs_create_qp_flags(struct ib_uverbs_file *file, + const char __user *buf, int in_len, + int out_len) +{ + struct ib_uverbs_create_qp_flags cmd; + struct ib_uverbs_create_qp_resp resp; + struct ib_udata udata; + struct ib_uqp_object *obj; + struct ib_pd *pd; + struct ib_cq *scq, *rcq; + struct ib_srq *srq; + struct ib_qp *qp; + struct ib_qp_init_attr attr; + int ret; + + if (out_len < sizeof resp) + return -ENOSPC; + + if (copy_from_user(&cmd, buf, sizeof cmd)) + return -EFAULT; + + INIT_UDATA(&udata, buf + sizeof cmd, + (unsigned long) cmd.response + sizeof resp, + in_len - sizeof cmd, out_len - sizeof resp); + + obj = kmalloc(sizeof *obj, GFP_KERNEL); + if (!obj) + return -ENOMEM; + + init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_key); + down_write(&obj->uevent.uobject.mutex); + + srq = cmd.is_srq ? idr_read_srq(cmd.srq_handle, file->ucontext) : NULL; + pd = idr_read_pd(cmd.pd_handle, file->ucontext); + scq = idr_read_cq(cmd.send_cq_handle, file->ucontext, 0); + rcq = cmd.recv_cq_handle == cmd.send_cq_handle ? 
+ scq : idr_read_cq(cmd.recv_cq_handle, file->ucontext, 1); + + if (!pd || !scq || !rcq || (cmd.is_srq && !srq)) { + ret = -EINVAL; + goto err_put; + } + + attr.event_handler = ib_uverbs_qp_event_handler; + attr.qp_context = file; + attr.send_cq = scq; + attr.recv_cq = rcq; + attr.srq = srq; + attr.sq_sig_type = cmd.sq_sig_all ? IB_SIGNAL_ALL_WR : IB_SIGNAL_REQ_WR; + attr.qp_type = cmd.qp_type; + attr.create_flags = cmd.create_flags; + attr.cap.max_send_wr = cmd.max_send_wr; + attr.cap.max_recv_wr = cmd.max_recv_wr; + attr.cap.max_send_sge = cmd.max_send_sge; + attr.cap.max_recv_sge = cmd.max_recv_sge; + attr.cap.max_inline_data = cmd.max_inline_data; + + obj->uevent.events_reported = 0; + INIT_LIST_HEAD(&obj->uevent.event_list); + INIT_LIST_HEAD(&obj->mcast_list); + qp = pd->device->create_qp(pd, &attr, &udata); + + if (IS_ERR(qp)) { + ret = PTR_ERR(qp); + goto err_put; + } + + qp->device = pd->device; + qp->pd = pd; + qp->send_cq = attr.send_cq; + qp->recv_cq = attr.recv_cq; + qp->srq = attr.srq; + qp->uobject = &obj->uevent.uobject; + qp->event_handler = attr.event_handler; + qp->qp_context = attr.qp_context; + qp->qp_type = attr.qp_type; + atomic_inc(&pd->usecnt); + atomic_inc(&attr.send_cq->usecnt); + atomic_inc(&attr.recv_cq->usecnt); + if (attr.srq) + atomic_inc(&attr.srq->usecnt); + + obj->uevent.uobject.object = qp; + ret = idr_add_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject); + if (ret) + goto err_destroy; + + memset(&resp, 0, sizeof resp); + resp.qpn = qp->qp_num; + resp.qp_handle = obj->uevent.uobject.id; + resp.max_recv_sge = attr.cap.max_recv_sge; + resp.max_send_sge = attr.cap.max_send_sge; + resp.max_recv_wr = attr.cap.max_recv_wr; + resp.max_send_wr = attr.cap.max_send_wr; + resp.max_inline_data = attr.cap.max_inline_data; + + if (copy_to_user((void __user *) (unsigned long) cmd.response, + &resp, sizeof resp)) { + ret = -EFAULT; + goto err_copy; + } + + put_pd_read(pd); + put_cq_read(scq); + if (rcq != scq) + put_cq_read(rcq); + if (srq) + 
put_srq_read(srq); + + mutex_lock(&file->mutex); + list_add_tail(&obj->uevent.uobject.list, &file->ucontext->qp_list); + mutex_unlock(&file->mutex); + + obj->uevent.uobject.live = 1; + + up_write(&obj->uevent.uobject.mutex); + + return in_len; + +err_copy: + idr_remove_uobj(&ib_uverbs_qp_idr, &obj->uevent.uobject); + +err_destroy: + ib_destroy_qp(qp); + +err_put: + if (pd) + put_pd_read(pd); + if (scq) + put_cq_read(scq); + if (rcq && rcq != scq) + put_cq_read(rcq); + if (srq) + put_srq_read(srq); + + put_uobj_write(&obj->uevent.uobject); + return ret; +} + + ssize_t ib_uverbs_query_qp(struct ib_uverbs_file *file, const char __user *buf, int in_len, int out_len) diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index eb36a81..e050df4 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -108,6 +108,7 @@ static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file, [IB_USER_VERBS_CMD_MODIFY_SRQ] = ib_uverbs_modify_srq, [IB_USER_VERBS_CMD_QUERY_SRQ] = ib_uverbs_query_srq, [IB_USER_VERBS_CMD_DESTROY_SRQ] = ib_uverbs_destroy_srq, + [IB_USER_VERBS_CMD_CREATE_QP_FLAGS] = ib_uverbs_create_qp_flags, }; static struct vfsmount *uverbs_event_mnt; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 2e80f8f..78c161d 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -603,7 +603,8 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) (1ull << IB_USER_VERBS_CMD_CREATE_SRQ) | (1ull << IB_USER_VERBS_CMD_MODIFY_SRQ) | (1ull << IB_USER_VERBS_CMD_QUERY_SRQ) | - (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ); + (1ull << IB_USER_VERBS_CMD_DESTROY_SRQ) | + (1ull << IB_USER_VERBS_CMD_CREATE_QP_FLAGS); ibdev->ib_dev.query_device = mlx4_ib_query_device; ibdev->ib_dev.query_port = mlx4_ib_query_port; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 39167a7..09d79ea 100644 --- 
a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -505,9 +505,6 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } else { qp->sq_no_prefetch = 0; - if (init_attr->create_flags & IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) - qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK; - if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) qp->flags |= MLX4_IB_QP_LSO; @@ -554,6 +551,10 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err_wrid; } + if (init_attr->create_flags & + IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK) + qp->flags |= MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK; + err = mlx4_qp_alloc(dev->dev, qpn, &qp->mqp); if (err) goto err_qpn; @@ -697,16 +698,17 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp; int err; - /* - * We only support LSO and multicast loopback blocking, and - * only for kernel UD QPs. - */ - if (init_attr->create_flags & ~(IB_QP_CREATE_IPOIB_UD_LSO | - IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK)) +/** + * We support creation flags only for UD type QPs. + */ + if (init_attr->create_flags && init_attr->qp_type != IB_QPT_UD) return ERR_PTR(-EINVAL); - if (init_attr->create_flags && - (pd->uobject || init_attr->qp_type != IB_QPT_UD)) +/** + * We support LSO creation flag only for kernel QPs. 
+ */ + if ((init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) && + pd->uobject) return ERR_PTR(-EINVAL); switch (init_attr->qp_type) { diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h index a17f771..ab2e646 100644 --- a/include/rdma/ib_user_verbs.h +++ b/include/rdma/ib_user_verbs.h @@ -81,7 +81,8 @@ enum { IB_USER_VERBS_CMD_MODIFY_SRQ, IB_USER_VERBS_CMD_QUERY_SRQ, IB_USER_VERBS_CMD_DESTROY_SRQ, - IB_USER_VERBS_CMD_POST_SRQ_RECV + IB_USER_VERBS_CMD_POST_SRQ_RECV, + IB_USER_VERBS_CMD_CREATE_QP_FLAGS, }; /* @@ -403,6 +404,27 @@ struct ib_uverbs_create_qp { __u64 driver_data[0]; }; +struct ib_uverbs_create_qp_flags { + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 send_cq_handle; + __u32 recv_cq_handle; + __u32 srq_handle; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u8 sq_sig_all; + __u8 qp_type; + __u8 is_srq; + __u8 reserved; + __u32 reserved1; + __u32 create_flags; + __u64 driver_data[0]; +}; + struct ib_uverbs_create_qp_resp { __u32 qp_handle; __u32 qpn; From ronli at voltaire.com Thu Dec 11 01:04:48 2008 From: ronli at voltaire.com (Ron Livne) Date: Thu, 11 Dec 2008 11:04:48 +0200 (IST) Subject: [ofa-general] [PATCH 4/4] libibverbs: Add new creation flag: IBV_QP_CREATE_MULTICAST_LOOPBACK_BLOCK Message-ID: Adds enum ibv_qp_create_flags, which holds all possible user-space creation flags for a QP. This enum currently has one flag: IBV_QP_CREATE_MULTICAST_LOOPBACK_BLOCK Which will be used for creating a UD QP that blocks multicast loopback packets. 
This will only work if the device has this capability; currently that is only mlx4.

Signed-off-by: Ron Livne

diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h
index fe586f1..cf3b15d 100644
--- a/include/infiniband/verbs.h
+++ b/include/infiniband/verbs.h
@@ -92,7 +92,16 @@ enum ibv_device_cap_flags {
 	IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11,
 	IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12,
 	IBV_DEVICE_SRQ_RESIZE = 1 << 13,
-	IBV_DEVICE_N_NOTIFY_CQ = 1 << 14
+	IBV_DEVICE_N_NOTIFY_CQ = 1 << 14,
+	IBV_DEVICE_BLOCK_MULTICAST_LOOPBACK = 1 << 22
+};
+
+/*
+ This enum must be aligned with ib_qp_create_flags
+ in include/rdma/ib_verbs.h
+*/
+enum ibv_qp_create_flags {
+	IBV_QP_CREATE_MULTICAST_LOOPBACK_BLOCK = 1 << 1
 };
 
 enum ibv_atomic_cap {

From ronli.voltaire at gmail.com  Thu Dec 11 04:11:50 2008
From: ronli.voltaire at gmail.com (Ron Livne)
Date: Thu, 11 Dec 2008 14:11:50 +0200
Subject: [ofa-general] ***SPAM*** Fwd: [ewg] [PATCH 1/4]libibverbs: add new verb: ibv_create_qp_flags
In-Reply-To:
References:
Message-ID: <3b5e77ad0812110411w6f9c327fr4b62f985ade2660d@mail.gmail.com>

I accidentally sent this to ewg instead of general.

---------- Forwarded message ----------
From: Ron Livne
Date: Thu, Dec 11, 2008 at 11:01 AM
Subject: [ewg] [PATCH 1/4]libibverbs: add new verb: ibv_create_qp_flags
To: Roland Dreier
Cc: Ofed lists

This patch adds a new verb to libibverbs:
struct ibv_qp *ibv_create_qp_flags(struct ibv_pd *pd,
                                   struct ibv_qp_init_attr *qp_init_attr,
                                   uint32_t create_flags);
It works like ibv_create_qp, except that it takes one more argument:
uint32_t create_flags
These creation flags must be aligned with those in ib_verbs.h in the kernel.
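
As an illustration only, the idea of routing the new verb through an optional `more_ops` table can be sketched in plain C without libibverbs. Everything prefixed `demo_` or `DEMO_` below is a hypothetical stand-in, not the real library type, and the NULL checks are an assumption about how a wrapper could stay compatible with providers that never set `more_ops`; the wrapper in the actual patch may look different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct ibv_qp, ibv_more_ops and ibv_context. */
struct demo_qp {
	uint32_t create_flags;
};

struct demo_more_ops {
	struct demo_qp *(*create_qp_flags)(uint32_t create_flags);
};

struct demo_context {
	struct demo_more_ops *more_ops;	/* NULL for a provider without the new verb */
};

/* Same bit position as the flag proposed in patch 4/4. */
enum { DEMO_QP_CREATE_MULTICAST_LOOPBACK_BLOCK = 1 << 1 };

static struct demo_qp demo_qp_storage;

/* Toy provider implementation: just records the requested flags. */
static struct demo_qp *demo_provider_create_qp_flags(uint32_t create_flags)
{
	demo_qp_storage.create_flags = create_flags;
	return &demo_qp_storage;
}

struct demo_more_ops demo_provider_ops = {
	.create_qp_flags = demo_provider_create_qp_flags,
};

/* Library-side wrapper: fail cleanly when the provider has no more_ops
 * table or does not fill in the new entry. */
struct demo_qp *demo_create_qp_flags(struct demo_context *ctx,
				     uint32_t create_flags)
{
	if (!ctx->more_ops || !ctx->more_ops->create_qp_flags)
		return NULL;
	return ctx->more_ops->create_qp_flags(create_flags);
}
```

A caller would pass `DEMO_QP_CREATE_MULTICAST_LOOPBACK_BLOCK` and treat a NULL return as "not supported by this provider", then fall back to the plain create call.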
Signed-off-by: Ron Livne diff --git a/include/infiniband/driver.h b/include/infiniband/driver.h index 67a3bf8..fa74909 100644 --- a/include/infiniband/driver.h +++ b/include/infiniband/driver.h @@ -112,6 +112,11 @@ int ibv_cmd_create_qp(struct ibv_pd *pd, struct ibv_qp *qp, struct ibv_qp_init_attr *attr, struct ibv_create_qp *cmd, size_t cmd_size, struct ibv_create_qp_resp *resp, size_t resp_size); +int ibv_cmd_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp *qp, struct ibv_qp_init_attr *attr, + uint32_t create_flags, + struct ibv_create_qp_flags *cmd, size_t cmd_size, + struct ibv_create_qp_resp *resp, size_t resp_size); int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *qp_attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *qp_init_attr, diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h index 0db083a..9a3a27b 100644 --- a/include/infiniband/kern-abi.h +++ b/include/infiniband/kern-abi.h @@ -85,7 +85,8 @@ enum { IB_USER_VERBS_CMD_MODIFY_SRQ, IB_USER_VERBS_CMD_QUERY_SRQ, IB_USER_VERBS_CMD_DESTROY_SRQ, - IB_USER_VERBS_CMD_POST_SRQ_RECV + IB_USER_VERBS_CMD_POST_SRQ_RECV, + IB_USER_VERBS_CMD_CREATE_QP_FLAGS }; /* @@ -451,6 +452,30 @@ struct ibv_create_qp { __u64 driver_data[0]; }; +struct ibv_create_qp_flags { + __u32 command; + __u16 in_words; + __u16 out_words; + __u64 response; + __u64 user_handle; + __u32 pd_handle; + __u32 send_cq_handle; + __u32 recv_cq_handle; + __u32 srq_handle; + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + __u8 sq_sig_all; + __u8 qp_type; + __u8 is_srq; + __u8 reserved; + __u32 reserved1; + __u32 create_flags; + __u64 driver_data[0]; +}; + struct ibv_create_qp_resp { __u32 qp_handle; __u32 qpn; @@ -803,6 +828,7 @@ enum { * trick opcodes in IBV_INIT_CMD() doesn't break. 
*/ IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL_V2 = -1, + IB_USER_VERBS_CMD_CREATE_QP_FLAGS_V2 = -1, }; struct ibv_destroy_cq_v1 { diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a04cc62..fe586f1 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -625,6 +625,12 @@ struct ibv_device { char ibdev_path[IBV_SYSFS_PATH_MAX]; }; +struct ibv_more_ops { + struct ibv_qp * (*create_qp_flags)(struct ibv_pd *pd, + struct ibv_qp_init_attr *attr, + uint32_t create_flags); +}; + struct ibv_context_ops { int (*query_device)(struct ibv_context *context, struct ibv_device_attr *device_attr); @@ -691,6 +697,7 @@ struct ibv_context { int num_comp_vectors; pthread_mutex_t mutex; void *abi_compat; + struct ibv_more_ops *more_ops; }; /** @@ -963,6 +970,13 @@ struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr); /** + * ibv_create_qp_flags - Create a queue pair with creation flags. + */ +struct ibv_qp *ibv_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp_init_attr *qp_init_attr, + uint32_t create_flags); + +/** * ibv_modify_qp - Modify a queue pair. */ int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, diff --git a/src/cmd.c b/src/cmd.c index 66d7134..1b778e1 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -592,31 +592,12 @@ int ibv_cmd_destroy_srq(struct ibv_srq *srq) return 0; } -int ibv_cmd_create_qp(struct ibv_pd *pd, - struct ibv_qp *qp, struct ibv_qp_init_attr *attr, - struct ibv_create_qp *cmd, size_t cmd_size, - struct ibv_create_qp_resp *resp, size_t resp_size) +static int cmd_create_qp_common(struct ibv_pd *pd, + struct ibv_qp *qp, + struct ibv_qp_init_attr *attr, + struct ibv_create_qp_resp *resp, + size_t resp_size) { - IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); - - cmd->user_handle = (uintptr_t) qp; - cmd->pd_handle = pd->handle; - cmd->send_cq_handle = attr->send_cq->handle; - cmd->recv_cq_handle = attr->recv_cq->handle; - cmd->srq_handle = attr->srq ? 
attr->srq->handle : 0; - cmd->max_send_wr = attr->cap.max_send_wr; - cmd->max_recv_wr = attr->cap.max_recv_wr; - cmd->max_send_sge = attr->cap.max_send_sge; - cmd->max_recv_sge = attr->cap.max_recv_sge; - cmd->max_inline_data = attr->cap.max_inline_data; - cmd->sq_sig_all = attr->sq_sig_all; - cmd->qp_type = attr->qp_type; - cmd->is_srq = !!attr->srq; - cmd->reserved = 0; - - if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) - return errno; - VALGRIND_MAKE_MEM_DEFINED(resp, resp_size); qp->handle = resp->qp_handle; @@ -650,6 +631,64 @@ int ibv_cmd_create_qp(struct ibv_pd *pd, return 0; } +int ibv_cmd_create_qp(struct ibv_pd *pd, + struct ibv_qp *qp, struct ibv_qp_init_attr *attr, + struct ibv_create_qp *cmd, size_t cmd_size, + struct ibv_create_qp_resp *resp, size_t resp_size) +{ + IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP, resp, resp_size); + + cmd->user_handle = (uintptr_t) qp; + cmd->pd_handle = pd->handle; + cmd->send_cq_handle = attr->send_cq->handle; + cmd->recv_cq_handle = attr->recv_cq->handle; + cmd->srq_handle = attr->srq ? 
attr->srq->handle : 0; + cmd->max_send_wr = attr->cap.max_send_wr; + cmd->max_recv_wr = attr->cap.max_recv_wr; + cmd->max_send_sge = attr->cap.max_send_sge; + cmd->max_recv_sge = attr->cap.max_recv_sge; + cmd->max_inline_data = attr->cap.max_inline_data; + cmd->sq_sig_all = attr->sq_sig_all; + cmd->qp_type = attr->qp_type; + cmd->is_srq = !!attr->srq; + cmd->reserved = 0; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return cmd_create_qp_common(pd, qp, attr, resp, resp_size); +} + +int ibv_cmd_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp *qp, struct ibv_qp_init_attr *attr, + uint32_t create_flags, + struct ibv_create_qp_flags *cmd, size_t cmd_size, + struct ibv_create_qp_resp *resp, size_t resp_size) +{ + IBV_INIT_CMD_RESP(cmd, cmd_size, CREATE_QP_FLAGS, resp, resp_size); + + cmd->create_flags = create_flags; + cmd->user_handle = (uintptr_t) qp; + cmd->pd_handle = pd->handle; + cmd->send_cq_handle = attr->send_cq->handle; + cmd->recv_cq_handle = attr->recv_cq->handle; + cmd->srq_handle = attr->srq ? 
attr->srq->handle : 0; + cmd->max_send_wr = attr->cap.max_send_wr; + cmd->max_recv_wr = attr->cap.max_recv_wr; + cmd->max_send_sge = attr->cap.max_send_sge; + cmd->max_recv_sge = attr->cap.max_recv_sge; + cmd->max_inline_data = attr->cap.max_inline_data; + cmd->sq_sig_all = attr->sq_sig_all; + cmd->qp_type = attr->qp_type; + cmd->is_srq = !!attr->srq; + cmd->reserved = 0; + + if (write(pd->context->cmd_fd, cmd, cmd_size) != cmd_size) + return errno; + + return cmd_create_qp_common(pd, qp, attr, resp, resp_size); +} + int ibv_cmd_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr, diff --git a/src/libibverbs.map b/src/libibverbs.map index 1827da0..bdb70a0 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -96,4 +96,6 @@ IBVERBS_1.1 { ibv_port_state_str; ibv_event_type_str; ibv_wc_status_str; + ibv_create_qp_flags; + ibv_cmd_create_qp_flags; } IBVERBS_1.0; diff --git a/src/verbs.c b/src/verbs.c index 9e370ce..8ec171d 100644 --- a/src/verbs.c +++ b/src/verbs.c @@ -395,11 +395,10 @@ int __ibv_destroy_srq(struct ibv_srq *srq) } default_symver(__ibv_destroy_srq, ibv_destroy_srq); -struct ibv_qp *__ibv_create_qp(struct ibv_pd *pd, - struct ibv_qp_init_attr *qp_init_attr) +static struct ibv_qp *create_qp_common(struct ibv_pd *pd, + struct ibv_qp_init_attr *qp_init_attr, + struct ibv_qp *qp) { - struct ibv_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); - if (qp) { qp->context = pd->context; qp->qp_context = qp_init_attr->qp_context; @@ -416,8 +415,25 @@ struct ibv_qp *__ibv_create_qp(struct ibv_pd *pd, return qp; } + +struct ibv_qp *__ibv_create_qp(struct ibv_pd *pd, + struct ibv_qp_init_attr *qp_init_attr) +{ + struct ibv_qp *qp = pd->context->ops.create_qp(pd, qp_init_attr); + + return create_qp_common(pd, qp_init_attr, qp); +} default_symver(__ibv_create_qp, ibv_create_qp); +struct ibv_qp *ibv_create_qp_flags(struct ibv_pd *pd, + struct ibv_qp_init_attr *qp_init_attr, + uint32_t 
create_flags) +{ + struct ibv_qp *qp = pd->context->more_ops->create_qp_flags(pd, qp_init_attr, create_flags); + + return create_qp_common(pd, qp_init_attr, qp); +} + int __ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr) _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From celine.bourde at ext.bull.net Thu Dec 11 05:25:09 2008 From: celine.bourde at ext.bull.net (Celine Bourde) Date: Thu, 11 Dec 2008 14:25:09 +0100 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <4940FB22.9090809@ext.bull.net> References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> Message-ID: <494114B5.7000703@ext.bull.net> I am trying krping on 2.6.27 kernel and OFED1.4 (10 December 2008). Module compilation is ok, but modprobe failed. [root at twin krping]# modprobe rdma_krping FATAL: Error inserting rdma_krping (/lib/modules/2.6.27/extra/rdma_krping.ko): Unknown symbol in module, or unknown parameter (see dmesg) dmesg output : rdma_krping: disagrees about version of symbol ib_create_cq rdma_krping: Unknown symbol ib_create_cq rdma_krping: disagrees about version of symbol ib_alloc_fast_reg_page_list rdma_krping: Unknown symbol ib_alloc_fast_reg_page_list rdma_krping: disagrees about version of symbol rdma_resolve_addr rdma_krping: Unknown symbol rdma_resolve_addr rdma_krping: disagrees about version of symbol ib_reg_phys_mr rdma_krping: Unknown symbol ib_reg_phys_mr rdma_krping: disagrees about version of symbol ib_dereg_mr rdma_krping: Unknown symbol ib_dereg_mr rdma_krping: disagrees about version of symbol rdma_disconnect rdma_krping: Unknown symbol rdma_disconnect rdma_krping: disagrees about version of symbol rdma_resolve_route rdma_krping: Unknown symbol rdma_resolve_route rdma_krping: disagrees about version of symbol rdma_bind_addr 
rdma_krping: Unknown symbol rdma_bind_addr rdma_krping: disagrees about version of symbol rdma_create_qp rdma_krping: Unknown symbol rdma_create_qp rdma_krping: disagrees about version of symbol ib_destroy_cq rdma_krping: Unknown symbol ib_destroy_cq rdma_krping: disagrees about version of symbol rdma_create_id rdma_krping: Unknown symbol rdma_create_id rdma_krping: disagrees about version of symbol rdma_listen rdma_krping: Unknown symbol rdma_listen rdma_krping: disagrees about version of symbol ib_query_device rdma_krping: Unknown symbol ib_query_device rdma_krping: disagrees about version of symbol ib_get_dma_mr rdma_krping: Unknown symbol ib_get_dma_mr rdma_krping: disagrees about version of symbol ib_alloc_pd rdma_krping: Unknown symbol ib_alloc_pd rdma_krping: disagrees about version of symbol rdma_connect rdma_krping: Unknown symbol rdma_connect rdma_krping: disagrees about version of symbol ib_alloc_mw rdma_krping: Unknown symbol ib_alloc_mw rdma_krping: disagrees about version of symbol rdma_destroy_id rdma_krping: Unknown symbol rdma_destroy_id rdma_krping: disagrees about version of symbol ib_free_fast_reg_page_list rdma_krping: Unknown symbol ib_free_fast_reg_page_list rdma_krping: disagrees about version of symbol rdma_accept rdma_krping: Unknown symbol rdma_accept rdma_krping: disagrees about version of symbol ib_destroy_qp rdma_krping: Unknown symbol ib_destroy_qp rdma_krping: disagrees about version of symbol ib_dealloc_mw rdma_krping: Unknown symbol ib_dealloc_mw rdma_krping: disagrees about version of symbol ib_alloc_fast_reg_mr rdma_krping: Unknown symbol ib_alloc_fast_reg_mr rdma_krping: disagrees about version of symbol ib_dealloc_pd rdma_krping: Unknown symbol ib_dealloc_pd rdma_krping: disagrees about version of symbol ib_create_cq rdma_krping: Unknown symbol ib_create_cq rdma_krping: disagrees about version of symbol ib_alloc_fast_reg_page_list rdma_krping: Unknown symbol ib_alloc_fast_reg_page_list rdma_krping: disagrees about version of 
symbol rdma_resolve_addr rdma_krping: Unknown symbol rdma_resolve_addr rdma_krping: disagrees about version of symbol ib_reg_phys_mr rdma_krping: Unknown symbol ib_reg_phys_mr rdma_krping: disagrees about version of symbol ib_dereg_mr rdma_krping: Unknown symbol ib_dereg_mr rdma_krping: disagrees about version of symbol rdma_disconnect rdma_krping: Unknown symbol rdma_disconnect rdma_krping: disagrees about version of symbol rdma_resolve_route rdma_krping: Unknown symbol rdma_resolve_route rdma_krping: disagrees about version of symbol rdma_bind_addr rdma_krping: Unknown symbol rdma_bind_addr rdma_krping: disagrees about version of symbol rdma_create_qp rdma_krping: Unknown symbol rdma_create_qp rdma_krping: disagrees about version of symbol ib_destroy_cq rdma_krping: Unknown symbol ib_destroy_cq rdma_krping: disagrees about version of symbol rdma_create_id rdma_krping: Unknown symbol rdma_create_id rdma_krping: disagrees about version of symbol rdma_listen rdma_krping: Unknown symbol rdma_listen rdma_krping: disagrees about version of symbol ib_query_device rdma_krping: Unknown symbol ib_query_device rdma_krping: disagrees about version of symbol ib_get_dma_mr rdma_krping: Unknown symbol ib_get_dma_mr rdma_krping: disagrees about version of symbol ib_alloc_pd rdma_krping: Unknown symbol ib_alloc_pd rdma_krping: disagrees about version of symbol rdma_connect rdma_krping: Unknown symbol rdma_connect rdma_krping: disagrees about version of symbol ib_alloc_mw rdma_krping: Unknown symbol ib_alloc_mw rdma_krping: disagrees about version of symbol rdma_destroy_id rdma_krping: Unknown symbol rdma_destroy_id rdma_krping: disagrees about version of symbol ib_free_fast_reg_page_list rdma_krping: Unknown symbol ib_free_fast_reg_page_list rdma_krping: disagrees about version of symbol rdma_accept rdma_krping: Unknown symbol rdma_accept rdma_krping: disagrees about version of symbol ib_destroy_qp rdma_krping: Unknown symbol ib_destroy_qp rdma_krping: disagrees about version of 
symbol ib_dealloc_mw rdma_krping: Unknown symbol ib_dealloc_mw rdma_krping: disagrees about version of symbol ib_alloc_fast_reg_mr rdma_krping: Unknown symbol ib_alloc_fast_reg_mr rdma_krping: disagrees about version of symbol ib_dealloc_pd rdma_krping: Unknown symbol ib_dealloc_pd Any Advice ? Céline Bourde Nicolas Morey Chaisemartin wrote: > Tziporet Koren wrote: >> Nicolas Morey Chaisemartin wrote: >>> >>> I was wondering if there was any program/module available to test >>> directly the kernel verbs. >>> I am looking for a simple test, like ib_write_bw (but in kernel >>> mode) to do some tests on the ib_core/rdma_cm modules. >>> >>> I could write it but it takes some time to got a clean interface >>> between user and kernel mode to send parameters (like server/host, >>> message size and so on). >>> >>> So if anyone of you has such a module and can share it, I'd be >>> really thankful. >>> >> >> There is krping test: >> http://www.openfabrics.org/git/?p=~swise/krping.git;a=summary >> Steve Wise developed it >> >> Tziporet >> >> > Great ! > > That's exactly what I needed !! > > Best Regards > > Nicolas > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From Arkady.Kanevsky at netapp.com Thu Dec 11 07:01:55 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Thu, 11 Dec 2008 10:01:55 -0500 Subject: [ofa-general] mlx4 support for fast_reg_mr In-Reply-To: <4940D38E.2000700@mellanox.co.il> References: <4940D38E.2000700@mellanox.co.il> Message-ID: Tziporet, >From which version of kernel.org there is support for FMRs for mlx4? Thanks, Arkady Kanevsky email: arkady at netapp.com NetApp phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. 
Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 -----Original Message----- From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il] Sent: Thursday, December 11, 2008 3:47 AM To: Lentini, James Cc: rdreier at cisco.com; general at lists.openfabrics.org Subject: Re: [ofa-general] mlx4 support for fast_reg_mr James Lentini wrote: > The mlx4 code to support the fast_reg_mr API is upstream, but the last > post I found reported that the firmware was not ready: > > http://lists.openfabrics.org/pipermail/general/2008-July/052931.html > > What is the status of mlx4 firmware support for the fast_reg_mr API? > > FMRs work perfectly with mlx4 and FW 2.5.0. Best if you use the just-released OFED 1.4, but 1.3.1 will also do the job. Tziporet From jlentini at netapp.com Thu Dec 11 07:06:39 2008 From: jlentini at netapp.com (James Lentini) Date: Thu, 11 Dec 2008 10:06:39 -0500 (EST) Subject: [ofa-general] mlx4 support for fast_reg_mr In-Reply-To: <4940D38E.2000700@mellanox.co.il> References: <4940D38E.2000700@mellanox.co.il> Message-ID: On Thu, 11 Dec 2008, Tziporet Koren wrote: > James Lentini wrote: > > The mlx4 code to support the fast_reg_mr API is upstream, but the last post > > I found reported that the firmware was not ready: > > http://lists.openfabrics.org/pipermail/general/2008-July/052931.html > > What is the status of mlx4 firmware support for the fast_reg_mr API? > > > > > > FMRs work perfectly with mlx4 and FW 2.5.0. > Best if you use the just-released OFED 1.4, but 1.3.1 will also do the job. To be clear, I'm referring to the fast_reg_mr APIs [ib_alloc_fast_reg_mr(), ib_alloc_fast_reg_page_list(), and the ib_send_wr's fast_reg type], not the FMR APIs [ib_alloc_fmr(), ib_map_phys_fmr(), etc].
Are the fast_reg_mr APIs supported with mlx4 and FW 2.5.0? From Shainer at Mellanox.com Thu Dec 11 07:17:46 2008 From: Shainer at Mellanox.com (Gilad Shainer) Date: Thu, 11 Dec 2008 07:17:46 -0800 Subject: [ofa-general] Infiniband performance Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com> On the maximum BW you are correct - IB is capable of a 16Gb/s data rate. You are seeing 12Gb/s due to the host chipset bandwidth limitation. Gilad. -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Jan Ruffing Sent: Thursday, December 11, 2008 1:56 AM To: general at lists.openfabrics.org Subject: [ofa-general] Infiniband performance Hello, I'm new to Infiniband and still trying to get a grasp on what performance it can realistically deliver. The two directly connected test machines have Mellanox Infinihost III Lx DDR HCA cards installed and run OpenSuse 11 with a 2.6.25.16 kernel. 1) Maximum Bandwidth? Infiniband (Double Data Rate, 4x lane) is advertised with a bandwidth of 20 Gbit/s. If my understanding is correct, this is only the signal rate, which would translate to a 16 Gbit/s data rate due to 8b/10b encoding?
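A rough worked version of that arithmetic, as a sketch only: a 4x DDR link signals at 5 Gbit/s per lane, and 8b/10b line coding carries 8 data bits in every 10 signal bits. The function names below are made up for illustration.

```c
/* Usable data rate in Gbit/s of a link that uses 8b/10b line coding:
 * only 8 of every 10 bits on the wire are payload bits. */
static double ib_data_rate_gbps(int lanes, double lane_signal_gbps)
{
	return lanes * lane_signal_gbps * 8.0 / 10.0;
}

/* Fraction of the data rate that a measured throughput reaches,
 * with both values given in Gbit/s. */
static double fraction_of_data_rate(double measured_gbps, double data_rate_gbps)
{
	return measured_gbps / data_rate_gbps;
}

/* 4x DDR: 4 lanes x 5 Gbit/s = 20 Gbit/s signal rate, so
 * ib_data_rate_gbps(4, 5.0) gives 16 Gbit/s, and the roughly 12 Gbit/s
 * mentioned in this thread is fraction_of_data_rate(12.0, 16.0) = 0.75
 * of that. */
```

This only restates the link-level numbers; it says nothing about where the remaining quarter goes, which the reply above attributes to the host chipset.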
The maximum speed I measured so far was 12Gbit/s on the low-level protocols: tamara /home/ruffing> ibv_rc_pingpong -m 2048 -s 1048576 -n 10000 local address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302 remote address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba 20971520000 bytes in 13.63 seconds = 12313.27 Mbit/sec 10000 iters in 13.63 seconds = 1362.53 usec/iter melissa Dokumente/Infiniband> ibv_rc_pingpong 192.168.2.1 -m 2048 -s 1048576 -n 10000 local address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba remote address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302 20971520000 bytes in 13.63 seconds = 12313.38 Mbit/sec 10000 iters in 13.63 seconds = 1362.52 usec/iter Maximal user-level bandwidth was 11.5 GBit/s using RDMA: ruffing at melissa:~/Dokumente/Infiniband/NetPIPE-3.7.1> ./NPibv -m 2048 -t rdma_write -c local_poll -h 192.168.2.1 -n 100 Using RDMA Write communications Using local polling completion Preposting asynchronous receives (required for Infiniband) Now starting the main loop [...] 121: 8388605 bytes 100 times --> 11851.72 Mbps in 5400.06 usec 122: 8388608 bytes 100 times --> 11851.66 Mbps in 5400.09 usec 123: 8388611 bytes 100 times --> 11850.62 Mbps in 5400.57 usec That's actually 4 Gbit/s short of what I was hoping for. Yet I couldn't find any test results on the net that yielded more than 12 GBit/s on 4x DDR-HCAs. Where does this performance loss stem from? At first glance, 4 GBit/s (25% of the data rate) looks like quite a lot to be protocol overhead alone... Is 12 GBit/s the current maximum bandwidth, or is it possible for Infiniband users to improve performance beyond that? 2) TCP (over IPoIB) vs. RDMA/SDP/uverbs? On the first Infiniband installation using the packages of the OpenSuse 11 distribution, I got a TCP bandwidth of 10 GBit/s. (Which actually isn't that bad when compared to a measured maximal bandwidth of 12 GBit/s.) This installation supported neither RDMA nor SDP, though.
tamara iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 3M ------------------------------------------------------------ Client connecting to 192.168.2.2, TCP port 5001 TCP window size: 515 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.1 port 47730 connected with 192.168.2.2 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 11.6 GBytes 10.0 Gbits/sec After I installed the OFED 1.4 beta to be able to use SDP, RDMA and uverbs, I could use them to reach 12 GBit/s. Yet the TCP rate dropped by 2-3 GBit/s to 7-8 GBit/s. ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 10M ------------------------------------------------------------ Client connecting to 192.168.2.2, TCP port 5001 TCP window size: 193 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.1 port 51988 connected with 192.168.2.2 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 8.16 GBytes 7.00 Gbits/sec What could have caused this loss of bandwidth? Is there a way to avoid it? Obviously, this could be a show stopper (for me) as far as native Infiniband protocols are concerned: gaining 2 GBit/s under special circumstances probably won't outweigh losing 3 GBit/s during normal use.
3) SDP performance SDP (with libsdp.so preloaded) measured only 6.2 GBit/s, even underperforming TCP: ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> LD_PRELOAD=/usr/lib/libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./iperf -c 192.168.2.2 -l 10M ------------------------------------------------------------ Client connecting to 192.168.2.2, TCP port 5001 TCP window size: 16.0 MByte (default) ------------------------------------------------------------ [ 4] local 192.168.2.1 port 36832 connected with 192.168.2.2 port 5001 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 7.22 GBytes 6.20 Gbits/sec /etc/libsdp.conf consists of the following two lines: use both server * *:* use both client * *:* I have a hard time believing that's the max rate of SDP. (Even though Cisco measured a similar 6.6 GBit/s: https://www.cisco.com/en/US/docs/server_nw_virtual/commercial_host_driver/host_driver_linux/user/guide/sdp.html#wp948100) Did I mess up my Infiniband installation, or is SDP really slower than TCP over IPoIB? Sorry if my mail might sound somewhat negative, but I'm still trying to get past the marketing buzz and figure out what to realistically expect of Infiniband. Currently, I'm still hoping that I messed up my installation somewhere, and that a few pointers in the right direction might resolve most of the issues... :) Thanks in advance, Jan Ruffing Devices: tamara /dev/infiniband> ls -la total 0 drwxr-xr-x 2 root root 140 2008-12-02 16:20 . drwxr-xr-x 13 root root 4580 2008-12-09 14:59 ..
crw-rw---- 1 root root 231, 64 2008-12-02 16:20 issm0 crw-rw-rw- 1 root users 10, 59 2008-11-27 10:24 rdma_cm crw-rw---- 1 root root 231, 0 2008-12-02 16:20 umad0 crw-rw-rw- 1 root users 231, 192 2008-11-27 10:15 uverbs0 crw-rw---- 1 root users 231, 193 2008-11-27 10:15 uverbs1 Installed Packages: Build ofa_kernel RPM Install kernel-ib RPM: Build ofed-scripts RPM Install ofed-scripts RPM: Install libibverbs RPM: Install libibverbs-devel RPM: Install libibverbs-devel-static RPM: Install libibverbs-utils RPM: Install libmthca RPM: Install libmthca-devel-static RPM: Install libmlx4 RPM: Install libmlx4-devel RPM: Install libcxgb3 RPM: Install libcxgb3-devel RPM: Install libnes RPM: Install libnes-devel-static RPM: Install libibcm RPM: Install libibcm-devel RPM: Install libibcommon RPM: Install libibcommon-devel RPM: Install libibcommon-static RPM: Install libibumad RPM: Install libibumad-devel RPM: Install libibumad-static RPM: Build libibmad RPM Install libibmad RPM: Install libibmad-devel RPM: Install libibmad-static RPM: Install ibsim RPM: Install librdmacm RPM: Install librdmacm-utils RPM: Install librdmacm-devel RPM: Install libsdp RPM: Install libsdp-devel RPM: Install opensm-libs RPM: Install opensm RPM: Install opensm-devel RPM: Install opensm-static RPM: Install compat-dapl RPM: Install compat-dapl-devel RPM: Install dapl RPM: Install dapl-devel RPM: Install dapl-devel-static RPM: Install dapl-utils RPM: Install perftest RPM: Install mstflint RPM: Install sdpnetstat RPM: Install srptools RPM: Install rds-tools RPM: (installed ibutils manually) Loaded Modules: (libsdp currently unloaded) Directory: /home/ruffing tamara /home/ruffing> lsmod | grep ib ib_addr 24580 1 rdma_cm ib_ipoib 97576 0 ib_cm 53584 2 rdma_cm,ib_ipoib ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm ib_uverbs 56884 1 rdma_ucm ib_umad 32016 4 mlx4_ib 79884 0 mlx4_core 114924 1 mlx4_ib ib_mthca 148924 0 ib_mad 53400 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca ib_core 81152 12 
rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad ipv6 281064 23 ib_ipoib rtc_lib 19328 1 rtc_core libata 176604 2 ata_piix,pata_it8213 scsi_mod 168436 4 sr_mod,sg,sd_mod,libata dock 27536 1 libata tamara /home/ruffing> lsmod | grep rdma rdma_ucm 30248 0 rdma_cm 49544 1 rdma_ucm iw_cm 25988 1 rdma_cm ib_addr 24580 1 rdma_cm ib_cm 53584 2 rdma_cm,ib_ipoib ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm ib_uverbs 56884 1 rdma_ucm ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad -- Jan Ruffing Software Developer Motama GmbH Lortzingstraße 10 · 66111 Saarbrücken · Germany tel +49 681 940 85 50 · fax +49 681 940 85 49 ruffing at motama.com · www.motama.com Companies register · district council Saarbrücken · HRB 15249 CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Robert at saq.co.uk Thu Dec 11 07:28:11 2008 From: Robert at saq.co.uk (Robert Dunkley) Date: Thu, 11 Dec 2008 15:28:11 -0000 Subject: [ofa-general] Infiniband performance References: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com> Message-ID: Hi, I have some Opteron systems with 20Gb Mellanox Gen3 cards and get very similar RDMA results of just over 11Gbit / sec (Centos 5.2 with OFED 1.3.1). 
The newer gen4 20Gb cards get about 14-15Gbit/sec because of their lower-latency processing and support for higher-bandwidth PCI-E 2.0 (PCI-E is not 100% efficient, so a 20Gbit PCI-E link will hold back a 20Gbit IC). Rob -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Gilad Shainer Sent: 11 December 2008 15:18 To: Jan Ruffing; general at lists.openfabrics.org Subject: RE: [ofa-general] Infiniband performance On the maximum BW you are correct - IB is capable of a 16Gb/s data rate. You are seeing 12Gb/s due to the host chipset bandwidth limitation. Gilad. -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Jan Ruffing Sent: Thursday, December 11, 2008 1:56 AM To: general at lists.openfabrics.org Subject: [ofa-general] Infiniband performance Hello, I'm new to Infiniband and still trying to get a grasp on what performance it can realistically deliver. The two directly connected test machines have Mellanox Infinihost III Lx DDR HCA cards installed and run OpenSuse 11 with a 2.6.25.16 kernel. 1) Maximum Bandwidth? Infiniband (Double Data Rate, 4x lane) is advertised with a bandwidth of 20 Gbit/s. If my understanding is correct, this is only the signal rate, which would translate to a 16 Gbit/s data rate due to 8b/10b encoding?
The maximum speed I measured so far was 12Gbit/s on the low-level protocols: tamara /home/ruffing> ibv_rc_pingpong -m 2048 -s 1048576 -n 10000 local address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302 remote address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba 20971520000 bytes in 13.63 seconds = 12313.27 Mbit/sec 10000 iters in 13.63 seconds = 1362.53 usec/iter melissa Dokumente/Infiniband> ibv_rc_pingpong 192.168.2.1 -m 2048 -s 1048576 -n 10000 local address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba remote address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302 20971520000 bytes in 13.63 seconds = 12313.38 Mbit/sec 10000 iters in 13.63 seconds = 1362.52 usec/iter Maximal user-level bandwidth was 11.5 GBit/s using RDMA: ruffing at melissa:~/Dokumente/Infiniband/NetPIPE-3.7.1> ./NPibv -m 2048 -t rdma_write -c local_poll -h 192.168.2.1 -n 100 Using RDMA Write communications Using local polling completion Preposting asynchronous receives (required for Infiniband) Now starting the main loop [...] 121: 8388605 bytes 100 times --> 11851.72 Mbps in 5400.06 usec 122: 8388608 bytes 100 times --> 11851.66 Mbps in 5400.09 usec 123: 8388611 bytes 100 times --> 11850.62 Mbps in 5400.57 usec That's actually 4 Gbit/s short of what I was hoping for. Yet I couldn't find any test results on the net that yielded more than 12 GBit/s on 4x DDR-HCAs. Where does this performance loss stem from? At first glance, 4 GBit/s (25% of the data rate) looks like quite a lot to be protocol overhead alone... Is 12 GBit/s the current maximum bandwidth, or is it possible for Infiniband users to improve performance beyond that? 2) TCP (over IPoIB) vs. RDMA/SDP/uverbs? On the first Infiniband installation using the packages of the OpenSuse 11 distribution, I got a TCP bandwidth of 10 GBit/s. (Which actually isn't that bad when compared to a measured maximal bandwidth of 12 GBit/s.) This installation supported neither RDMA nor SDP, though.
tamara iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 3M ------------------------------------------------------------ Client connecting to 192.168.2.2, TCP port 5001 TCP window size: 515 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.1 port 47730 connected with 192.168.2.2 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 11.6 GBytes 10.0 Gbits/sec After I installed the OFED 1.4 beta to be able to use SDP, RDMA and uverbs, I could use them to reach 12 GBit/s. Yet the TCP rate dropped by 2-3 GBit/s to 7-8 GBit/s. ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 10M ------------------------------------------------------------ Client connecting to 192.168.2.2, TCP port 5001 TCP window size: 193 KByte (default) ------------------------------------------------------------ [ 3] local 192.168.2.1 port 51988 connected with 192.168.2.2 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 8.16 GBytes 7.00 Gbits/sec What could have caused this loss of bandwidth? Is there a way to avoid it? Obviously, this could be a show stopper (for me) as far as native Infiniband protocols are concerned: gaining 2 GBit/s under special circumstances probably won't outweigh losing 3 GBit/s during normal use.

3) SDP performance

The SDP performance (using preloading of libsdp.so) measured only 6.2 GBit/s, even underperforming TCP:

ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> LD_PRELOAD=/usr/lib/libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./iperf -c 192.168.2.2 -l 10M
------------------------------------------------------------
Client connecting to 192.168.2.2, TCP port 5001
TCP window size: 16.0 MByte (default)
------------------------------------------------------------
[ 4] local 192.168.2.1 port 36832 connected with 192.168.2.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 7.22 GBytes 6.20 Gbits/sec

/etc/libsdp.conf consists of the following two lines:

use both server * *:*
use both client * *:*

I have a hard time believing that's the max rate of SDP. (Even though Cisco measured a similar 6.6 GBit/s: https://www.cisco.com/en/US/docs/server_nw_virtual/commercial_host_driver/host_driver_linux/user/guide/sdp.html#wp948100)

Did I mess up my Infiniband installation, or is SDP really slower than TCP over IPoIB?

Sorry if my mail sounds somewhat negative, but I'm still trying to get past the marketing buzz and figure out what to realistically expect of Infiniband. Currently, I'm still hoping that I messed up my installation somewhere, and that a few pointers in the right direction might resolve most of the issues... :)

Thanks in advance,
Jan Ruffing

Devices:

tamara /dev/infiniband> ls -la
total 0
drwxr-xr-x 2 root root 140 2008-12-02 16:20 .
drwxr-xr-x 13 root root 4580 2008-12-09 14:59 ..
crw-rw---- 1 root root 231, 64 2008-12-02 16:20 issm0
crw-rw-rw- 1 root users 10, 59 2008-11-27 10:24 rdma_cm
crw-rw---- 1 root root 231, 0 2008-12-02 16:20 umad0
crw-rw-rw- 1 root users 231, 192 2008-11-27 10:15 uverbs0
crw-rw---- 1 root users 231, 193 2008-11-27 10:15 uverbs1

Installed Packages:

Build ofa_kernel RPM
Install kernel-ib RPM:
Build ofed-scripts RPM
Install ofed-scripts RPM:
Install libibverbs RPM:
Install libibverbs-devel RPM:
Install libibverbs-devel-static RPM:
Install libibverbs-utils RPM:
Install libmthca RPM:
Install libmthca-devel-static RPM:
Install libmlx4 RPM:
Install libmlx4-devel RPM:
Install libcxgb3 RPM:
Install libcxgb3-devel RPM:
Install libnes RPM:
Install libnes-devel-static RPM:
Install libibcm RPM:
Install libibcm-devel RPM:
Install libibcommon RPM:
Install libibcommon-devel RPM:
Install libibcommon-static RPM:
Install libibumad RPM:
Install libibumad-devel RPM:
Install libibumad-static RPM:
Build libibmad RPM
Install libibmad RPM:
Install libibmad-devel RPM:
Install libibmad-static RPM:
Install ibsim RPM:
Install librdmacm RPM:
Install librdmacm-utils RPM:
Install librdmacm-devel RPM:
Install libsdp RPM:
Install libsdp-devel RPM:
Install opensm-libs RPM:
Install opensm RPM:
Install opensm-devel RPM:
Install opensm-static RPM:
Install compat-dapl RPM:
Install compat-dapl-devel RPM:
Install dapl RPM:
Install dapl-devel RPM:
Install dapl-devel-static RPM:
Install dapl-utils RPM:
Install perftest RPM:
Install mstflint RPM:
Install sdpnetstat RPM:
Install srptools RPM:
Install rds-tools RPM:
(installed ibutils manually)

Loaded Modules: (libsdp currently unloaded)

Directory: /home/ruffing
tamara /home/ruffing> lsmod | grep ib
ib_addr 24580 1 rdma_cm
ib_ipoib 97576 0
ib_cm 53584 2 rdma_cm,ib_ipoib
ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm
ib_uverbs 56884 1 rdma_ucm
ib_umad 32016 4
mlx4_ib 79884 0
mlx4_core 114924 1 mlx4_ib
ib_mthca 148924 0
ib_mad 53400 5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca
ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
ipv6 281064 23 ib_ipoib
rtc_lib 19328 1 rtc_core
libata 176604 2 ata_piix,pata_it8213
scsi_mod 168436 4 sr_mod,sg,sd_mod,libata
dock 27536 1 libata
tamara /home/ruffing> lsmod | grep rdma
rdma_ucm 30248 0
rdma_cm 49544 1 rdma_ucm
iw_cm 25988 1 rdma_cm
ib_addr 24580 1 rdma_cm
ib_cm 53584 2 rdma_cm,ib_ipoib
ib_sa 55944 3 rdma_cm,ib_ipoib,ib_cm
ib_uverbs 56884 1 rdma_ucm
ib_core 81152 12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad

--
Jan Ruffing
Software Developer

Motama GmbH
Lortzingstraße 10 · 66111 Saarbrücken · Germany
tel +49 681 940 85 50 · fax +49 681 940 85 49
ruffing at motama.com · www.motama.com

Companies register · district council Saarbrücken · HRB 15249
CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

The SAQ Group
Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ
SAQ is the trading name of SEMTEC Limited. Registered in England & Wales
Company Number: 06481952
http://www.saqnet.co.uk AS29219
SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business.
Broadband : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support.
ISPA Member
Find us in http://www.thebestof.co.uk/petersfield

From ruffing at motama.com Thu Dec 11 07:34:37 2008
From: ruffing at motama.com (Jan Ruffing)
Date: Thu, 11 Dec 2008 16:34:37 +0100
Subject: [ofa-general] Infiniband performance
In-Reply-To:
References: <4940E39A.5090802@motama.com>
Message-ID: <4941330D.9090404@motama.com>

Chuck Hartley wrote:
> What kind of machines are the cards installed in?

- Intel Xeon Quadcore; CPU E5420 @ 2.50GHz
- 4 GB RAM
- Super X7DCL Mainboard

(The 10 GBit/s TCP bandwidth has also been measured on superior 64 bit machines running Ubuntu 8.04, also using Mellanox Infinihost III Lx DDR HCA cards in 8x PCIe slots, with a basic Infiniband installation. When trying to set up Infiniband by manually installing the OFED 1.3 packages on one of the machines, the maximum bandwidth measured locally was also 12 GBit/s. The trial-and-error installation ran into some problems when trying to set up the more advanced Infiniband features, though. So the decision was made to test Infiniband on a system that supported the OFED install script to get a cleaner installation; the results have been summarized in the previous mail.)

> Do you have the cards in 8x PCIe slots?

The cards are in 8x PCIe slots. BIOS settings for the slots are as follows:

8x PCIe, Slot 6:
Option ROM Scan [Enabled]
Enable Master [Enabled]
Latency Timer [Default]

No other PCI(e) slots have any cards installed.

(According to the manual, "Each PCI Express port on the MCH provides 4 GB/s bidirectional bandwidth if configured as a x8 port[...]". This would equal 32 Gbit/s, and even if unidirectional speed was half of that (I'm no expert on the PCIe bus), it would still be above the measured Infiniband bandwidth by about 4 Gbit/s unidirectional. So I'm somewhat hesitant to assume PCI Express as the bottleneck for an Infiniband bandwidth of 12 GBit/s.)
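
For reference, the arithmetic behind these figures can be sketched in a few lines of C. This is only a rough budget, not a measurement. The InfiniBand side uses the 4x DDR signalling rate (5 Gbit/s per lane) and 8b/10b encoding. The PCIe side assumes the slot runs at PCIe 1.x speed (2.5 GT/s per lane, also 8b/10b), which matches the manual's 4 GB/s bidirectional figure for a x8 port. The 128-byte payload and 24 bytes of per-packet overhead are illustrative assumptions, not values taken from the mainboard or HCA documentation, and the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Usable data rate, in Gbit/s and per direction, of a serial link that
 * uses 8b/10b encoding (10 bits on the wire for every 8 bits of data).
 */
static double data_rate_gbps(int lanes, double signal_gbps_per_lane)
{
    assert(lanes > 0 && signal_gbps_per_lane > 0.0);
    return lanes * signal_gbps_per_lane * 8.0 / 10.0;
}

/* Share of the data rate left for payload once per-packet overhead is paid. */
static double payload_share(int payload_bytes, int overhead_bytes)
{
    assert(payload_bytes > 0 && overhead_bytes >= 0);
    return (double)payload_bytes / (double)(payload_bytes + overhead_bytes);
}

/* Print the bandwidth budget for a 4x DDR HCA in a x8 PCIe slot. */
static void print_budget(void)
{
    /* 4x DDR InfiniBand: 4 lanes at 5 Gbit/s signalling each. */
    double ib = data_rate_gbps(4, 5.0);
    /* ASSUMPTION: PCIe 1.x slot, 8 lanes at 2.5 GT/s each. */
    double pcie = data_rate_gbps(8, 2.5);
    /* ASSUMPTION: 128-byte payload, 24 bytes of overhead per PCIe packet. */
    double pcie_payload = pcie * payload_share(128, 24);

    printf("IB 4x DDR data rate:      %.1f Gbit/s\n", ib);
    printf("PCIe x8 data rate:        %.1f Gbit/s per direction\n", pcie);
    printf("PCIe x8 payload estimate: %.1f Gbit/s per direction\n", pcie_payload);
}
```

Calling print_budget() from a main() prints the three figures. Under these assumptions both links top out at the same 16 Gbit/s data rate per direction, so any per-packet overhead on the PCIe side pulls the achievable payload rate below that, which is at least compatible with results in the 12 Gbit/s range.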

--
Jan Ruffing
Software Developer

Motama GmbH
Lortzingstraße 10 · 66111 Saarbrücken · Germany
tel +49 681 940 85 50 · fax +49 681 940 85 49
ruffing at motama.com · www.motama.com

Companies register · district council Saarbrücken · HRB 15249
CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger

From swise at opengridcomputing.com Thu Dec 11 07:54:40 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 11 Dec 2008 09:54:40 -0600
Subject: [ofa-general] Test modules for ib_core
In-Reply-To: <4940FB22.9090809@ext.bull.net>
References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net>
Message-ID: <494137C0.3070503@opengridcomputing.com>

>>
>> There is krping test:
>> http://www.openfabrics.org/git/?p=~swise/krping.git;a=summary
>> Steve Wise developed it
>>
>> Tziporet
>>
>>
> Great !
>
> That's exactly what I needed !!
>
> Best Regards
>
> Nicolas

If you make any enhancements or bug fixes to krping, please email out the patches and I'll review/merge them in if they look good.

Steve

From ruffing at motama.com Thu Dec 11 08:17:36 2008
From: ruffing at motama.com (Jan Ruffing)
Date: Thu, 11 Dec 2008 17:17:36 +0100
Subject: [ofa-general] Infiniband performance
In-Reply-To:
References: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com>
Message-ID: <49413D20.1040805@motama.com>

Robert Dunkley wrote:
> The newer gen4 20Gb cards get about 14-15Gbit/sec because of their better lower latency processing and support for higher bandwidth PCI-E 2.0 (PCI-E is not 100% efficient so a 20Gbit PCIE connect will hold back a 20Gbit IC).
>
>

Thank you. What TCP bandwidth did you get with the gen4 cards?

Jan

--
Jan Ruffing
Software Developer

Motama GmbH
Lortzingstraße 10 · 66111 Saarbrücken · Germany
tel +49 681 940 85 50 · fax +49 681 940 85 49
ruffing at motama.com · www.motama.com

Companies register · district council Saarbrücken · HRB 15249
CEOs · Dr.-Ing. Marco Lohse, Michael Repplinger

From sean.hefty at intel.com Thu Dec 11 11:18:57 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 11 Dec 2008 11:18:57 -0800
Subject: [ofa-general] porting IB management code to Windows
Message-ID: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com>

Sasha,

We've started porting the IB management code (IB-diags at this point) to Windows. My strong preference is to avoid branching the code and instead keep a single source code tree. Is there any objection to accepting changes against the management tree to allow the code to run on both Linux and Windows? (We can figure out the logistics of build related files later. I'm most concerned about the code itself.)

The patch below gives an example of the changes needed to make this happen. Most are a result of compiler differences.

- Sean

--- infiniband-diags-1.4.2\src\sminfo.c 2008-10-19 11:34:42.000000000 -0700
+++ scm\winof\branches\winverbs\tools\infiniband_diags\src\sminfo.c 2008-12-10 15:06:01.096000000 -0800
@@ -37,12 +37,19 @@
 #include
 #include
+
+#if defined(_WIN32) || defined(_WIN64)
+#include
+#include
+#include "..\..\..\..\etc\user\getopt.c"
+#include "..\ibdiag_common.c"
+#else
 #include
 #include
 #include
 #include
+#endif
-#include
 #include
 #include
@@ -72,13 +79,13 @@ enum {
 };
 char *statestr[] = {
-    [SMINFO_NOTACT] "SMINFO_NOTACT",
-    [SMINFO_DISCOVER] "SMINFO_DISCOVER",
-    [SMINFO_STANDBY] "SMINFO_STANDBY",
-    [SMINFO_MASTER] "SMINFO_MASTER",
+    "SMINFO_NOTACT",
+    "SMINFO_DISCOVER",
+    "SMINFO_STANDBY",
+    "SMINFO_MASTER",
 };
-#define STATESTR(s) (((uint)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???")
+#define STATESTR(s) (((unsigned int)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???")
 int
 main(int argc, char **argv)
@@ -88,7 +95,7 @@ main(int argc, char **argv)
     ib_portid_t portid = {0};
     int timeout = 0; /* use default */
     uint8_t *p;
-    uint act = 0;
+    unsigned int act = 0;
     int prio = 0, state = SMINFO_STANDBY;
     uint64_t guid = 0, key = 0;
     extern int ibdebug;
@@ -97,8 +104,8 @@ main(int argc, char **argv)
     char *ca = 0;
     int ca_port = 0;
-    static char const str_opts[] = "C:P:t:s:p:a:deDGVhu";
-    static const struct option long_opts[] = {
+    static char str_opts[] = "C:P:t:s:p:a:deDGVhu";
+    static struct option long_opts[] = {
         { "C", 1, 0, 'C'},
         { "P", 1, 0, 'P'},
         { "debug", 0, 0, 'd'},
@@ -112,7 +119,7 @@ main(int argc, char **argv)
         { "timeout", 1, 0, 't'},
         { "help", 0, 0, 'h'},
         { "usage", 0, 0, 'u'},
-        { }
+        { 0 }
     };
     argv0 = argv[0];
@@ -188,7 +195,7 @@ main(int argc, char **argv)
     if (mod) {
         if (!(p = smp_set(sminfo, &portid, IB_ATTR_SMINFO, mod, timeout)))
-            IBERROR("query");
+            IBERROR("set");
     } else
         if (!(p = smp_query(sminfo, &portid, IB_ATTR_SMINFO, 0, timeout)))
             IBERROR("query");

From jgunthorpe at obsidianresearch.com Thu Dec 11 12:36:49 2008
From: jgunthorpe at obsidianresearch.com
(Jason Gunthorpe)
Date: Thu, 11 Dec 2008 13:36:49 -0700
Subject: [ofa-general] porting IB management code to Windows
In-Reply-To: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com>
References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com>
Message-ID: <20081211203649.GO31451@obsidianresearch.com>

On Thu, Dec 11, 2008 at 11:18:57AM -0800, Sean Hefty wrote:

> We've started porting the IB management code (IB-diags at this
> point) to Windows. My strong preference is to avoid branching the
> code and instead keep a single source code tree. Is there any
> objection to accepting changes against the management tree to allow
> the code to run on both Linux and Windows? (We can figure out the
> logistics of build related files later. I'm most concerned about
> the code itself.)

Just to chime in here with some past experience.. Is there any way it would be acceptable to use gcc (or even the Intel compiler) as the mandatory Windows C compiler? That would save everyone a lot of ongoing hassle. MS does not maintain the C compiler portion of VC++ and it is very old standards-wise; half your changes in this patch are due to it not supporting C99.

So, really, what you are proposing is to abandon all modern C constructs in the official source tree :|

Some of this is actually harmful run-time wise (like removing const on the static variables) and harmful maintenance-wise (removing C99 named initializers).

Jason

From sean.hefty at intel.com Thu Dec 11 12:50:46 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 11 Dec 2008 12:50:46 -0800
Subject: [ofa-general] porting IB management code to Windows
In-Reply-To: <20081211203649.GO31451@obsidianresearch.com>
References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081211203649.GO31451@obsidianresearch.com>
Message-ID: <000e01c95bd2$246924a0$1e58180a@amr.corp.intel.com>

>Just to chime in here with some past experience..
Is there any way it
>would be acceptable to use gcc (or even the Intel compiler) as the
>mandatory Windows C compiler? That would save everyone a lot of
>ongoing hassle. MS does not maintain the C compiler portion of VC++
>and it is very old standards wise, half your changes in this patch are
>due to it not supporting C99.

WinOF builds using the WDK build environment. I personally have no issue with using a different compiler, but the WWG would need to decide on that sort of change.

- Sean

From sean.hefty at intel.com Thu Dec 11 12:53:16 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 11 Dec 2008 12:53:16 -0800
Subject: FW: [ofa-general] porting IB management code to Windows
Message-ID: <000f01c95bd2$7dca27b0$1e58180a@amr.corp.intel.com>

Sorry about this, but resending to include the WinOF mailing list...

We've started porting the IB management code (IB-diags at this point) to Windows. My strong preference is to avoid branching the code and instead keep a single source code tree. Is there any objection to accepting changes against the management tree to allow the code to run on both Linux and Windows? (We can figure out the logistics of build related files later. I'm most concerned about the code itself.)

The patch below gives an example of the changes needed to make this happen. Most are a result of compiler differences.
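
As a side note on the maintenance concern raised in this thread: a positional string table can still be tied to its enum on a compiler without C99 designated initializers by adding a compile-time size check. The following is only a minimal sketch, not code from infiniband-diags. It assumes the SMINFO_* values run from 0 to 3 in the order the table lists them, and the names statestr_size_check and state_name are hypothetical.

```c
/* ASSUMPTION: states are numbered 0..3 in this order. */
enum {
    SMINFO_NOTACT,
    SMINFO_DISCOVER,
    SMINFO_STANDBY,
    SMINFO_MASTER,
    SMINFO_STATE_LAST
};

/* Positional initializers compile as C89, and const can be kept. */
static const char *const statestr[] = {
    "SMINFO_NOTACT",
    "SMINFO_DISCOVER",
    "SMINFO_STANDBY",
    "SMINFO_MASTER"
};

/*
 * Compile-time check (hypothetical helper): the array size becomes -1,
 * which is a compile error, if the table and the enum get out of step.
 */
typedef char statestr_size_check[
    (sizeof(statestr) / sizeof(statestr[0]) == SMINFO_STATE_LAST) ? 1 : -1];

/* Bounds-checked lookup, equivalent in spirit to the STATESTR() macro. */
static const char *state_name(int s)
{
    return ((unsigned int)s < SMINFO_STATE_LAST) ? statestr[s] : "???";
}
```

The check only catches a missing or extra entry, not two entries swapped, so it is weaker than named initializers. It does build with both a C89 and a C99 compiler, which is the constraint being discussed here.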

- Sean

--- infiniband-diags-1.4.2\src\sminfo.c 2008-10-19 11:34:42.000000000 -0700
+++ scm\winof\branches\winverbs\tools\infiniband_diags\src\sminfo.c 2008-12-10 15:06:01.096000000 -0800
@@ -37,12 +37,19 @@
 #include
 #include
+
+#if defined(_WIN32) || defined(_WIN64)
+#include
+#include
+#include "..\..\..\..\etc\user\getopt.c"
+#include "..\ibdiag_common.c"
+#else
 #include
 #include
 #include
 #include
+#endif
-#include
 #include
 #include
@@ -72,13 +79,13 @@ enum {
 };
 char *statestr[] = {
-    [SMINFO_NOTACT] "SMINFO_NOTACT",
-    [SMINFO_DISCOVER] "SMINFO_DISCOVER",
-    [SMINFO_STANDBY] "SMINFO_STANDBY",
-    [SMINFO_MASTER] "SMINFO_MASTER",
+    "SMINFO_NOTACT",
+    "SMINFO_DISCOVER",
+    "SMINFO_STANDBY",
+    "SMINFO_MASTER",
 };
-#define STATESTR(s) (((uint)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???")
+#define STATESTR(s) (((unsigned int)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???")
 int
 main(int argc, char **argv)
@@ -88,7 +95,7 @@ main(int argc, char **argv)
     ib_portid_t portid = {0};
     int timeout = 0; /* use default */
     uint8_t *p;
-    uint act = 0;
+    unsigned int act = 0;
     int prio = 0, state = SMINFO_STANDBY;
     uint64_t guid = 0, key = 0;
     extern int ibdebug;
@@ -97,8 +104,8 @@ main(int argc, char **argv)
     char *ca = 0;
     int ca_port = 0;
-    static char const str_opts[] = "C:P:t:s:p:a:deDGVhu";
-    static const struct option long_opts[] = {
+    static char str_opts[] = "C:P:t:s:p:a:deDGVhu";
+    static struct option long_opts[] = {
         { "C", 1, 0, 'C'},
         { "P", 1, 0, 'P'},
         { "debug", 0, 0, 'd'},
@@ -112,7 +119,7 @@ main(int argc, char **argv)
         { "timeout", 1, 0, 't'},
         { "help", 0, 0, 'h'},
         { "usage", 0, 0, 'u'},
-        { }
+        { 0 }
     };
     argv0 = argv[0];
@@ -188,7 +195,7 @@ main(int argc, char **argv)
     if (mod) {
         if (!(p = smp_set(sminfo, &portid, IB_ATTR_SMINFO, mod, timeout)))
-            IBERROR("query");
+            IBERROR("set");
     } else
         if (!(p = smp_query(sminfo, &portid, IB_ATTR_SMINFO, 0, timeout)))
             IBERROR("query");

_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From jenos at ncsa.uiuc.edu Thu Dec 11 13:13:06 2008
From: jenos at ncsa.uiuc.edu (Jeremy Enos)
Date: Thu, 11 Dec 2008 15:13:06 -0600
Subject: [ofa-general] ipoib device not loading?
Message-ID: <49418262.3070801@ncsa.uiuc.edu>

After building and installing the 20081209-0926 daily snapshot on x86_64 Fedora Core 9 (fully updated), I get this error when trying to ifup ib0:

Bringing up interface ib0: ib_ipoib device ib0 does not seem to be present, delaying initialization.

I ran the IPoIB configuration and it generated an ifcfg-ib0 file as expected. What's missing?

btw- I tried previous releases and none would even build on fc9 until I got into the 1.4 release candidates.. rc6 built but wouldn't run opensmd. The daily build of 12/9 was the first I found that I could build and run opensmd on, but now I still have this ipoib issue. I've included ofed_info output (and other detail) below, including the error at the end.

thx-

Jeremy


ofed_info
OFED-1.4-20081209-0926
libibverbs:
git://git.openfabrics.org/ofed_1_4/libibverbs.git ofed_1_4
commit b00dc7d2f79e0660ac40160607c9c4937a895433
libmthca:
git://git.kernel.org/pub/scm/libs/infiniband/libmthca.git master
commit be5eef3895eb7864db6395b885a19f770fde7234
libmlx4:
git://git.openfabrics.org/ofed_1_4/libmlx4.git ofed_1_4
commit bd28f5307c3782b41cf6dfcbb6714df03c9f7025
libehca:
git://git.openfabrics.org/ofed_1_4/libehca.git ofed_1_4
commit e0c2d7e8ee2aa5dd3f3511270521fb0c206167c6
libipathverbs:
git://git.openfabrics.org/~ralphc/libipathverbs ofed_1_4
commit 65e5701dbe7b511f796cb0026b0cd51831a62318
libcxgb3:
git://git.openfabrics.org/~swise/libcxgb3.git ofed_1_4
commit f685c8fe7e77e64614d825e563dd9f02a0b1ae16
libnes:
git://git.openfabrics.org/~glenn/libnes.git master
commit 07fb9dfbbb36b28b5ea6caa14a1a5e215386b3e8
libibcm:
git://git.openfabrics.org/~shefty/libibcm.git master
commit 7fb57e005b3eae2feb83b3fd369aeba700a5bcf8
librdmacm:
git://git.openfabrics.org/~shefty/librdmacm.git master
commit e0b1ece1dc0518b2a5232872e0c48d3e2e354e47
libsdp:
git://git.openfabrics.org/ofed_1_4/libsdp.git ofed_1_4
commit 02404fb0266082f5b64412c3c25a71cb9d39442d
sdpnetstat:
git://git.openfabrics.org/~amirv/sdpnetstat.git ofed_1_4
commit 75a033a9512127449f141411b0b7516f72351f95
srptools:
git://git.openfabrics.org/ofed_1_3/srptools.git ofed_1_3
commit d3025d0771317584e51490a419a79ab55650ebc9
perftest:
git://git.openfabrics.org/~orenmeron/perftest.git master
commit ca629627c7a26005a1a4c8775cc01f483524f1c4
qlvnictools:
git://git.openfabrics.org/~ramachandrak/qlvnictools.git ofed_1_4
commit 1dc6e51a728cbfbdd2018260602b8bebde618da9
tvflash:
git://git.openfabrics.org/ofed_1_4/tvflash.git ofed_1_4
commit e1b50b3b8af52b0bc55b2825bb4d6ce699d5c43b
mstflint:
git://git.openfabrics.org/~orenk/mstflint.git master
commit 9ddeea464e946cd425e05b0d1fdd9ec003fca824
qperf:
git://git.openfabrics.org/~johann/qperf.git/.git master
commit bee05d35b09b0349cf4734ae43fc9c2e970ada8c
ibutils:
git://git.openfabrics.org/~orenk/ibutils.git master
commit 6516d16e815c68fa405562ea773b0c5215c1b70c
ibsim:
git://git.openfabrics.org/~sashak/ibsim.git master
commit a76132ae36dde8302552d896e35bd29608ac9524

ofa_kernel-1.4:
Git:
git://git.openfabrics.org/ofed_1_4/linux-2.6.git ofed_kernel
commit 88ab7955605c5e769e760f6bec980e0c2e72aa5c

# MPI
mvapich-1.1.0-3143.src.rpm
mvapich2-1.2p1-1.src.rpm
openmpi-1.2.8-1.src.rpm
mpitests-3.1-891.src.rpm


[root at host OFED-1.4-20081209-0926]# cat /etc/*release
Fedora release 9 (Sulphur)
Fedora release 9 (Sulphur)
Fedora release 9 (Sulphur)
[root at host OFED-1.4-20081209-0926]# uname -a
Linux host 2.6.27.5-41.fc9.x86_64 #1 SMP Thu Nov 13 20:29:07 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root at host OFED-1.4-20081209-0926]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
BOOTPROTO=static
ONBOOT=yes
BROADCAST=192.168.2.255
IPADDR=192.168.2.254
NETMASK=255.255.255.0
NETWORK=192.168.2.0

[root at host OFED-1.4-20081209-0926]# service openibd status

HCA driver loaded

The following OFED modules are loaded:

ib_ipath
mlx4_core
mlx4_ib
ib_mthca
ib_uverbs
ib_umad
ib_sa
ib_cm
ib_mad
ib_core
iw_cxgb3

[root at host OFED-1.4-20081209-0926]# ifup ib0
ib_ipoib device ib0 does not seem to be present, delaying initialization.

From jenos at ncsa.uiuc.edu Thu Dec 11 13:22:56 2008
From: jenos at ncsa.uiuc.edu (Jeremy Enos)
Date: Thu, 11 Dec 2008 15:22:56 -0600
Subject: [ofa-general] ipoib device not loading?
In-Reply-To: <49418262.3070801@ncsa.uiuc.edu>
References: <49418262.3070801@ncsa.uiuc.edu>
Message-ID: <494184B0.30703@ncsa.uiuc.edu>

Forgot to include this info, in case it's pertinent:

[root at host OFED-1.4-20081209-0926]# ibstat
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.2.0
    Hardware version: a0
    Node GUID: 0x0002c9020024b55c
    System image GUID: 0x0002c9000100d050
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 10
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510a6a
        Port GUID: 0x0002c9020024b55d

Jeremy Enos wrote:
> After building and installing the 20081209-0926 daily snapshot on x86_64
> Fedora Core 9 (fully updated), I get this error when trying to ifup ib0:
> Bringing up interface ib0: ib_ipoib device ib0 does not seem to be
> present, delaying initialization.
>
> I ran the IPoIB configuration and it generated an ifcfg-ib0 file as
> expected. What's missing?
>
> btw- I tried previous releases and none would even build on fc9 until I
> got into the 1.4 release candidates.. rc6 built but wouldn't run
> opensmd. The daily build of 12/9 was the first I found that I could
> build and run opensmd on, but now I still have this ipoib issue. I've
> included ofed_info output (and other detail) below, including the error
> at the end.
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>

From jeff at splitrockpr.com Thu Dec 11 14:54:23 2008
From: jeff at splitrockpr.com (Jeffrey Scott)
Date: Thu, 11 Dec 2008 14:54:23 -0800
Subject: [ofa-general] OpenFabrics Sonoma Workshop - call for papers
Message-ID: <8461F407011046BCB7CCB4384C7D2F40@Gaucho>

Call for Presentations

The OpenFabrics Alliance is hosting the fifth-annual International Sonoma Workshop from March 22-26, 2009. The Workshop provides the OFA community with an opportunity to hear from server and storage OEMs, major OS distributors, and leading ISVs regarding their plans for the OpenFabrics software stack. Attendees also hear how end users address pain points in their enterprise data centers and high-performance computing environments with the OpenFabrics software stack. In addition, the Workshop provides a venue for the OFA community to discuss a wide range of technical and development issues, including the future direction of the Linux and Windows stacks, plans for IB and Ethernet interoperability, virtualization, routing, MPI, tools and sockets.

We Want Your Participation!

Opportunities to present at the Sonoma Workshop are open to OFA members and non-members alike. If you would like to speak at the Workshop, please submit your proposal for a 30-minute presentation by January 9, 2009. The proposal should be ONE page. It should include:

- Title (5-6 words)
- Abstract (1-2 paragraphs)
- Short biography of the presenter, plus job title, address, telephone, email
- Brief description of the presenter's organization (1-2 sentences)

Proposals should be submitted to the OpenFabrics Marketing Working Group via Jeff Scott at jeff at splitrockpr.com.

To view presentations given at the last Sonoma Workshop, visit http://www.openfabrics.org/archives/april2008sonoma.htm.

-----------------------------------
Jeffrey Scott
Split Rock Communications
408-884-4017
408-348-3651 Mobile
408-884-3900 Fax
www.SplitRockPR.com

From weiny2 at llnl.gov Thu Dec 11 16:20:28 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 11 Dec 2008 16:20:28 -0800
Subject: [ofa-general] [PATCH V2 0/3] ibnetdiscover library "libibnetdisc" V2
Message-ID: <20081211162028.4813317a.weiny2@llnl.gov>

The following 3 patches implement "libibnetdisc" and convert 2 of the diag tools to use it (ibnetdiscover and iblinkinfo.pl).

As per comments from the list, this new version makes the library part of the infiniband-diags package and cleans up the public interface quite a bit. In addition, more testing was done on the QLogic chassis that I remembered we have.

Once again this series is heavily tested. I have run it through valgrind and tested on QLogic and Voltaire chassis for grouping. Since I don't have a Xsigo box to test on, I can only verify that it compiles correctly. I believe that since it works on QLogic chassis this will translate to Cisco as well. If someone has a Cisco chassis, please test and let me know if it breaks something.

Enjoy,
Ira

From weiny2 at llnl.gov Thu Dec 11 16:20:31 2008
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 11 Dec 2008 16:20:31 -0800
Subject: [ofa-general] ***SPAM*** [PATCH V2 1/3] Create a new library libibnetdisc
Message-ID: <20081211162031.0c591f54.weiny2@llnl.gov>

>From d615162e547f3a2b2d1acd8c79c24ee691c96c95 Mon Sep 17 00:00:00 2001
From: Ira Weiny
Date: Wed, 26 Nov 2008 12:54:47 -0800
Subject: [PATCH] Create a new library libibnetdisc

This encompasses the functionality of ibnetdiscover in a C library. It returns a single "ibnd_fabric_t" object which represents the data found during the scan.
The NodeInfo, PortInfo, and SwitchInfo are preserved from the queries made on the fabric, to be used by the calling function as it sees fit.

This greatly benefits some diags like iblinkinfo.pl. This diag in particular was re-written in C using this library and has shown an 85% reduction in wall-clock time (about 6.6x faster) on a ~1000 node cluster.

Previous iblinkinfo.pl

real 3m35.876s
user 0m13.210s
sys 1m1.046s

New iblinkinfotest

real 0m32.869s
user 0m0.067s
sys 0m0.140s

Signed-off-by: Ira Weiny
---
 infiniband-diags/Makefile.am | 1 +
 infiniband-diags/configure.in | 31 +-
 infiniband-diags/libibnetdisc/Makefile.am | 66 ++
 .../libibnetdisc/include/infiniband/ibnetdisc.h | 276 ++++++
 infiniband-diags/libibnetdisc/libibnetdisc.ver | 9 +
 infiniband-diags/libibnetdisc/man/ibnd_debug.3 | 2 +
 .../libibnetdisc/man/ibnd_destroy_fabric.3 | 2 +
 .../libibnetdisc/man/ibnd_discover_fabric.3 | 49 ++
 .../libibnetdisc/man/ibnd_find_node_dr.3 | 2 +
 .../libibnetdisc/man/ibnd_find_node_guid.3 | 25 +
 .../libibnetdisc/man/ibnd_iter_nodes.3 | 24 +
 .../libibnetdisc/man/ibnd_iter_nodes_type.3 | 2 +
 .../libibnetdisc/man/ibnd_linkspeed_str.3 | 2 +
 .../libibnetdisc/man/ibnd_linkstate_str.3 | 2 +
 .../libibnetdisc/man/ibnd_linkwidth_str.3 | 26 +
 .../libibnetdisc/man/ibnd_node_type_str.3 | 2 +
 .../libibnetdisc/man/ibnd_node_type_str_short.3 | 2 +
 .../libibnetdisc/man/ibnd_physstate_str.3 | 2 +
 .../libibnetdisc/man/ibnd_show_progress.3 | 2 +
 .../libibnetdisc/man/ibnd_update_node.3 | 21 +
 infiniband-diags/libibnetdisc/src/chassis.c | 818 ++++++++++++++++++
 infiniband-diags/libibnetdisc/src/chassis.h | 85 ++
 infiniband-diags/libibnetdisc/src/ibnetdisc.c | 872 ++++++++++++++++++++
 infiniband-diags/libibnetdisc/src/internal.h | 82 ++
 infiniband-diags/libibnetdisc/src/libibnetdisc.map | 27 +
 .../libibnetdisc/test/iblinkinfotest.c | 395 +++++++++
 infiniband-diags/libibnetdisc/test/ibnetdisctest.c | 675 +++++++++++++++
 infiniband-diags/libibnetdisc/test/testleaks.c | 268 ++++++
 28 files changed, 3769 insertions(+), 1 deletions(-)
 create mode 100644 infiniband-diags/libibnetdisc/Makefile.am
 create mode 100644 infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h
 create mode 100644 infiniband-diags/libibnetdisc/libibnetdisc.ver
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_debug.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_show_progress.3
 create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_update_node.3
 create mode 100644 infiniband-diags/libibnetdisc/src/chassis.c
 create mode 100644 infiniband-diags/libibnetdisc/src/chassis.h
 create mode 100644 infiniband-diags/libibnetdisc/src/ibnetdisc.c
 create mode 100644 infiniband-diags/libibnetdisc/src/internal.h
 create mode 100644 infiniband-diags/libibnetdisc/src/libibnetdisc.map
 create mode 100644 infiniband-diags/libibnetdisc/test/iblinkinfotest.c
 create mode 100644 infiniband-diags/libibnetdisc/test/ibnetdisctest.c
 create mode 100644 infiniband-diags/libibnetdisc/test/testleaks.c

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index c22ba5e..8e8c3c1 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -1,3 +1,4 @@
+SUBDIRS = libibnetdisc
 INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband
diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in
index 5509fec..7c346e2 100644
--- a/infiniband-diags/configure.in
+++ b/infiniband-diags/configure.in
@@ -145,6 +145,34 @@ IBSCRIPTPATH_TMP2="`echo $IBSCRIPTPATH_TMP1 | sed 's/^NONE/$ac_default_prefix/'`
 IBSCRIPTPATH="`eval echo $IBSCRIPTPATH_TMP2`"
 AC_SUBST(IBSCRIPTPATH)
+dnl Begin libibnetdisc stuff
+AC_CHECK_HEADERS([stdint.h stdlib.h string.h syslog.h unistd.h])
+AC_CHECK_FUNCS([strrchr strtoul strtoull])
+
+ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc/libibnetdisc.ver | sed 's/LIBVERSION=//'`
+if test -z $ibnetdisc_api_version; then
+ echo "FAILED to find $srcdir/libibnetdisc/libibnetdisc.ver"
+ exit 1
+fi
+AC_SUBST(ibnetdisc_api_version)
+AC_DEFINE_UNQUOTED(API_VERSION,
+ ["$ibnetdisc_api_version"],
+ [The API version of this library])
+
+AC_MSG_CHECKING(for --enable-test-utils)
+AC_ARG_ENABLE(test-utils,
+[ --enable-test-utils build additional test utilities (default=no)],
+[case "${enableval}" in
+ yes) tutils=yes ;;
+ no) tutils=no ;;
+ *) AC_MSG_ERROR(bad value ${enableval} for --enable-test-utils) ;;
+esac],[tutils=no])
+AM_CONDITIONAL(ENABLE_TEST_UTILS, test x$tutils = xyes)
+AC_MSG_RESULT(${tutils=no})
+
+dnl End libibnetdisc stuff
+
+
 AC_CONFIG_FILES([\
 Makefile \
 infiniband-diags.spec \
@@ -165,6 +193,7 @@ AC_CONFIG_FILES([\
 scripts/ibhosts \
 scripts/ibnodes \
 scripts/ibswitches \
- scripts/ibrouters
+ scripts/ibrouters \
+ libibnetdisc/Makefile
 ])
 AC_OUTPUT
diff --git a/infiniband-diags/libibnetdisc/Makefile.am b/infiniband-diags/libibnetdisc/Makefile.am
new file mode 100644
index 0000000..7b478b1
--- /dev/null
+++ b/infiniband-diags/libibnetdisc/Makefile.am
@@ -0,0 +1,66 @@
+
+#SUBDIRS = .
+ +INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband + +lib_LTLIBRARIES = libibnetdisc.la +sbin_PROGRAMS = + +if ENABLE_TEST_UTILS +sbin_PROGRAMS += test/ibnetdisctest \ + test/iblinkinfotest \ + test/testleaks +endif + +DBGFLAGS = -g + +if HAVE_LD_VERSION_SCRIPT +libibnetdisc_version_script = -Wl,--version-script=$(srcdir)/src/libibnetdisc.map +else +libibnetdisc_version_script = +endif + +libibnetdisc_la_SOURCES = src/ibnetdisc.c src/chassis.c src/chassis.h +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) +libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ + -export-dynamic $(libibnetdisc_version_script) \ + -losmcomp -libmad +libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map + +libibnetdiscincludedir = $(includedir)/infiniband + +test_ibnetdisctest_SOURCES = test/ibnetdisctest.c +test_ibnetdisctest_CFLAGS = -Wall $(DBGFLAGS) +test_ibnetdisctest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +test_iblinkinfotest_SOURCES = test/iblinkinfotest.c +test_iblinkinfotest_CFLAGS = -Wall $(DBGFLAGS) +test_iblinkinfotest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +test_testleaks_SOURCES = test/testleaks.c +test_testleaks_CFLAGS = -Wall $(DBGFLAGS) +test_testleaks_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -libnetdisc + +libibnetdiscinclude_HEADERS = $(srcdir)/include/infiniband/ibnetdisc.h + +man_MANS = man/ibnd_debug.3 \ + man/ibnd_destroy_fabric.3 \ + man/ibnd_discover_fabric.3 \ + man/ibnd_find_node_dr.3 \ + man/ibnd_find_node_guid.3 \ + man/ibnd_iter_nodes.3 \ + man/ibnd_iter_nodes_type.3 \ + man/ibnd_linkspeed_str.3 \ + man/ibnd_linkstate_str.3 \ + man/ibnd_linkwidth_str.3 \ + man/ibnd_node_type_str.3 \ + man/ibnd_physstate_str.3 \ + man/ibnd_update_node.3 \ + man/ibnd_show_progress.3 + +EXTRA_DIST = libibnetdisc.spec.in libibnetdisc.spec \ + $(srcdir)/src/libibnetdisc.map libibnetdisc.ver autogen.sh + diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h 
b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h new file mode 100644 index 0000000..cdee2bd --- /dev/null +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h @@ -0,0 +1,276 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#ifndef _IBNETDISC_H_ +#define _IBNETDISC_H_ + +#include +#include + +#define MAXHOPS 63 + +/* HASH table defines */ +#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) +#define HTSZ 137 + +#define IBND_DEBUG(str, args...) \ + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) +#define IBND_ERROR(str, args...) 
\ + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) + +/** ========================================================================= + * ENUM definitions + */ +typedef enum { + IBND_CA_NODE = 1, + IBND_SWITCH_NODE = 2, + IBND_ROUTER_NODE = 3 +} ibnd_node_type_t; + +typedef enum { + IBND_LINK_DOWN = 1, + IBND_LINK_INIT = 2, + IBND_LINK_ARMED = 3, + IBND_LINK_ACTIVE = 4 +} ibnd_link_state_t; + +/** ========================================================================= + * Node + */ +typedef struct switch_info { + int smaenhsp0; +} ibnd_switch_info_t; + +typedef struct node_info { + int base_ver; + int class_ver; + int type; + int numports; + uint64_t sysimgguid; + uint64_t nodeguid; + uint64_t nodeportguid; + uint16_t partition_cap; + uint32_t devid; + uint32_t revision; + int localport; + uint32_t vendid; +} ibnd_node_info_t; + +struct ib_fabric; /* forward declare */ +struct chassis; /* forward declare */ +struct port; /* forward declare */ + +typedef struct node { + struct node *next; /* all node list in fabric */ + struct ib_fabric *fabric; /* the fabric node belongs to */ + + ib_portid_t path_portid; /* path from "from_node" */ + int dist; /* num of hops from "from_node" */ + int smalid; + int smalmc; + ibnd_switch_info_t sw_info; + ibnd_node_info_t info; + char nodedesc[64]; + struct port **ports; /* in order array of port pointers */ + /* the size of this array is info.numports + 1 */ + /* items MAY BE NULL! 
(ie 0 == switches only) */ + + /* chassis info */ + struct node *next_chassis_node; /* next node in ibnd_chassis_t->nodes */ + struct chassis *chassis; /* if != NULL the chassis this node belongs to */ + unsigned char ch_type; + unsigned char ch_anafanum; + unsigned char ch_slotnum; + unsigned char ch_slot; +} ibnd_node_t; + +/** ========================================================================= + * Port + */ +typedef struct port_info { + int lid; + int smlid; + int link_speed_supported; + int link_speed_enabled; + int link_speed_active; + int link_state; + int phys_state; + int link_down_def_state; + int mkey_prot_bits; + int lmc; + int neighbor_mtu; + int smsl; + int init_type; + int vl_capability; + int vl_high_limit; + int vl_arb_high_cap; + int vl_arb_low_cap; + int init_reply; + int mtu_cap; + int vl_stall_count; + int hoq_lifetime; + int oper_vls; + int partition_enforce_in; + int partition_enforce_out; + int filter_raw_in; + int filter_raw_out; + int mkey_violations; + int pkey_violations; + int qkey_violations; + int guid_capabilities; + int client_rereg; + int subnet_timeout; + int response_time_val; + int local_phys_error; + int overrun_error; + int max_credit_hint; + uint32_t link_round_trip; + int local_port; + int link_width_supported; + int link_width_enabled; + int link_width_active; + int diag_code; + int mkey_lease; + uint32_t capability_mask; + uint64_t mkey; + uint64_t gid_prefix; +} ibnd_port_info_t; + +typedef struct port { + uint64_t guid; + int portnum; + int ext_portnum; /* optional if != 0 external port num */ + ibnd_node_t *node; /* node this port belongs to */ + ibnd_port_info_t info; + struct port *remoteport; /* null if SMA, or does not exist */ +} ibnd_port_t; + + +/** ========================================================================= + * Chassis data + */ +typedef struct chassis { + struct chassis *next; + uint64_t chassisguid; + int chassisnum; + + /* generic grouping by SystemImageGUID */ + int nodecount; + 
ibnd_node_t *nodes; + + /* specific to voltaire type nodes */ +#define SPINES_MAX_NUM 12 +#define LINES_MAX_NUM 36 + ibnd_node_t *spinenode[SPINES_MAX_NUM + 1]; + ibnd_node_t *linenode[LINES_MAX_NUM + 1]; +} ibnd_chassis_t; + +/** ========================================================================= + * Fabric + * Main fabric object which is returned and represents the data discovered + */ +typedef struct ib_fabric { + /* the node the discovery was initiated from + * "from" parameter in ibnd_discover_fabric + * or by default the node you are running on + */ + ibnd_node_t *from_node; + /* NULL terminated list of all nodes in the fabric */ + ibnd_node_t *nodes; + /* NULL terminated list of all chassis found in the fabric */ + ibnd_chassis_t *chassis; + int maxhops_discovered; +} ibnd_fabric_t; + + +/** ========================================================================= + * Initialization (fabric operations) + */ +void ibnd_debug(int i); +void ibnd_show_progress(int i); + +ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, + int timeout_ms, ib_portid_t *from, int hops); + /** + * dev_name: (required) local device name to use to access the fabric + * dev_port: (required) local device port to use to access the fabric + * timeout_ms: (required) gives the timeout for a _SINGLE_ query on + * the fabric. So if there are multiple nodes not + * responding this may result in a lengthy delay. + * from: (optional) specify the node to start scanning from. + * If NULL start from the node we are running on. + * hops: (optional) Specify how much of the fabric to traverse. 
+ * negative value == scan entire fabric + */ +void ibnd_destroy_fabric(ibnd_fabric_t *fabric); + +/** ========================================================================= + * Node operations + */ +ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid); +ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str); +ibnd_node_t *ibnd_update_node(ibnd_node_t *node); + +typedef void (*ibnd_iter_node_func_t)(ibnd_node_t *node, void *user_data); +void ibnd_iter_nodes(ibnd_fabric_t *fabric, + ibnd_iter_node_func_t func, + void *user_data); +void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, + ibnd_iter_node_func_t func, + ibnd_node_type_t node_type, + void *user_data); + +/** ========================================================================= + * Str convert functions + */ +char *ibnd_linkwidth_str(int link_width); +char *ibnd_linkstate_str(int link_state); +char *ibnd_physstate_str(int phys_state); +const char *ibnd_node_type_str(ibnd_node_t *node); +const char *ibnd_node_type_str_short(ibnd_node_t *node); +char *ibnd_linkspeed_str(int link_speed, int data_rate); + /* if data_rate == 0 use "SDR", "DDR", etc. */ + /* if data_rate == 1 use "2.5 Gbps", "5.0 Gbps", etc. 
*/ + +/** ========================================================================= + * Chassis queries + */ +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum); +char *ibnd_get_chassis_type(ibnd_node_t *node); +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size); + +int ibnd_is_xsigo_guid(uint64_t guid); +int ibnd_is_xsigo_tca(uint64_t guid); +int ibnd_is_xsigo_hca(uint64_t guid); + +#endif /* _IBNETDISC_H_ */ diff --git a/infiniband-diags/libibnetdisc/libibnetdisc.ver b/infiniband-diags/libibnetdisc/libibnetdisc.ver new file mode 100644 index 0000000..a0a5f3c --- /dev/null +++ b/infiniband-diags/libibnetdisc/libibnetdisc.ver @@ -0,0 +1,9 @@ +# In this file we track the current API version +# of the IB net discover interface (and libraries) +# The version is built of the following +# three numbers: +# API_REV:RUNNING_REV:AGE +# API_REV - advance on any added API +# RUNNING_REV - advance on any change to the vendor files +# AGE - number of backward versions the API still supports +LIBVERSION=1:0:0 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_debug.3 b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 new file mode 100644 index 0000000..a4076fc --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 @@ -0,0 +1,2 @@ +.\".TH IBND_DEBUG 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_discover_fabric.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 new file mode 100644 index 0000000..8fe20ae --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 @@ -0,0 +1,2 @@ +.\".TH IBND_DESTROY_FABRIC 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_discover_fabric.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 new file mode 100644 index 0000000..44d8c65 --- /dev/null +++ 
b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 @@ -0,0 +1,49 @@ +.TH IBND_DISCOVER_FABRIC 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_discover_fabric, ibnd_destroy_fabric, ibnd_debug, ibnd_show_progress \- initialize ibnetdiscover library. +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, ib_portid_t *from, int hops)" +.BI "void ibnd_destroy_fabric(ibnd_fabric_t *fabric)" +.BI "void ibnd_debug(int i)" +.BI "void ibnd_show_progress(int i)" + + +.SH "DESCRIPTION" +.B ibnd_discover_fabric() +Discover the fabric connected to the port specified by dev_name and dev_port, using the specified per-query timeout. The "from" and "hops" parameters are optional and allow one to scan part of a fabric by specifying a node "from" and a number of hops away from that node to scan, "hops". This gives the user a "sub-fabric" which is "centered" anywhere they choose. + +.B ibnd_destroy_fabric() +Free all memory and resources associated with the fabric. + +.B ibnd_debug() +Set the debug level to be printed as library operations take place. + +.B ibnd_show_progress() +Indicate that the library should print debug output which shows its progress +through the fabric. + +.SH "RETURN VALUE" +.B ibnd_discover_fabric() +Return NULL on failure, otherwise a valid ibnd_fabric_t object. + +.B ibnd_destroy_fabric(), ibnd_debug(), ibnd_show_progress() +NONE + +.SH "EXAMPLES" + +.B Discover the entire fabric connected to device "mthca0", port 1. + + ibnd_discover_fabric("mthca0", 1, 100, NULL, -1); + +.B Discover only a single node and those nodes connected to it. 
+ + str2drpath(&(port_id.drpath), from, 0, 0); + + ibnd_discover_fabric("mthca0", 1, 100, &port_id, 1); + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 new file mode 100644 index 0000000..612e501 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 @@ -0,0 +1,2 @@ +.\".TH IBND_FIND_NODE_DR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_find_node_guid.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 new file mode 100644 index 0000000..676b528 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 @@ -0,0 +1,25 @@ +.TH IBND_FIND_NODE_GUID 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_find_node_guid, ibnd_find_node_dr \- given a fabric object find the node object within it which matches the guid or directed route specified. + +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid)" +.BI "ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str)" + +.SH "DESCRIPTION" +.B ibnd_find_node_guid() +Given a fabric object and a guid, return the ibnd_node_t object with that node guid. +.B ibnd_find_node_dr() +Given a fabric object and a directed route, return the ibnd_node_t object with +that directed route. + +.SH "RETURN VALUE" +.B ibnd_find_node_guid(), ibnd_find_node_dr() +return NULL on failure, otherwise a valid ibnd_node_t object. 
+ +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 new file mode 100644 index 0000000..7199dfb --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 @@ -0,0 +1,24 @@ +.TH IBND_ITER_NODES 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_iter_nodes, ibnd_iter_nodes_type \- given a fabric object and a function, iterate over the nodes in the fabric. + +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "void ibnd_iter_nodes(ibnd_fabric_t *fabric, ibnd_iter_node_func_t func, void *user_data)" +.BI "void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, ibnd_iter_node_func_t func, ibnd_node_type_t type, void *user_data)" + +.SH "DESCRIPTION" +.B ibnd_iter_nodes() +Iterate through all the nodes in the fabric and call "func" on each of them. +.B ibnd_iter_nodes_type() +The same as ibnd_iter_nodes(), but limits the iteration to the nodes of the specified type. + +.SH "RETURN VALUE" +.B ibnd_iter_nodes(), ibnd_iter_nodes_type() +NONE + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 new file mode 100644 index 0000000..878547c --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 @@ -0,0 +1,2 @@ +.\".TH IBND_ITER_NODES_TYPE 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_iter_nodes.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 new file mode 100644 index 0000000..128cd3e --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_LINKSPEED_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 new file mode 100644 index 
0000000..2fa9189 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_LINKSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 new file mode 100644 index 0000000..2cd4f0a --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 @@ -0,0 +1,26 @@ +.TH IBND_LINKWIDTH_STR 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_linkwidth_str, ibnd_linkspeed_str, ibnd_linkstate_str, ibnd_physstate_str, ibnd_node_type_str, ibnd_node_type_str_short \- pretty string functions. + +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI +.BI "char *ibnd_linkwidth_str(int link_width)" +.BI "char *ibnd_linkspeed_str(int link_speed, int data_rate)" +.BI "char *ibnd_linkstate_str(int link_state)" +.BI "char *ibnd_physstate_str(int phys_state)" +.BI "const char *ibnd_node_type_str(ibnd_node_t *node)" +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" + +.SH "DESCRIPTION" +Return human-readable strings for the values given. + +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" +Returns a shorter, abbreviated version of the string. 
+ + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 new file mode 100644 index 0000000..77dbf07 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_NODE_TYPE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 new file mode 100644 index 0000000..62feb6e --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 @@ -0,0 +1,2 @@ +.\".TH IBND_NODE_TYPE_STR_SHORT 3 "Aug 05, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 new file mode 100644 index 0000000..aeeaeb7 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 @@ -0,0 +1,2 @@ +.\".TH IBND_PHYSSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_linkwidth_str.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 new file mode 100644 index 0000000..280af31 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 @@ -0,0 +1,2 @@ +.\".TH IBND_SHOW_PROGRESS 3 "Nov 26, 2008" "OpenIB" "OpenIB Programmer's Manual" +.so man3/ibnd_discover_fabric.3 diff --git a/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 new file mode 100644 index 0000000..d3aa206 --- /dev/null +++ b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 @@ -0,0 +1,21 @@ +.TH IBND_UPDATE_NODE 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" +.SH "NAME" +ibnd_update_node \- Update the node specified with new data from the fabric. 
+ +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "ibnd_node_t *ibnd_update_node(ibnd_node_t *node)" + +.SH "DESCRIPTION" +.B ibnd_update_node() +Update the node info, port info, and node description of the node specified. + +.SH "RETURN VALUE" +.B ibnd_update_node() +Return NULL on failure, otherwise a valid ibnd_node_t object which is part of the fabric object. + +.SH "AUTHORS" +.TP +Ira Weiny diff --git a/infiniband-diags/libibnetdisc/src/chassis.c b/infiniband-diags/libibnetdisc/src/chassis.c new file mode 100644 index 0000000..41f325e --- /dev/null +++ b/infiniband-diags/libibnetdisc/src/chassis.c @@ -0,0 +1,818 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +/*========================================================*/ +/* FABRIC SCANNER SPECIFIC DATA */ +/*========================================================*/ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include + +#include +#include + +#include "internal.h" +#include "chassis.h" + +static char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; +static char *ChassisSlotTypeStr[4] = { "", "Line", "Spine", "SRBD" }; + +char *ibnd_get_chassis_type(ibnd_node_t *node) +{ + /* Currently, only if Voltaire chassis */ + if (node->info.vendid != VTR_VENDOR_ID) + return (NULL); + if (!node->chassis) + return (NULL); + if (node->ch_type == UNRESOLVED_CT + || node->ch_type > ISR2004_CT) + return (NULL); + return ChassisTypeStr[node->ch_type]; +} + +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size) +{ + /* Currently, only if Voltaire chassis */ + if (node->info.vendid != VTR_VENDOR_ID) + return (NULL); + if (!node->chassis) + return (NULL); + if (node->ch_slot == UNRESOLVED_CS + || node->ch_slot > SRBD_CS) + return (NULL); + if (!str) + return (NULL); + snprintf(str, size, "%s %d Chip %d", + ChassisSlotTypeStr[node->ch_slot], + node->ch_slotnum, + node->ch_anafanum); + return (str); +} + +static ibnd_chassis_t *find_chassisnum(struct ibnd_fabric *fabric, unsigned char chassisnum) +{ + ibnd_chassis_t *current; + + for (current = fabric->first_chassis; current; current = current->next) { + if (current->chassisnum == chassisnum) + return current; + } + + return NULL; +} + +static uint64_t topspin_chassisguid(uint64_t guid) +{ + /* Byte 3 in system image GUID is chassis type, and */ + /* Byte 4 is location ID (slot) 
so just mask off byte 4 */ + return guid & 0xffffffff00ffffffULL; +} + +int ibnd_is_xsigo_guid(uint64_t guid) +{ + if ((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) + return 1; + else + return 0; +} + +static int is_xsigo_leafone(uint64_t guid) +{ + if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) + return 1; + else + return 0; +} + +int ibnd_is_xsigo_hca(uint64_t guid) +{ + /* NodeType 2 is HCA */ + if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) + return 1; + else + return 0; +} + +int ibnd_is_xsigo_tca(uint64_t guid) +{ + /* NodeType 3 is TCA */ + if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) + return 1; + else + return 0; +} + +static int is_xsigo_ca(uint64_t guid) +{ + if (ibnd_is_xsigo_hca(guid) || ibnd_is_xsigo_tca(guid)) + return 1; + else + return 0; +} + +static int is_xsigo_switch(uint64_t guid) +{ + if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) + return 1; + else + return 0; +} + +static uint64_t xsigo_chassisguid(ibnd_node_t *node) +{ + if (!is_xsigo_ca(node->info.sysimgguid)) { + /* Byte 3 is NodeType and byte 4 is PortType */ + /* If NodeType is 1 (switch), PortType is masked */ + if (is_xsigo_switch(node->info.sysimgguid)) + return node->info.sysimgguid & 0xffffffff00ffffffULL; + else + return node->info.sysimgguid; + } else { + if (!node->ports || !node->ports[1]) + return (0); + + /* Is there a peer port ? 
*/ + if (!node->ports[1]->remoteport) + return node->info.sysimgguid; + + /* If peer port is Leaf 1, use its chassis GUID */ + if (is_xsigo_leafone(node->ports[1]->remoteport->node->info.sysimgguid)) + return node->ports[1]->remoteport->node->info.sysimgguid & + 0xffffffff00ffffffULL; + else + return node->info.sysimgguid; + } +} + +static uint64_t get_chassisguid(ibnd_node_t *node) +{ + if (node->info.vendid == TS_VENDOR_ID || node->info.vendid == SS_VENDOR_ID) + return topspin_chassisguid(node->info.sysimgguid); + else if (node->info.vendid == XS_VENDOR_ID || ibnd_is_xsigo_guid(node->info.sysimgguid)) + return xsigo_chassisguid(node); + else + return node->info.sysimgguid; +} + +static ibnd_chassis_t *find_chassisguid(ibnd_node_t *node) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); + ibnd_chassis_t *current; + uint64_t chguid; + + chguid = get_chassisguid(node); + for (current = f->first_chassis; current; current = current->next) { + if (current->chassisguid == chguid) + return current; + } + + return NULL; +} + +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); + ibnd_chassis_t *chassis; + + chassis = find_chassisnum(f, chassisnum); + if (chassis) + return chassis->chassisguid; + else + return 0; +} + +static int is_router(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_IB_FC_ROUTER || + n->node.info.devid == VTR_DEVID_IB_IP_ROUTER); +} + +static int is_spine_9096(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SFB4 || + n->node.info.devid == VTR_DEVID_SFB4_DDR); +} + +static int is_spine_9288(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SFB12 || + n->node.info.devid == VTR_DEVID_SFB12_DDR); +} + +static int is_spine_2004(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SFB2004); +} + +static int is_spine_2012(struct ibnd_node *n) +{ + return (n->node.info.devid == 
VTR_DEVID_SFB2012); +} + +static int is_spine(struct ibnd_node *n) +{ + return (is_spine_9096(n) || is_spine_9288(n) || + is_spine_2004(n) || is_spine_2012(n)); +} + +static int is_line_24(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SLB24 || + n->node.info.devid == VTR_DEVID_SLB24_DDR || + n->node.info.devid == VTR_DEVID_SRB2004); +} + +static int is_line_8(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SLB8); +} + +static int is_line_2024(struct ibnd_node *n) +{ + return (n->node.info.devid == VTR_DEVID_SLB2024); +} + +static int is_line(struct ibnd_node *n) +{ + return (is_line_24(n) || is_line_8(n) || is_line_2024(n)); +} + +int is_chassis_switch(struct ibnd_node *n) +{ + return (is_spine(n) || is_line(n)); +} + +/* these structs help find Line (Anafa) slot number while using spine portnum */ +int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; +int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; +int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; +int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; + +/* IPR FCR modules connectivity while using sFB4 port as reference */ +int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; + +/* these structs help find Spine (Anafa) slot number while using spine portnum */ +int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; +int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0 }; +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ + +static void get_sfb_slot(struct ibnd_node *node, ibnd_port_t *lineport) +{ + ibnd_node_t *n = (ibnd_node_t *)node; + + n->ch_slot = SPINE_CS; + if (is_spine_9096(node)) { + n->ch_type = ISR9096_CT; + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; + } else if (is_spine_9288(node)) { + n->ch_type = ISR9288_CT; + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; + } else if (is_spine_2012(node)) { + n->ch_type = ISR2012_CT; + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; + } else if (is_spine_2004(node)) { + n->ch_type = ISR2004_CT; + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + node->node.info.nodeguid); + } +} + +static void get_router_slot(struct ibnd_node *node, ibnd_port_t *spineport) +{ + ibnd_node_t *n = (ibnd_node_t *)node; + int guessnum = 0; + + node->ch_found = 1; + + n->ch_slot = SRBD_CS; + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR9096_CT; + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR9288_CT; + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; + /* this is a smart guess based on nodeguids order on sFB-12 module */ + guessnum = spineport->node->info.nodeguid % 4; + /* module 1 <--> remote anafa 3 */ + /* module 2 <--> remote anafa 2 */ + /* module 3 <--> remote anafa 1 */ + n->ch_anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 
3 : 2)); + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR2012_CT; + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; + /* this is a smart guess based on nodeguids order on sFB-12 module */ + guessnum = spineport->node->info.nodeguid % 4; + // module 1 <--> remote anafa 3 + // module 2 <--> remote anafa 2 + // module 3 <--> remote anafa 1 + n->ch_anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 3 : 2)); + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR2004_CT; + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + spineport->node->info.nodeguid); + } +} + +static void get_slb_slot(ibnd_node_t *n, ibnd_port_t *spineport) +{ + n->ch_slot = LINE_CS; + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR9096_CT; + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR9288_CT; + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR2012_CT; + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { + n->ch_type = ISR2004_CT; + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; + } else { + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, + spineport->node->info.nodeguid); + } +} + +/* forward declare this */ +static void voltaire_portmap(ibnd_port_t *port); +/* + This function called for every Voltaire node in fabric + It could be optimized so, but time 
overhead is very small + and it's only diag.util +*/ +static void fill_voltaire_chassis_record(struct ibnd_node *node) +{ + ibnd_node_t *n = (ibnd_node_t *)node; + int p = 0; + ibnd_port_t *port; + struct ibnd_node *remnode = 0; + + if (node->ch_found) /* somehow this node has already been passed */ + return; + node->ch_found = 1; + + /* node is router only in case of using unique lid */ + /* (which is lid of chassis router port) */ + /* in such case node->ports is actually a requested port... */ + if (is_router(node)) { + /* find the remote node */ + for (p = 1; p <= node->node.info.numports; p++) { + port = node->node.ports[p]; + if (port && port->remoteport && is_spine(CONV_NODE_INTERNAL(port->remoteport->node))) + get_router_slot(node, port->remoteport); + } + } else if (is_spine(node)) { + for (p = 1; p <= node->node.info.numports; p++) { + port = node->node.ports[p]; + if (!port || !port->remoteport) + continue; + remnode = CONV_NODE_INTERNAL(port->remoteport->node); + if (remnode->node.info.type != IBND_SWITCH_NODE) { + if (!remnode->ch_found) + get_router_slot(remnode, port); + continue; + } + if (!n->ch_type) + /* we assume here that remoteport belongs to line */ + get_sfb_slot(node, port->remoteport); + + /* we could break here, but need to find if more routers connected */ + } + + } else if (is_line(node)) { + for (p = 1; p <= node->node.info.numports; p++) { + port = node->node.ports[p]; + if (!port || port->portnum > 12 || !port->remoteport) + continue; + /* we assume here that remoteport belongs to spine */ + get_slb_slot(n, port->remoteport); + break; + } + } + + /* for each port of this node, map external ports */ + for (p = 1; p <= node->node.info.numports; p++) { + port = node->node.ports[p]; + if (!port) + continue; + voltaire_portmap(port); + } + + return; +} + +static int get_line_index(ibnd_node_t *node) +{ + int retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; + + if (retval > LINES_MAX_NUM || retval < 1) + IBPANIC("Internal error"); + return retval; +} +
+static int get_spine_index(ibnd_node_t *node) +{ + int retval; + + if (is_spine_9288(CONV_NODE_INTERNAL(node)) || is_spine_2012(CONV_NODE_INTERNAL(node))) + retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; + else + retval = node->ch_slotnum; + + if (retval > SPINES_MAX_NUM || retval < 1) + IBPANIC("Internal error"); + return retval; +} + +static void insert_line_router(ibnd_node_t *node, ibnd_chassis_t *chassis) +{ + int i = get_line_index(node); + + if (chassis->linenode[i]) + return; /* already filled slot */ + + chassis->linenode[i] = node; + node->chassis = chassis; +} + +static void insert_spine(ibnd_node_t *node, ibnd_chassis_t *chassis) +{ + int i = get_spine_index(node); + + if (chassis->spinenode[i]) + return; /* already filled slot */ + + chassis->spinenode[i] = node; + node->chassis = chassis; +} + +static void pass_on_lines_catch_spines(ibnd_chassis_t *chassis) +{ + ibnd_node_t *node, *remnode; + ibnd_port_t *port; + int i, p; + + for (i = 1; i <= LINES_MAX_NUM; i++) { + node = chassis->linenode[i]; + + if (!(node && is_line(CONV_NODE_INTERNAL(node)))) + continue; /* empty slot or router */ + + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || port->portnum > 12 || !port->remoteport) + continue; + + remnode = port->remoteport->node; + + if (!CONV_NODE_INTERNAL(remnode)->ch_found) + continue; /* some error - spine not initialized ? FIXME */ + insert_spine(remnode, chassis); + } + } +} + +static void pass_on_spines_catch_lines(ibnd_chassis_t *chassis) +{ + ibnd_node_t *node, *remnode; + ibnd_port_t *port; + int i, p; + + for (i = 1; i <= SPINES_MAX_NUM; i++) { + node = chassis->spinenode[i]; + if (!node) + continue; /* empty slot */ + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (!port || !port->remoteport) + continue; + remnode = port->remoteport->node; + + if (!CONV_NODE_INTERNAL(remnode)->ch_found) + continue; /* some error - line/router not initialized ? 
FIXME */ + insert_line_router(remnode, chassis); + } + } +} + +/* + Stupid interpolation algorithm... + But nothing to do - have to be compliant with VoltaireSM/NMS +*/ +static void pass_on_spines_interpolate_chguid(ibnd_chassis_t *chassis) +{ + ibnd_node_t *node; + int i; + + for (i = 1; i <= SPINES_MAX_NUM; i++) { + node = chassis->spinenode[i]; + if (!node) + continue; /* skip the empty slots */ + + /* take first guid minus one to be consistent with SM */ + chassis->chassisguid = node->info.nodeguid - 1; + break; + } +} + +/* + This function fills chassis structure with all nodes + in that chassis + chassis structure = structure of one standalone chassis +*/ +static void build_chassis(struct ibnd_node *node, ibnd_chassis_t *chassis) +{ + int p = 0; + struct ibnd_node *remnode = 0; + ibnd_port_t *port = 0; + + /* we get here with node = chassis_spine */ + insert_spine((ibnd_node_t *)node, chassis); + + /* loop: pass on all ports of node */ + for (p = 1; p <= node->node.info.numports; p++ ) { + port = node->node.ports[p]; + if (!port || !port->remoteport) + continue; + remnode = CONV_NODE_INTERNAL(port->remoteport->node); + + if (!remnode->ch_found) + continue; /* some error - line or router not initialized ? 
FIXME */ + + insert_line_router(&(remnode->node), chassis); + } + + pass_on_lines_catch_spines(chassis); + /* this pass needed for to catch routers, since routers connected only */ + /* to spines in slot 1 or 4 and we could miss them first time */ + pass_on_spines_catch_lines(chassis); + + /* additional 2 passes needed for to overcome a problem of pure "in-chassis" */ + /* connectivity - extra pass to ensure that all related chips/modules */ + /* inserted into the chassis */ + pass_on_lines_catch_spines(chassis); + pass_on_spines_catch_lines(chassis); + pass_on_spines_interpolate_chguid(chassis); +} + +/*========================================================*/ +/* INTERNAL TO EXTERNAL PORT MAPPING */ +/*========================================================*/ + +/* +Description : On ISR9288/9096 external ports indexing + is not matching the internal ( anafa ) port + indexes. Use this MAP to translate the data you get from + the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.) + + +Module : sLB-24 + anafa 1 anafa 2 +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 +int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 +int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 +------------------------------------------------ + +Module : sLB-8 + anafa 1 anafa 2 +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 + +-----------> + anafa 1 anafa 2 +ext port | - - 5 - - 6 | - - 7 - - 8 +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 +ext port | - - 1 - - 2 | - - 3 - - 4 +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 +------------------------------------------------ + +Module : sLB-2024 + +ext port | 13 14 15 16 17 18 19 20 21 22 23 24 +A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 +ext port | 1 2 3 4 5 6 7 8 9 10 11 12 +A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 
+--------------------------------------------------- + +*/ + +int int2ext_map_slb24[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } + }; +int int2ext_map_slb8[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } + }; +int int2ext_map_slb2024[2][25] = { + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } + }; +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ + +/* map internal ports to external ports if appropriate */ +static void +voltaire_portmap(ibnd_port_t *port) +{ + struct ibnd_node *n = CONV_NODE_INTERNAL(port->node); + int portnum = port->portnum; + int chipnum = 0; + ibnd_node_t *node = port->node; + + if (!n->ch_found || !is_line(CONV_NODE_INTERNAL(node)) || (portnum < 13 || portnum > 24)) { + port->ext_portnum = 0; + return; + } + + if (port->node->ch_anafanum < 1 || port->node->ch_anafanum > 2) { + port->ext_portnum = 0; + return; + } + + chipnum = port->node->ch_anafanum - 1; + + if (is_line_24(CONV_NODE_INTERNAL(node))) + port->ext_portnum = int2ext_map_slb24[chipnum][portnum]; + else if (is_line_2024(CONV_NODE_INTERNAL(node))) + port->ext_portnum = int2ext_map_slb2024[chipnum][portnum]; + else + port->ext_portnum = int2ext_map_slb8[chipnum][portnum]; +} + +static void add_chassis(struct ibnd_fabric *fabric) +{ + if (!(fabric->current_chassis = calloc(1, sizeof(ibnd_chassis_t)))) + IBPANIC("out of mem"); + + if (fabric->first_chassis == NULL) { + fabric->first_chassis = fabric->current_chassis; + fabric->last_chassis = fabric->current_chassis; + } else { + fabric->last_chassis->next = fabric->current_chassis; + 
fabric->last_chassis = fabric->current_chassis; + } +} + +static void +add_node_to_chassis(ibnd_chassis_t *chassis, ibnd_node_t *node) +{ + node->chassis = chassis; + node->next_chassis_node = chassis->nodes; + chassis->nodes = node; +} + +/* + Main grouping function + Algorithm: + 1. pass on every Voltaire node + 2. catch spine chip for every Voltaire node + 2.1 build/interpolate chassis around this chip + 2.2 go to 1. + 3. pass on non Voltaire nodes (SystemImageGUID based grouping) + 4. now group non Voltaire nodes by SystemImageGUID + Returns: + Pointer to the first chassis in a NULL terminated list of chassis in + the fabric specified. +*/ +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric) +{ + struct ibnd_node *node; + int dist; + int chassisnum = 0; + ibnd_chassis_t *chassis; + + fabric->first_chassis = NULL; + fabric->current_chassis = NULL; + + /* first pass on switches and build for every Voltaire node */ + /* an appropriate chassis record (slotnum and position) */ + /* according to internal connectivity */ + /* not very efficient but clear code so... 
*/ + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->node.info.vendid == VTR_VENDOR_ID) + fill_voltaire_chassis_record(node); + } + } + + /* separate every Voltaire chassis from each other and build linked list of them */ + /* algorithm: catch spine and find all surrounding nodes */ + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->node.info.vendid != VTR_VENDOR_ID) + continue; + //if (!node->node.chrecord || node->node.chrecord->chassisnum || !is_spine(node)) + if (!node->ch_found + || (node->node.chassis && node->node.chassis->chassisnum) + || !is_spine(node)) + continue; + add_chassis(fabric); + fabric->current_chassis->chassisnum = ++chassisnum; + build_chassis(node, fabric->current_chassis); + } + } + + /* now make pass on nodes for chassis which are not Voltaire */ + /* grouped by common SystemImageGUID */ + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->node.info.vendid == VTR_VENDOR_ID) + continue; + if (node->node.info.sysimgguid) { + chassis = find_chassisguid((ibnd_node_t *)node); + if (chassis) + chassis->nodecount++; + else { + /* Possible new chassis */ + add_chassis(fabric); + fabric->current_chassis->chassisguid = + get_chassisguid((ibnd_node_t *)node); + fabric->current_chassis->nodecount = 1; + } + } + } + } + + /* now, make another pass to see which nodes are part of chassis */ + /* (defined as chassis->nodecount > 1) */ + for (dist = 0; dist <= MAXHOPS; ) { + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + if (node->node.info.vendid == VTR_VENDOR_ID) + continue; + if (node->node.info.sysimgguid) { + chassis = find_chassisguid((ibnd_node_t *)node); + if (chassis && chassis->nodecount > 1) { + if (!chassis->chassisnum) + 
chassis->chassisnum = ++chassisnum; + if (!node->ch_found) { + node->ch_found = 1; + add_node_to_chassis(chassis, (ibnd_node_t *)node); + } + } + } + } + if (dist == fabric->fabric.maxhops_discovered) + dist = MAXHOPS; /* skip to CAs */ + else + dist++; + } + + return (fabric->first_chassis); +} diff --git a/infiniband-diags/libibnetdisc/src/chassis.h b/infiniband-diags/libibnetdisc/src/chassis.h new file mode 100644 index 0000000..16dad49 --- /dev/null +++ b/infiniband-diags/libibnetdisc/src/chassis.h @@ -0,0 +1,85 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef _CHASSIS_H_ +#define _CHASSIS_H_ + +#include + +#include "internal.h" + +/*========================================================*/ +/* CHASSIS RECOGNITION SPECIFIC DATA */ +/*========================================================*/ + +/* Device IDs */ +#define VTR_DEVID_IB_FC_ROUTER 0x5a00 +#define VTR_DEVID_IB_IP_ROUTER 0x5a01 +#define VTR_DEVID_ISR9600_SPINE 0x5a02 +#define VTR_DEVID_ISR9600_LEAF 0x5a03 +#define VTR_DEVID_HCA1 0x5a04 +#define VTR_DEVID_HCA2 0x5a44 +#define VTR_DEVID_HCA3 0x6278 +#define VTR_DEVID_SW_6IB4 0x5a05 +#define VTR_DEVID_ISR9024 0x5a06 +#define VTR_DEVID_ISR9288 0x5a07 +#define VTR_DEVID_SLB24 0x5a09 +#define VTR_DEVID_SFB12 0x5a08 +#define VTR_DEVID_SFB4 0x5a0b +#define VTR_DEVID_ISR9024_12 0x5a0c +#define VTR_DEVID_SLB8 0x5a0d +#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 +#define VTR_DEVID_ISR9024_DDR 0x5a31 +#define VTR_DEVID_SFB12_DDR 0x5a32 +#define VTR_DEVID_SFB4_DDR 0x5a33 +#define VTR_DEVID_SLB24_DDR 0x5a34 +#define VTR_DEVID_SFB2012 0x5a37 +#define VTR_DEVID_SLB2024 0x5a38 +#define VTR_DEVID_ISR2012 0x5a39 +#define VTR_DEVID_SFB2004 0x5a40 +#define VTR_DEVID_ISR2004 0x5a41 +#define VTR_DEVID_SRB2004 0x5a42 + +/* Vendor IDs (for chassis based systems) */ +#define VTR_VENDOR_ID 0x8f1 /* Voltaire */ +#define TS_VENDOR_ID 0x5ad /* Cisco */ +#define SS_VENDOR_ID 0x66a /* InfiniCon */ +#define XS_VENDOR_ID 0x1397 /* Xsigo */ + +enum ibnd_chassis_type { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; +enum ibnd_chassis_slot_type { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; + +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric); + +#endif /* _CHASSIS_H_ */ diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c new file mode 100644 index 0000000..64e4ece --- /dev/null +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c @@ -0,0 +1,872 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. 
+ * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include + +#include "internal.h" +#include "chassis.h" + +static int timeout_ms = 2000; +static int show_progress = 0; + +static char *linkwidth_str[] = { + "??", + "1x", + "4x", + "??", + "8x", + "??", + "??", + "??", + "12x" +}; + +static char *linkspeed_str[] = { + "???", + "SDR", + "DDR", + "???", + "QDR" +}; + +static char *linkspeed_datarate_str[] = { + "???", + "2.5 Gbps", + "5.0 Gbps", + "???", + "10.0 Gbps" +}; + +static char *linkstate_str[] = { + "No State", + "Down", + "Init", + "Armed", + "Active" +}; + +static char *physstate_str[] = { + "No State", + "Sleep", + "Polling", + "Disabled", + "PortConfigTraining", + "LinkUp", + "LinkErrorRecovery", + "Phy Test" +}; + +char * +ibnd_linkwidth_str(int link_width) +{ + if (link_width > 8) + return linkwidth_str[0]; + else + return linkwidth_str[link_width]; +} + +char * +ibnd_linkspeed_str(int link_speed, int data_rate) +{ + if (link_speed > 4) + return linkspeed_str[0]; + else if (data_rate) + return linkspeed_datarate_str[link_speed]; + else + return linkspeed_str[link_speed]; +} +char * +ibnd_linkstate_str(int link_state) +{ + if (link_state > 4) + return linkstate_str[0]; + else + return linkstate_str[link_state]; +} + +char * +ibnd_physstate_str(int phys_state) +{ + if (phys_state > 7) + return physstate_str[0]; + else + return physstate_str[phys_state]; +} + +void +decode_port_info(void * rcv_buf, ibnd_port_info_t *pi) +{ + mad_decode_field(rcv_buf, IB_PORT_LID_F, &pi->lid); + mad_decode_field(rcv_buf, IB_PORT_SMLID_F, &pi->smlid); + + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_SUPPORTED_F, &pi->link_speed_supported); + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ENABLED_F, &pi->link_speed_enabled); + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ACTIVE_F, 
&pi->link_speed_active); + + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PORT_F, &pi->local_port); + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_SUPPORTED_F, &pi->link_width_supported); + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ENABLED_F, &pi->link_width_enabled); + + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ACTIVE_F, &pi->link_width_active); + + mad_decode_field(rcv_buf, IB_PORT_DIAG_F, &pi->diag_code); + mad_decode_field(rcv_buf, IB_PORT_MKEY_LEASE_F, &pi->mkey_lease); + mad_decode_field(rcv_buf, IB_PORT_CAPMASK_F, &pi->capability_mask); + mad_decode_field(rcv_buf, IB_PORT_MKEY_F, &pi->mkey); + mad_decode_field(rcv_buf, IB_PORT_GID_PREFIX_F, &pi->gid_prefix); + + mad_decode_field(rcv_buf, IB_PORT_STATE_F, &pi->link_state); + mad_decode_field(rcv_buf, IB_PORT_PHYS_STATE_F, &pi->phys_state); + + mad_decode_field(rcv_buf, IB_PORT_LINK_DOWN_DEF_F, &pi->link_down_def_state); + mad_decode_field(rcv_buf, IB_PORT_MKEY_PROT_BITS_F, &pi->mkey_prot_bits); + + mad_decode_field(rcv_buf, IB_PORT_LMC_F, &pi->lmc); + mad_decode_field(rcv_buf, IB_PORT_NEIGHBOR_MTU_F, &pi->neighbor_mtu); + mad_decode_field(rcv_buf, IB_PORT_SMSL_F, &pi->smsl); + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_F, &pi->init_type); + + mad_decode_field(rcv_buf, IB_PORT_VL_CAP_F, &pi->vl_capability); + mad_decode_field(rcv_buf, IB_PORT_VL_HIGH_LIMIT_F, &pi->vl_high_limit); + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_HIGH_CAP_F, &pi->vl_arb_high_cap); + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_LOW_CAP_F, &pi->vl_arb_low_cap); + + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_REPLY_F, &pi->init_reply); + mad_decode_field(rcv_buf, IB_PORT_MTU_CAP_F, &pi->mtu_cap); + mad_decode_field(rcv_buf, IB_PORT_VL_STALL_COUNT_F, &pi->vl_stall_count); + mad_decode_field(rcv_buf, IB_PORT_HOQ_LIFE_F, &pi->hoq_lifetime); + mad_decode_field(rcv_buf, IB_PORT_OPER_VLS_F, &pi->oper_vls); + mad_decode_field(rcv_buf, IB_PORT_PART_EN_INB_F, &pi->partition_enforce_in); + mad_decode_field(rcv_buf, 
IB_PORT_PART_EN_OUTB_F, &pi->partition_enforce_out); + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_INB_F, &pi->filter_raw_in); + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_OUTB_F, &pi->filter_raw_out); + mad_decode_field(rcv_buf, IB_PORT_MKEY_VIOL_F, &pi->mkey_violations); + mad_decode_field(rcv_buf, IB_PORT_PKEY_VIOL_F, &pi->pkey_violations); + mad_decode_field(rcv_buf, IB_PORT_QKEY_VIOL_F, &pi->qkey_violations); + + mad_decode_field(rcv_buf, IB_PORT_GUID_CAP_F, &pi->guid_capabilities); + + mad_decode_field(rcv_buf, IB_PORT_CLIENT_REREG_F, &pi->client_rereg); + mad_decode_field(rcv_buf, IB_PORT_SUBN_TIMEOUT_F, &pi->subnet_timeout); + mad_decode_field(rcv_buf, IB_PORT_RESP_TIME_VAL_F, &pi->response_time_val); + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PHYS_ERR_F, &pi->local_phys_error); + mad_decode_field(rcv_buf, IB_PORT_OVERRUN_ERR_F, &pi->overrun_error); + mad_decode_field(rcv_buf, IB_PORT_MAX_CREDIT_HINT_F, &pi->max_credit_hint); + mad_decode_field(rcv_buf, IB_PORT_LINK_ROUND_TRIP_F, &pi->link_round_trip); +} + +static int +get_port_info(struct ibnd_fabric *fabric, struct ibnd_port *port, + int portnum, ib_portid_t *portid) +{ + char portinfo[64]; + void *pi = portinfo; + + port->port.portnum = portnum; + + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout_ms, + fabric->ibmad_port)) + return -1; + + decode_port_info(pi, &port->port.info); + + IBND_DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s\n", + portid2str(portid), portnum, port->port.info.lid, port->port.info.link_state, + port->port.info.phys_state, ibnd_linkwidth_str(port->port.info.link_width_active), + ibnd_linkspeed_str(port->port.info.link_speed_active, 0)); + return 1; +} + +static void +decode_node_info(void * rcv_buf, ibnd_node_info_t *ni) +{ + mad_decode_field(rcv_buf, IB_NODE_BASE_VERS_F, &ni->base_ver); + mad_decode_field(rcv_buf, IB_NODE_CLASS_VERS_F, &ni->class_ver); + mad_decode_field(rcv_buf, IB_NODE_TYPE_F, &ni->type); + mad_decode_field(rcv_buf, 
IB_NODE_NPORTS_F, &ni->numports); + mad_decode_field(rcv_buf, IB_NODE_SYSTEM_GUID_F, &ni->sysimgguid); + mad_decode_field(rcv_buf, IB_NODE_GUID_F, &ni->nodeguid); + mad_decode_field(rcv_buf, IB_NODE_PORT_GUID_F, &ni->nodeportguid); + mad_decode_field(rcv_buf, IB_NODE_PARTITION_CAP_F, &ni->partition_cap); + mad_decode_field(rcv_buf, IB_NODE_DEVID_F, &ni->devid); + mad_decode_field(rcv_buf, IB_NODE_REVISION_F, &ni->revision); + mad_decode_field(rcv_buf, IB_NODE_LOCAL_PORT_F, &ni->localport); + mad_decode_field(rcv_buf, IB_NODE_VENDORID_F, &ni->vendid); +} + +/* + * Returns -1 if error. + */ +static int +query_node_info(struct ibnd_fabric *fabric, struct ibnd_node *node, ib_portid_t *portid) +{ + char nodeinfo[64]; + void *ni = nodeinfo; + if (!smp_query_via(ni, portid, IB_ATTR_NODE_INFO, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_node_info(ni, &(node->node.info)); + return (0); +} + +/* + * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. + */ +static int +query_node(struct ibnd_fabric *fabric, struct ibnd_node *inode, + struct ibnd_port *iport, ib_portid_t *portid) +{ + char portinfo[64]; + void *pi = portinfo; + char switchinfo[64]; + void *si = switchinfo; + ibnd_node_t *node = &(inode->node); + ibnd_port_t *port = &(iport->port); + void *nd = inode->node.nodedesc; + + if (query_node_info(fabric, inode, portid)) + return -1; + + port->portnum = node->info.localport; + port->guid = node->info.nodeportguid; + + if (!smp_query_via(nd, portid, IB_ATTR_NODE_DESC, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 0, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_port_info(pi, &port->info); + + if (node->info.type != IBND_SWITCH_NODE) + return 0; + + node->smalid = port->info.lid; + node->smalmc = port->info.lmc; + + /* after we have the sma information find out the real PortInfo for this port */ + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 
node->info.localport, timeout_ms, + fabric->ibmad_port)) + return -1; + decode_port_info(pi, &port->info); + + if (!smp_query_via(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout_ms, + fabric->ibmad_port)) + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ + else + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); + + IBND_DEBUG("portid %s: got switch node %" PRIx64 " '%s'\n", + portid2str(portid), node->info.nodeguid, node->nodedesc); + return 1; +} + +static int +add_port_to_dpath(ib_dr_path_t *path, int nextport) +{ + if (path->cnt+2 >= sizeof(path->p)) + return -1; + ++path->cnt; + path->p[path->cnt] = nextport; + return path->cnt; +} + +static int +extend_dpath(struct ibnd_fabric *f, ib_dr_path_t *path, int nextport) +{ + int rc = add_port_to_dpath(path, nextport); + if ((rc != -1) && (path->cnt > f->fabric.maxhops_discovered)) + f->fabric.maxhops_discovered = path->cnt; + return (rc); +} + +static void +dump_endnode(ib_portid_t *path, char *prompt, + struct ibnd_node *node, struct ibnd_port *port) +{ + if (!show_progress) + return; + + printf("%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", + portid2str(path), prompt, + ibnd_node_type_str((ibnd_node_t *)node), + node->node.info.nodeguid, + node->node.info.type == IBND_SWITCH_NODE ? 
0 : port->port.portnum, + port->port.info.lid, port->port.info.lid + (1 << port->port.info.lmc) - 1, + node->node.nodedesc); +} + +static struct ibnd_node * +find_existing_node(struct ibnd_fabric *fabric, struct ibnd_node *new) +{ + int hash = HASHGUID(new->node.info.nodeguid) % HTSZ; + struct ibnd_node *node; + + for (node = fabric->nodestbl[hash]; node; node = node->htnext) + if (node->node.info.nodeguid == new->node.info.nodeguid) + return node; + + return NULL; +} + +ibnd_node_t * +ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); + int hash = HASHGUID(guid) % HTSZ; + struct ibnd_node *node; + + for (node = f->nodestbl[hash]; node; node = node->htnext) + if (node->node.info.nodeguid == guid) + return (ibnd_node_t *)node; + + return NULL; +} + +ibnd_node_t * +ibnd_update_node(ibnd_node_t *node) +{ + char portinfo[64]; + void *pi = portinfo; + ibnd_port_info_t port0_info; + char switchinfo[64]; + void *si = switchinfo; + void *nd = node->nodedesc; + int p = 0; + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); + struct ibnd_node *n = CONV_NODE_INTERNAL(node); + + if (query_node_info(f, n, &(n->node.path_portid))) + return (NULL); + + if (!smp_query_via(nd, &(n->node.path_portid), IB_ATTR_NODE_DESC, 0, timeout_ms, + f->ibmad_port)) + return (NULL); + + /* update all the port info's (skip ports that were never discovered) */ + for (p = 1; p <= n->node.info.numports; p++) + if (n->node.ports && n->node.ports[p]) + get_port_info(f, CONV_PORT_INTERNAL(n->node.ports[p]), p, &(n->node.path_portid)); + + if (n->node.info.type != IBND_SWITCH_NODE) + goto done; + + if (!smp_query_via(pi, &(n->node.path_portid), IB_ATTR_PORT_INFO, 0, timeout_ms, + f->ibmad_port)) + return (NULL); + decode_port_info(pi, &port0_info); + + n->node.smalid = port0_info.lid; + n->node.smalmc = port0_info.lmc; + + if (!smp_query_via(si, &(n->node.path_portid), IB_ATTR_SWITCH_INFO, 0, timeout_ms, + f->ibmad_port)) + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ + else + mad_decode_field(si, 
IB_SW_ENHANCED_PORT0_F, &n->node.sw_info.smaenhsp0); + +done: + return (node); +} + +ibnd_node_t * +ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); + int i = 0; + ibnd_node_t *rc = f->fabric.from_node; + ib_dr_path_t path; + + if (str2drpath(&path, dr_str, 0, 0) == -1) { + return (NULL); + } + + for (i = 0; i <= path.cnt; i++) { + ibnd_port_t *remote_port = NULL; + if (path.p[i] == 0) + continue; + if (!rc->ports) + return (NULL); + + remote_port = rc->ports[path.p[i]]->remoteport; + if (!remote_port) + return (NULL); + + rc = remote_port->node; + } + + return (rc); +} + +static void +add_to_nodeguid_hash(struct ibnd_node *node, struct ibnd_node *hash[]) +{ + int hash_idx = HASHGUID(node->node.info.nodeguid) % HTSZ; + + node->htnext = hash[hash_idx]; + hash[hash_idx] = node; +} + +static void +add_to_portguid_hash(struct ibnd_port *port, struct ibnd_port *hash[]) +{ + int hash_idx = HASHGUID(port->port.guid) % HTSZ; + + port->htnext = hash[hash_idx]; + hash[hash_idx] = port; +} + +static void +add_to_type_list(struct ibnd_node*node, struct ibnd_fabric *fabric) +{ + switch (node->node.info.type) { + case IBND_CA_NODE: + node->type_next = fabric->ch_adapters; + fabric->ch_adapters = node; + break; + case IBND_SWITCH_NODE: + node->type_next = fabric->switches; + fabric->switches = node; + break; + case IBND_ROUTER_NODE: + node->type_next = fabric->routers; + fabric->routers = node; + break; + } +} + +static void +add_to_nodedist(struct ibnd_node *node, struct ibnd_fabric *fabric) +{ + int dist = node->node.dist; + if (node->node.info.type != IBND_SWITCH_NODE) + dist = MAXHOPS; /* special Ca list */ + + node->dnext = fabric->nodesdist[dist]; + fabric->nodesdist[dist] = node; +} + + +static struct ibnd_node * +create_node(struct ibnd_fabric *fabric, struct ibnd_node *temp, ib_portid_t *path, int dist) +{ + struct ibnd_node *node; + + node = malloc(sizeof(*node)); + if (!node) { + IBPANIC("OOM: node 
creation failed\n"); + return NULL; + } + + memcpy(node, temp, sizeof(*node)); + node->node.dist = dist; + node->node.path_portid = *path; + node->node.fabric = (ibnd_fabric_t *)fabric; + + add_to_nodeguid_hash(node, fabric->nodestbl); + + /* add this to the all nodes list */ + node->node.next = fabric->fabric.nodes; + fabric->fabric.nodes = (ibnd_node_t *)node; + + add_to_type_list(node, fabric); + add_to_nodedist(node, fabric); + + return node; +} + +static struct ibnd_port * +find_existing_port_node(struct ibnd_node *node, struct ibnd_port *port) +{ + if (port->port.portnum > node->node.info.numports || node->node.ports == NULL ) + return (NULL); + + return (CONV_PORT_INTERNAL(node->node.ports[port->port.portnum])); +} + +static struct ibnd_port * +add_port_to_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *temp) +{ + struct ibnd_port *port; + + port = malloc(sizeof(*port)); + if (!port) + return NULL; + + memcpy(port, temp, sizeof(*port)); + port->port.node = (ibnd_node_t *)node; + port->port.ext_portnum = 0; + + if (node->node.ports == NULL) { + node->node.ports = calloc(sizeof(*node->node.ports), node->node.info.numports + 1); + if (!node->node.ports) { + IBND_ERROR("Failed to allocate the ports array\n"); + return (NULL); + } + } + + node->node.ports[temp->port.portnum] = (ibnd_port_t *)port; + + add_to_portguid_hash(port, fabric->portstbl); + return port; +} + +static void +link_ports(struct ibnd_node *node, struct ibnd_port *port, + struct ibnd_node *remotenode, struct ibnd_port *remoteport) +{ + IBND_DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u\n", + node->node.info.nodeguid, node, port, port->port.portnum, + remotenode->node.info.nodeguid, remotenode, + remoteport, remoteport->port.portnum); + if (port->port.remoteport) + port->port.remoteport->remoteport = NULL; + if (remoteport->port.remoteport) + remoteport->port.remoteport->remoteport = NULL; + port->port.remoteport = (ibnd_port_t *)remoteport; + 
remoteport->port.remoteport = (ibnd_port_t *)port; +} + +static int +get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *port, ib_portid_t *path, + int portnum, int dist) +{ + struct ibnd_node node_buf; + struct ibnd_port port_buf; + struct ibnd_node *remotenode, *oldnode; + struct ibnd_port *remoteport, *oldport; + + memset(&node_buf, 0, sizeof(node_buf)); + memset(&port_buf, 0, sizeof(port_buf)); + + IBND_DEBUG("handle node %p port %p:%d dist %d\n", node, port, portnum, dist); + if (port->port.info.phys_state != 5) /* LinkUp */ + return -1; + + if (extend_dpath(fabric, &path->drpath, portnum) < 0) + return -1; + + if (query_node(fabric, &node_buf, &port_buf, path) < 0) { + IBWARN("NodeInfo on %s failed, skipping port", + portid2str(path)); + path->drpath.cnt--; /* restore path */ + return -1; + } + + oldnode = find_existing_node(fabric, &node_buf); + if (oldnode) + remotenode = oldnode; + else if (!(remotenode = create_node(fabric, &node_buf, path, dist + 1))) + IBPANIC("no memory"); + + oldport = find_existing_port_node(remotenode, &port_buf); + if (oldport) { + remoteport = oldport; + } else if (!(remoteport = add_port_to_node(fabric, remotenode, &port_buf))) + IBPANIC("no memory"); + + dump_endnode(path, oldnode ? 
"known remote" : "new remote", + remotenode, remoteport); + + link_ports(node, port, remotenode, remoteport); + + path->drpath.cnt--; /* restore path */ + return 0; +} + +static void * +ibnd_init_port(char *dev_name, int dev_port) +{ + int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; + + /* Crank up the mad lib */ + return (mad_rpc_open_port(dev_name, dev_port, mgmt_classes, 2)); +} + +ibnd_fabric_t * +ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, + ib_portid_t *from, int hops) +{ + struct ibnd_fabric *fabric = NULL; + ib_portid_t my_portid = {0}; + struct ibnd_node node_buf; + struct ibnd_port port_buf; + struct ibnd_node *node; + struct ibnd_port *port; + int i; + int dist = 0; + ib_portid_t *path; + int max_hops = MAXHOPS-1; /* default find everything */ + + /* if not everything how much? */ + if (hops >= 0) { + max_hops = hops; + } + + /* If not specified start from "my" port */ + if (!from) { + from = &my_portid; + } + + fabric = malloc(sizeof(*fabric)); + + if (!fabric) { + IBPANIC("OOM: failed to malloc ibnd_fabric_t\n"); + return (NULL); + } + + memset(fabric, 0, sizeof(*fabric)); + + fabric->ibmad_port = ibnd_init_port(dev_name, dev_port); + if (!fabric->ibmad_port) { + IBPANIC("OOM: failed to open \"%s\" port %d\n", + dev_name, dev_port); + goto error; + } + + IBND_DEBUG("from %s\n", portid2str(from)); + + memset(&node_buf, 0, sizeof(node_buf)); + memset(&port_buf, 0, sizeof(port_buf)); + + if (query_node(fabric, &node_buf, &port_buf, from) < 0) { + IBWARN("can't reach node %s\n", portid2str(from)); + goto error; + } + + node = create_node(fabric, &node_buf, from, 0); + if (!node) + goto error; + + fabric->fabric.from_node = (ibnd_node_t *)node; + + port = add_port_to_node(fabric, node, &port_buf); + if (!port) + IBPANIC("out of memory"); + + if (node->node.info.type != IBND_SWITCH_NODE && + get_remote_node(fabric, node, port, from, node->node.info.localport, 0) < 0) + return ((ibnd_fabric_t *)fabric); + + for (dist = 0; 
dist <= max_hops; dist++) { + + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { + + path = &node->node.path_portid; + + IBND_DEBUG("dist %d node %p\n", dist, node); + dump_endnode(path, "processing", node, port); + + for (i = 1; i <= node->node.info.numports; i++) { + if (i == node->node.info.localport) + continue; + + if (get_port_info(fabric, &port_buf, i, path) < 0) { + IBWARN("can't reach node %s port %d", portid2str(path), i); + continue; + } + + port = find_existing_port_node(node, &port_buf); + if (port) + continue; + + port = add_port_to_node(fabric, node, &port_buf); + if (!port) + IBPANIC("out of memory"); + + /* If switch, set port GUID to node port GUID */ + if (node->node.info.type == IBND_SWITCH_NODE) + port->port.guid = node->node.info.nodeportguid; + + get_remote_node(fabric, node, port, path, i, dist); + } + } + } + + fabric->fabric.chassis = group_nodes(fabric); + + return ((ibnd_fabric_t *)fabric); +error: + free(fabric); + return (NULL); +} + +static void +destroy_node(struct ibnd_node *node) +{ + int p = 0; + + for (p = 0; p <= node->node.info.numports; p++) { + free(node->node.ports[p]); + } + free(node->node.ports); + free(node); +} + +void +ibnd_destroy_fabric(ibnd_fabric_t *fabric) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); + int dist = 0; + struct ibnd_node *node = NULL; + struct ibnd_node *next = NULL; + ibnd_chassis_t *ch, *ch_next; + + ch = f->first_chassis; + while (ch) { + ch_next = ch->next; + free(ch); + ch = ch_next; + } + for (dist = 0; dist <= MAXHOPS; dist++) { + node = f->nodesdist[dist]; + while (node) { + next = node->dnext; + destroy_node(node); + node = next; + } + } + if (f->ibmad_port) + mad_rpc_close_port(f->ibmad_port); + free(f); +} + +void +ibnd_debug(int i) +{ + if (i) { + ibdebug++; + madrpc_show_errors(1); + umad_debug(i); + } else { + ibdebug = 0; + madrpc_show_errors(0); + umad_debug(0); + } +} + +void +ibnd_show_progress(int i) +{ + show_progress = i; +} + +const char* 
+ibnd_node_type_str(ibnd_node_t *node) +{ + switch(node->info.type) { + case IBND_CA_NODE: return "Ca"; + case IBND_SWITCH_NODE: return "Switch"; + case IBND_ROUTER_NODE: return "Router"; + } + return "??"; +} + +const char* +ibnd_node_type_str_short(ibnd_node_t *node) +{ + switch(node->info.type) { + case IBND_SWITCH_NODE: return "SW"; + case IBND_CA_NODE: return "CA"; + case IBND_ROUTER_NODE: return "RT"; + } + return "??"; +} + + +void +ibnd_iter_nodes(ibnd_fabric_t *fabric, + ibnd_iter_node_func_t func, + void *user_data) +{ + ibnd_node_t *cur = NULL; + + for (cur = fabric->nodes; cur; cur = cur->next) { + func(cur, user_data); + } +} + + +void +ibnd_iter_nodes_type(ibnd_fabric_t *fabric, + ibnd_iter_node_func_t func, + ibnd_node_type_t node_type, + void *user_data) +{ + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); + struct ibnd_node *list = NULL; + struct ibnd_node *cur = NULL; + + switch (node_type) { + case IBND_SWITCH_NODE: + list = f->switches; + break; + case IBND_CA_NODE: + list = f->ch_adapters; + break; + case IBND_ROUTER_NODE: + list = f->routers; + break; + default: + IBND_DEBUG("Invalid node_type specified %d\n", node_type); + break; + } + + for (cur = list; cur; cur = cur->type_next) { + func((ibnd_node_t *)cur, user_data); + } +} + diff --git a/infiniband-diags/libibnetdisc/src/internal.h b/infiniband-diags/libibnetdisc/src/internal.h new file mode 100644 index 0000000..89f238f --- /dev/null +++ b/infiniband-diags/libibnetdisc/src/internal.h @@ -0,0 +1,82 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +/** ========================================================================= + * Define the internal data structures. 
+ */ + +#ifndef _INTERNAL_H_ +#define _INTERNAL_H_ + +#include + +struct ibnd_node { + /* This member MUST BE FIRST */ + ibnd_node_t node; + + /* internal use only */ + unsigned char ch_found; + struct ibnd_node *htnext; /* hash table list */ + struct ibnd_node *dnext; /* nodesdist next */ + struct ibnd_node *type_next; /* next based on type */ +}; +#define CONV_NODE_INTERNAL(node) ((struct ibnd_node *)node) + +struct ibnd_port { + /* This member MUST BE FIRST */ + ibnd_port_t port; + + /* internal use only */ + struct ibnd_port *htnext; +}; +#define CONV_PORT_INTERNAL(port) ((struct ibnd_port *)port) + +struct ibnd_fabric { + /* This member MUST BE FIRST */ + ibnd_fabric_t fabric; + + /* internal use only */ + void *ibmad_port; + struct ibnd_node *nodestbl[HTSZ]; + struct ibnd_port *portstbl[HTSZ]; + struct ibnd_node *nodesdist[MAXHOPS+1]; + ibnd_chassis_t *first_chassis; + ibnd_chassis_t *current_chassis; + ibnd_chassis_t *last_chassis; + struct ibnd_node *switches; + struct ibnd_node *ch_adapters; + struct ibnd_node *routers; +}; +#define CONV_FABRIC_INTERNAL(fabric) ((struct ibnd_fabric *)fabric) + +#endif /* _INTERNAL_H_ */ diff --git a/infiniband-diags/libibnetdisc/src/libibnetdisc.map b/infiniband-diags/libibnetdisc/src/libibnetdisc.map new file mode 100644 index 0000000..5e8c315 --- /dev/null +++ b/infiniband-diags/libibnetdisc/src/libibnetdisc.map @@ -0,0 +1,27 @@ +IBNETDISC_1.0 { + global: + ibnd_debug; + ibnd_show_progress; + ibnd_discover_fabric; + ibnd_cache_fabric; + ibnd_read_fabric; + ibnd_destroy_fabric; + ibnd_find_node_guid; + ibnd_update_node; + ibnd_find_node_dr; + ibnd_linkwidth_str; + ibnd_linkspeed_str; + ibnd_node_type_str; + ibnd_node_type_str_short; + ibnd_is_xsigo_guid; + ibnd_is_xsigo_tca; + ibnd_is_xsigo_hca; + ibnd_get_chassis_guid; + ibnd_get_chassis_type; + ibnd_get_chassis_slot_str; + ibnd_linkstate_str; + ibnd_physstate_str; + ibnd_iter_nodes; + ibnd_iter_nodes_type; + local: *; +}; diff --git 
a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c new file mode 100644 index 0000000..6e63f4a --- /dev/null +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c @@ -0,0 +1,395 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 500; + +static int debug = 0; +#define DEBUG(str, args...) \ + if (debug) fprintf(stderr, str, ##args) + +static int down_links_only = 0; +static int line_mode = 0; +static int add_sw_settings = 0; +static int print_port_guids = 0; + +static unsigned int +get_max(unsigned int num) +{ + unsigned int v = num; // 32-bit word to find the log base 2 of + unsigned r = 0; // r will be lg(v) + + while (v >>= 1) // unroll for more speed... + { + r++; + } + + return (1 << r); +} + +void +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) +{ + int max_speed = 0; + + int max_width = get_max(port->info.link_width_supported + & port->remoteport->info.link_width_supported); + if ((max_width & port->info.link_width_active) == 0) { + // we are not at the max supported width + // print what we could be at. + snprintf(width_msg, msg_size, "Could be %s", + ibnd_linkwidth_str(max_width)); + } + + max_speed = get_max(port->info.link_speed_supported + & port->remoteport->info.link_speed_supported); + if ((max_speed & port->info.link_speed_active) == 0) { + // we are not at the max supported speed + // print what we could be at. 
+ snprintf(speed_msg, msg_size, "Could be %s", + ibnd_linkspeed_str(max_speed, 1)); + } +} + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + char remote_guid_str[256]; + char remote_str[256]; + char link_str[256]; + char width_msg[256]; + char speed_msg[256]; + char ext_port_str[256]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + width_msg[0] = '\0'; + speed_msg[0] = '\0'; + + if (port->remoteport) { + char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + get_msg(width_msg, speed_msg, 256, port); + if (line_mode) { + if (print_port_guids) { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->guid); + } else { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->node->info.nodeguid); + } + } + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", + remote_guid_str, + port->remoteport->info.lid ? 
+ port->remoteport->info.lid : + port->remoteport->node->smalid, + port->remoteport->portnum, + ext_port_str, + remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + remote_name_buf), + width_msg, + speed_msg + ); + } else { + snprintf(remote_str, 256, + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); + } + + if (add_sw_settings) { + snprintf(link_str, 256, + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 1), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state), + port->info.hoq_lifetime, + port->info.vl_stall_count + ); + } else { + snprintf(link_str, 256, + "(%3s %s %6s/%8s)", + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 1), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state) + ); + } + + if (port->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->ext_portnum); + else + ext_port_str[0] = '\0'; + + if (line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf), + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } else { + printf(" %6d %4d[%2s] ==%s==> %s", + node->smalid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } +} + +void +print_switch(ibnd_node_t *node, void *user_data) +{ + int i = 0; + + if (!line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("Switch 0x%016lx %s:\n", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf)); + } + + for (i = 1; i <= node->info.numports; i++) { + ibnd_port_t *port = node->ports[i]; + if (!port) + continue; + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { + print_port(node, port); + } + } 
+} + +void +usage(void) +{ + fprintf(stderr, + "Usage: %s [-hclp -S -D -C -P ]\n" + " Report link speed and connection for each port of each switch which is active\n" + " -h This help message\n" + " -S output only the node specified by guid\n" + " -D print only node specified by \n" + " -f specify node to start \"from\"\n" + " -n Number of hops to include away from specified node\n" + " -d print only down links\n" + " -l (line mode) print all information for each link on each line\n" + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" + + + " -t timeout for any single fabric query\n" + " -s show errors\n" + " --node-name-map use specified node name map\n" + + " -C use selected Channel Adaptor name for queries\n" + " -P use selected channel adaptor port for queries\n" + " -g print port guids instead of node guids\n" + " --debug print debug messages\n" + , + argv0); + exit(-1); +} + +int +main(int argc, char **argv) +{ + char *ca = 0; + int ca_port = 0; + ibnd_fabric_t *fabric = NULL; + uint64_t guid = 0; + char *dr_path = NULL; + char *from = NULL; + int hops = 0; + ib_portid_t port_id; + + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; + static const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "down-links-only", 0, 0, 'd'}, + { "line-mode", 0, 0, 'l'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "print-port-guids", 0, 0, 'g'}, + { "print-additional", 0, 0, 'p'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "node-name-map", 1, 0, 1}, + { "debug", 0, 0, 2}, + { "from", 1, 0, 'f'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 2: + debug = 1; + ibnd_debug(1); + break; + case 'f': + from = strdup(optarg); + break; + 
case 'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 'd': + down_links_only = 1; + break; + case 'l': + line_mode = 1; + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 'g': + print_port_guids = 1; + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + case 'p': + add_sw_settings = 1; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } + + ibnd_destroy_fabric(fabric); + + close_node_name_map(node_name_map); + exit(0); +} diff --git a/infiniband-diags/libibnetdisc/test/ibnetdisctest.c b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c new file mode 100644 index 0000000..fc6e234 --- /dev/null +++ b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c @@ -0,0 +1,675 @@ +/* + * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +static int verbose; +#define LIST_CA_NODE (1 << IBND_CA_NODE) +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) + +char *argv0 = "ibnetdiscover"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 2000; + +static int debug = 0; +#define DEBUG(str, args...) 
\ + if (debug) fprintf(stderr, str, ##args) + + +char * +node_name(ibnd_node_t *node) +{ + static char buf[256]; + + switch(node->info.type) { + case IBND_CA_NODE: + sprintf(buf, "\"%s", "H"); + break; + case IBND_SWITCH_NODE: + sprintf(buf, "\"%s", "S"); + break; + case IBND_ROUTER_NODE: + sprintf(buf, "\"%s", "R"); + break; + default: + sprintf(buf, "\"%s", "?"); + break; + } + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); + + return buf; +} + +void +list_node(ibnd_node_t *node, void *user_data) +{ + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, + node->nodedesc); + + fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", + ibnd_node_type_str(node), + node->info.nodeguid, node->info.numports, node->info.devid, + node->info.vendid, + nodename); + + free(nodename); +} + +void +list_nodes(ibnd_fabric_t *fabric, int list) +{ + if (list & LIST_CA_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); + } + if (list & LIST_SWITCH_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); + } + if (list & LIST_ROUTER_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); + } +} + +void +out_ids(ibnd_node_t *node, int group, char *chname) +{ + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); + if (node->info.sysimgguid) + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); + if (group + && node->chassis && node->chassis->chassisnum) { + fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum); + if (chname) + fprintf(f, " (%s)", clean_nodedesc(chname)); + if (ibnd_is_xsigo_tca(node->info.nodeguid) + && node->ports[1] + && node->ports[1]->remoteport) + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); + } + fprintf(f, "\n"); +} + + +uint64_t +out_chassis(ibnd_fabric_t *fabric, int chassisnum) +{ + uint64_t guid; + + fprintf(f, "\nChassis %d", chassisnum); + guid = ibnd_get_chassis_guid(fabric, chassisnum); + if 
(guid) + fprintf(f, " (guid 0x%" PRIx64 ")", guid); + fprintf(f, "\n"); + return guid; +} + +void +out_switch(ibnd_node_t *node, int group, char *chname) +{ + char *str; + char str2[256]; + char *nodename = NULL; + + out_ids(node, group, chname); + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); + if (group) { + str = ibnd_get_chassis_type(node); + if (str) + fprintf(f, "%s ", str); + str = ibnd_get_chassis_slot_str(node, str2, 256); + if (str) + fprintf(f, "%s", str); + } + + nodename = remap_node_name(node_name_map, node->info.nodeguid, + node->nodedesc); + + fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", + node->info.numports, node_name(node), + nodename, + node->sw_info.smaenhsp0 ? "enhanced" : "base", + node->smalid, node->smalmc); + + free(nodename); +} + +void +out_ca(ibnd_node_t *node, int group, char *chname) +{ + char *node_type; + char *node_type2; + + out_ids(node, group, chname); + switch(node->info.type) { + case IBND_CA_NODE: + node_type = "ca"; + node_type2 = "Ca"; + break; + case IBND_ROUTER_NODE: + node_type = "rt"; + node_type2 = "Rt"; + break; + default: + node_type = "???"; + node_type2 = "???"; + break; + } + + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); + fprintf(f, "%s\t%d %s\t\t# \"%s\"", + node_type2, node->info.numports, node_name(node), + clean_nodedesc(node->nodedesc)); + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) + fprintf(f, " (scp)"); + fprintf(f, "\n"); +} + +#define OUT_BUFFER_SIZE 16 +static char * +out_ext_port(ibnd_port_t *port, int group) +{ + static char mapping[OUT_BUFFER_SIZE]; + + if (group && port->ext_portnum != 0) { + snprintf(mapping, OUT_BUFFER_SIZE, + "[ext %d]", port->ext_portnum); + return (mapping); + } + + return (NULL); +} + +void +out_switch_port(ibnd_port_t *port, int group) +{ + char *ext_port_str = NULL; + char *rem_nodename = NULL; + + DEBUG("port %p:%d remoteport %p\n", port, 
port->portnum, port->remoteport); + fprintf(f, "[%d]", port->portnum); + + ext_port_str = out_ext_port(port, group); + if (ext_port_str) + fprintf(f, "%s", ext_port_str); + + rem_nodename = remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + port->remoteport->node->nodedesc); + + ext_port_str = out_ext_port(port->remoteport, group); + fprintf(f, "\t%s[%d]%s", + node_name(port->remoteport->node), + port->remoteport->portnum, + ext_port_str ? ext_port_str : ""); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); + fprintf(f, "\t\t# \"%s\" lid %d %s%s", + rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); + + if (ibnd_is_xsigo_tca(port->remoteport->guid)) + fprintf(f, " slot %d", port->portnum); + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) + fprintf(f, " (scp)"); + fprintf(f, "\n"); + + free(rem_nodename); +} + +void +out_ca_port(ibnd_port_t *port, int group) +{ + char *str = NULL; + char *rem_nodename = NULL; + + fprintf(f, "[%d]", port->portnum); + if (port->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->guid); + fprintf(f, "\t%s[%d]", + node_name(port->remoteport->node), + port->remoteport->portnum); + str = out_ext_port(port->remoteport, group); + if (str) + fprintf(f, "%s", str); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); + + rem_nodename = remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + port->remoteport->node->nodedesc); + + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", + port->info.lid, port->info.lmc, rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); + + free(rem_nodename); +} + +struct iter_user_data { + int group; + int skip_chassis_nodes; +}; + +static void +switch_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("SWITCH: node %p\n", node); + + /* skip chassis based switches if flagged */ + if (data->skip_chassis_nodes && node->chassis && node->chassis->chassisnum) + return; + + out_switch(node, data->group, NULL); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, data->group); + } +} + +static void +ca_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("CA: node %p\n", node); + /* Now, skip chassis based CAs */ + if (data->group && node->chassis && node->chassis->chassisnum) + return; + out_ca(node, data->group, NULL); + + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, data->group); + } +} + +static void +router_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("RT: node %p\n", node); + /* Now, skip chassis based RTs */ + if (data->group && node->chassis && node->chassis->chassisnum) + return; + out_ca(node, data->group, NULL); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, data->group); + } +} + +int +dump_topology(int group, ibnd_fabric_t *fabric) +{ + ibnd_node_t *node; + ibnd_port_t *port; + int i = 0, p = 0; + time_t t = time(0); + uint64_t chguid; + char 
*chname = NULL; + struct iter_user_data iter_user_data; + + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); + + /* Make pass on switches */ + if (group) { + ibnd_chassis_t *ch = NULL; + + /* Chassis based switches first */ + for (ch = fabric->chassis; ch; ch = ch->next) { + int n = 0; + + if (!ch->chassisnum) + continue; + chguid = out_chassis(fabric, ch->chassisnum); + + chname = NULL; +/** + * Will this work for Xsigo? + */ + if (ibnd_is_xsigo_guid(chguid)) { + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } + +#if 0 +/** + * vs. this? + * I don't want to expose the nodesdist array to the end user. 
+ */ + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { + if (!node->chrecord || + !node->chrecord->chassisnum) + continue; + + if (node->chrecord->chassisnum != ch->chassisnum) + continue; + + if (ibnd_is_xsigo_hca(node->nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } +#endif + } + + fprintf(f, "\n# Spine Nodes"); + for (n = 1; n <= SPINES_MAX_NUM; n++) { + if (ch->spinenode[n]) { + out_switch(ch->spinenode[n], group, chname); + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { + port = ch->spinenode[n]->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + fprintf(f, "\n# Line Nodes"); + for (n = 1; n <= LINES_MAX_NUM; n++) { + if (ch->linenode[n]) { + out_switch(ch->linenode[n], group, chname); + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { + port = ch->linenode[n]->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + + fprintf(f, "\n# Chassis Switches"); + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (node->info.type == IBND_SWITCH_NODE) { + out_switch(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, group); + } + } + } + + fprintf(f, "\n# Chassis CAs"); + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (node->info.type == IBND_CA_NODE) { + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } + } + + } + + } else { /* !group */ + iter_user_data.group = group; + iter_user_data.skip_chassis_nodes = 0; + + ibnd_iter_nodes_type(fabric, switch_iter_func, + IBND_SWITCH_NODE, &iter_user_data); + } + + chname = NULL; + if (group) { + iter_user_data.group = group; + iter_user_data.skip_chassis_nodes = 1; + + fprintf(f, "\nNon-Chassis Nodes\n"); 
+ ibnd_iter_nodes_type(fabric, switch_iter_func, + IBND_SWITCH_NODE, &iter_user_data); + + } + + iter_user_data.group = group; + iter_user_data.skip_chassis_nodes = 0; + + /* Make pass on CAs */ + ibnd_iter_nodes_type(fabric, ca_iter_func, IBND_CA_NODE, + &iter_user_data); + + /* make pass on routers */ + ibnd_iter_nodes_type(fabric, router_iter_func, IBND_ROUTER_NODE, + &iter_user_data); + + return i; +} + + +void dump_ports_report (ibnd_node_t *node, void *user_data) +{ + int p = 0; + ibnd_port_t *port = NULL; + + /* for each port */ + for (p = node->info.numports, port = node->ports[p]; + p > 0; + port = node->ports[--p]) { + if (port == NULL) + continue; + + fprintf(stdout, + "%2s %5d %2d 0x%016" PRIx64 " %s %s", + ibnd_node_type_str_short(node), + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, + port->portnum, + port->guid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); + if (port->remoteport) + fprintf(stdout, + " - %2s %5d %2d 0x%016" PRIx64 + " ( '%s' - '%s' )\n", + ibnd_node_type_str_short(port->remoteport->node), + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
+ port->remoteport->node->smalid : port->remoteport->info.lid, + port->remoteport->portnum, + port->remoteport->guid, + port->node->nodedesc, + port->remoteport->node->nodedesc); + else + fprintf(stdout, "%36s'%s'\n", "", + port->node->nodedesc); + } +} + +void +usage(void) +{ + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " + "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", + argv0); + fprintf(stderr, " --node-name-map specify a node name map file\n"); + exit(-1); +} + +int +main(int argc, char **argv) +{ + int list = 0; + char *ca = 0; + int ca_port = 0; + int group = 0; + int ports_report = 0; + ibnd_fabric_t *fabric = NULL; + + static char const str_opts[] = "C:P:t:devslgHSRpVhu"; + static const struct option long_opts[] = { + { "C", 1, 0, 'C'}, + { "P", 1, 0, 'P'}, + { "debug", 0, 0, 'd'}, + { "verbose", 0, 0, 'v'}, + { "show", 0, 0, 's'}, + { "list", 0, 0, 'l'}, + { "grouping", 0, 0, 'g'}, + { "Hca_list", 0, 0, 'H'}, + { "Switch_list", 0, 0, 'S'}, + { "Router_list", 0, 0, 'R'}, + { "timeout", 1, 0, 't'}, + { "node-name-map", 1, 0, 1}, + { "ports", 0, 0, 'p'}, + { "Version", 0, 0, 'V'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 'C': + ca = optarg; + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'd': + debug = 1; + ibnd_debug(1); + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 'v': + verbose++; + break; + case 's': + ibnd_show_progress(1); + break; + case 'l': + list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; + break; + case 'g': + group = 1; + break; + case 'S': + list |= LIST_SWITCH_NODE; + break; + case 'H': + list |= LIST_CA_NODE; + break; + 
case 'R': + list |= LIST_ROUTER_NODE; + break; + case 'p': + ports_report = 1; + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + + if (ports_report) + ibnd_iter_nodes(fabric, + dump_ports_report, + NULL); + else if (list) + list_nodes(fabric, list); + else + dump_topology(group, fabric); + + ibnd_destroy_fabric(fabric); + close_node_name_map(node_name_map); + exit(0); +} diff --git a/infiniband-diags/libibnetdisc/test/testleaks.c b/infiniband-diags/libibnetdisc/test/testleaks.c new file mode 100644 index 0000000..3fbf7af --- /dev/null +++ b/infiniband-diags/libibnetdisc/test/testleaks.c @@ -0,0 +1,268 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static int timeout_ms = 500; + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + char remote_guid_str[256]; + char remote_str[256]; + char link_str[256]; + char speed_msg[256]; + char ext_port_str[256]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + speed_msg[0] = '\0'; + + if (port->remoteport) { + char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" (%s)\n", + remote_guid_str, + port->remoteport->info.lid ? 
+			port->remoteport->info.lid :
+			port->remoteport->node->smalid,
+			port->remoteport->portnum,
+			ext_port_str,
+			port->remoteport->node->nodedesc,
+			speed_msg
+			);
+	} else {
+		snprintf(remote_str, 256,
+			"%6s %4s[%2s] \"\" ( )\n", "", "", "");
+	}
+
+	snprintf(link_str, 256,
+		"(%3s %s %6s/%8s)",
+		ibnd_linkwidth_str(port->info.link_width_active),
+		ibnd_linkspeed_str(port->info.link_speed_active, 0),
+		ibnd_linkstate_str(port->info.link_state),
+		ibnd_physstate_str(port->info.phys_state)
+		);
+
+	if (port->ext_portnum)
+		snprintf(ext_port_str, 256, "%d", port->ext_portnum);
+	else
+		ext_port_str[0] = '\0';
+
+	printf(" %6d %4d[%2s] ==%s==> %s",
+		node->smalid, port->portnum,
+		ext_port_str,
+		link_str,
+		remote_str
+		);
+}
+
+void
+print_switch(ibnd_node_t *node, void *user_data)
+{
+	int i = 0;
+
+	for (i = 1; i <= node->info.numports; i++) {
+		ibnd_port_t *port = node->ports[i];
+		if (!port)
+			continue;
+		if (port->info.link_state == IBND_LINK_DOWN) {
+			print_port(node, port);
+		}
+	}
+}
+
+void
+usage(void)
+{
+	fprintf(stderr,
+		"Usage: %s [-hclp -S -D -C -P ]\n"
+		" Report link speed and connection for each port of each switch which is active\n"
+		" -h This help message\n"
+		" -i Number of iterations to run (default -1 == infinite)\n"
+
+		" -S output only the node specified by guid\n"
+		" -D print only node specified by \n"
+		" -f specify node to start \"from\"\n"
+		" -n Number of hops to include away from specified node\n"
+
+		" -t timeout for any single fabric query\n"
+		" -s show errors\n"
+
+		" -C use selected Channel Adaptor name for queries\n"
+		" -P use selected channel adaptor port for queries\n"
+		" --debug print debug messages\n"
+		,
+		argv0);
+	exit(-1);
+}
+
+int
+main(int argc, char **argv)
+{
+	char *ca = 0;
+	int ca_port = 0;
+	ibnd_fabric_t *fabric = NULL;
+	uint64_t guid = 0;
+	char *dr_path = NULL;
+	char *from = NULL;
+	int hops = 0;
+	ib_portid_t port_id;
+	int iters = -1;
+
+	static char const str_opts[] = "S:D:n:C:P:t:shuf:i:";
+	static 
const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "debug", 0, 0, 2}, + { "from", 1, 0, 'f'}, + { "iters", 1, 0, 'i'}, + { } + }; + + f = stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 2: + ibnd_debug(1); + break; + case 'f': + from = strdup(optarg); + break; + case 'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 'i': + iters = (int)strtol(optarg, NULL, 0); + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + while (iters == -1 || iters-- > 0) { + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + +#if 0 + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } +#endif + + ibnd_destroy_fabric(fabric); + } + + exit(0); +} -- 1.5.4.5 From weiny2 at llnl.gov Thu Dec 11 16:20:43 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 11 Dec 2008 16:20:43 -0800 Subject: 
[ofa-general] [PATCH V2 2/3] Convert iblinkinfo.pl to C and use new ibnetdisc library.
Message-ID: <20081211162043.26da1b09.weiny2@llnl.gov>

From e1761012d91ec0655564bed41e8f2a5e95a36569 Mon Sep 17 00:00:00 2001
From: Ira Weiny
Date: Mon, 1 Dec 2008 14:55:10 -0800
Subject: [PATCH] Convert iblinkinfo.pl to C and use new ibnetdisc library.

Signed-off-by: Ira Weiny
---
 infiniband-diags/Makefile.am           |   12 +-
 infiniband-diags/scripts/iblinkinfo.pl |  327 ------------------------
 infiniband-diags/src/iblinkinfo.c      |  423 ++++++++++++++++++++++++++++++++
 3 files changed, 432 insertions(+), 330 deletions(-)
 delete mode 100755 infiniband-diags/scripts/iblinkinfo.pl
 create mode 100644 infiniband-diags/src/iblinkinfo.c

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index 8e8c3c1..274ef89 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -1,6 +1,7 @@
 SUBDIRS = libibnetdisc
 
-INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband
+INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband \
+	-I$(top_builddir)/libibnetdisc/include
 
 if DEBUG
 DBGFLAGS = -ggdb -D_DEBUG_
@@ -11,7 +12,7 @@ endif
 sbin_PROGRAMS = src/ibaddr src/ibnetdiscover src/ibping src/ibportstate \
 	src/ibroute src/ibstat src/ibsysstat src/ibtracert \
 	src/perfquery src/sminfo src/smpdump src/smpquery \
-	src/saquery src/vendstat
+	src/saquery src/vendstat src/iblinkinfo.pl
 
 if ENABLE_TEST_UTILS
 sbin_PROGRAMS += src/ibsendtrap src/mcm_rereg_test
@@ -28,7 +29,7 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \
 	scripts/dump_lfts.sh scripts/dump_mfts.sh \
 	scripts/set_nodedesc.sh \
 	scripts/ibqueryerrors.pl scripts/ibswportwatch.pl \
-	scripts/iblinkinfo.pl scripts/ibprintswitch.pl \
+	scripts/ibprintswitch.pl \
 	scripts/ibprintca.pl scripts/ibprintrt.pl \
 	scripts/ibfindnodesusing.pl scripts/ibidsverify.pl \
 	scripts/check_lft_balance.pl
@@ -40,6 +41,11 @@ 
src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/grouping.c src/ibdiag_common src_ibnetdiscover_CFLAGS = -Wall $(DBGFLAGS) src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) +src_iblinkinfo_pl_SOURCES = src/iblinkinfo.c +src_iblinkinfo_pl_CFLAGS = -Wall $(DBGFLAGS) +src_iblinkinfo_pl_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ + -libcommon -L$(srcdir)/libibnetdisc -libnetdisc + src_ibping_SOURCES = src/ibping.c src/ibdiag_common.c src_ibping_CFLAGS = -Wall $(DBGFLAGS) diff --git a/infiniband-diags/scripts/iblinkinfo.pl b/infiniband-diags/scripts/iblinkinfo.pl deleted file mode 100755 index b6b27ce..0000000 --- a/infiniband-diags/scripts/iblinkinfo.pl +++ /dev/null @@ -1,327 +0,0 @@ -#!/usr/bin/perl -# -# Copyright (c) 2006 The Regents of the University of California. -# Copyright (c) 2007-2008 Voltaire, Inc. All rights reserved. -# -# Produced at Lawrence Livermore National Laboratory. -# Written by Ira Weiny . -# -# This software is available to you under a choice of one of two -# licenses. You may choose to be licensed under the terms of the GNU -# General Public License (GPL) Version 2, available from the file -# COPYING in the main directory of this source tree, or the -# OpenIB.org BSD license below: -# -# Redistribution and use in source and binary forms, with or -# without modification, are permitted provided that the following -# conditions are met: -# -# - Redistributions of source code must retain the above -# copyright notice, this list of conditions and the following -# disclaimer. -# -# - Redistributions in binary form must reproduce the above -# copyright notice, this list of conditions and the following -# disclaimer in the documentation and/or other materials -# provided with the distribution. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, -# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF -# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS -# BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN -# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN -# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. -# - -use strict; - -use Getopt::Std; -use IBswcountlimits; - -sub usage_and_exit -{ - my $prog = $_[0]; - print -"Usage: $prog [-Rhclp -S -D -C -P ]\n"; - print -" Report link speed and connection for each port of each switch which is active\n"; - print " -h This help message\n"; - print -" -R Recalculate ibnetdiscover information (Default is to reuse ibnetdiscover output)\n"; - print -" -D output only the switch specified by direct route path\n"; - print " -S output only the switch specified by (hex format)\n"; - print " -d print only down links\n"; - print - " -l (line mode) print all information for each link on each line\n"; - print -" -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n"; - print " -c print port capabilities (enabled/supported values)\n"; - print " -C use selected Channel Adaptor name for queries\n"; - print " -P use selected channel adaptor port for queries\n"; - print " -g print port guids instead of node guids\n"; - exit 2; -} - -my $argv0 = `basename $0`; -my $regenerate_map = undef; -my $single_switch = undef; -my $direct_route = undef; -my $line_mode = undef; -my $print_add_switch = undef; -my $print_extended_cap = undef; -my $only_down_links = undef; -my $ca_name = ""; -my $ca_port = ""; -my $print_port_guids = undef; -my $switch_found = "no"; -chomp $argv0; - -if (!getopts("hcpldRS:D:C:P:g")) { usage_and_exit $argv0; } -if (defined $Getopt::Std::opt_h) { usage_and_exit $argv0; } -if (defined $Getopt::Std::opt_D) { $direct_route = $Getopt::Std::opt_D; } -if (defined $Getopt::Std::opt_R) { $regenerate_map = $Getopt::Std::opt_R; } -if (defined $Getopt::Std::opt_S) { - $single_switch = format_guid($Getopt::Std::opt_S); -} -if (defined 
$Getopt::Std::opt_d) { $only_down_links = $Getopt::Std::opt_d; } -if (defined $Getopt::Std::opt_l) { $line_mode = $Getopt::Std::opt_l; } -if (defined $Getopt::Std::opt_p) { $print_add_switch = $Getopt::Std::opt_p; } -if (defined $Getopt::Std::opt_c) { $print_extended_cap = $Getopt::Std::opt_c; } -if (defined $Getopt::Std::opt_C) { $ca_name = $Getopt::Std::opt_C; } -if (defined $Getopt::Std::opt_P) { $ca_port = $Getopt::Std::opt_P; } -if (defined $Getopt::Std::opt_g) { $print_port_guids = $Getopt::Std::opt_g; } - -my $extra_smpquery_params = get_ca_name_port_param_string($ca_name, $ca_port); - -sub main -{ - get_link_ends($regenerate_map, $ca_name, $ca_port); - if (defined($direct_route)) { - # convert DR to guid, then use original single_switch option - $single_switch = convert_dr_to_guid($direct_route); - if (!defined($single_switch) || !is_switch($single_switch)) { - printf("The direct route (%s) does not map to a switch.\n", - $direct_route); - return; - } - } - foreach my $switch (sort (keys(%IBswcountlimits::link_ends))) { - if ($single_switch && $switch ne $single_switch) { - next; - } else { - $switch_found = "yes"; - } - my $switch_prompt = "no"; - my $num_ports = get_num_ports($switch, $ca_name, $ca_port); - if ($num_ports == 0) { - printf("ERROR: switch $switch has 0 ports???\n"); - } - my @output_lines = undef; - my $pkt_lifetime = ""; - my $pkt_life_prompt = ""; - my $port_timeouts = ""; - my $print_switch = "yes"; - if ($only_down_links) { $print_switch = "no"; } - if ($print_add_switch) { - my $data = `smpquery $extra_smpquery_params -G switchinfo $switch`; - if ($data eq "") { - printf("ERROR: failed to get switchinfo for $switch\n"); - } - my @lines = split("\n", $data); - foreach my $line (@lines) { - if ($line =~ /^LifeTime:\.+(.*)/) { $pkt_lifetime = $1; } - } - $pkt_life_prompt = sprintf(" (LT: %2s)", $pkt_lifetime); - } - foreach my $port (1 .. 
$num_ports) { - my $hr = $IBswcountlimits::link_ends{$switch}{$port}; - if ($switch_prompt eq "no" && !$line_mode) { - my $switch_name = ""; - my $tmp_port = $port; - while ($switch_name eq "" && $tmp_port <= $num_ports) { - # the first port is down find switch name with up port - my $hr = $IBswcountlimits::link_ends{$switch}{$tmp_port}; - $switch_name = $hr->{loc_desc}; - $tmp_port++; - } - if ($switch_name eq "") { - printf( - "WARNING: Switch Name not found for $switch\n"); - } - push( - @output_lines, - sprintf( - "Switch %18s %s%s:\n", - $switch, $switch_name, $pkt_life_prompt - ) - ); - $switch_prompt = "yes"; - } - my $data = - `smpquery $extra_smpquery_params -G portinfo $switch $port`; - if ($data eq "") { - printf( - "ERROR: failed to get portinfo for $switch port $port\n"); - } - my @lines = split("\n", $data); - my $speed = ""; - my $speed_sup = ""; - my $speed_enable = ""; - my $width = ""; - my $width_sup = ""; - my $width_enable = ""; - my $state = ""; - my $hoq_life = ""; - my $vl_stall = ""; - my $phy_link_state = ""; - - foreach my $line (@lines) { - if ($line =~ /^LinkSpeedActive:\.+(.*)/) { $speed = $1; } - if ($line =~ /^LinkSpeedEnabled:\.+(.*)/) { - $speed_enable = $1; - } - if ($line =~ /^LinkSpeedSupported:\.+(.*)/) { $speed_sup = $1; } - if ($line =~ /^LinkWidthActive:\.+(.*)/) { $width = $1; } - if ($line =~ /^LinkWidthEnabled:\.+(.*)/) { - $width_enable = $1; - } - if ($line =~ /^LinkWidthSupported:\.+(.*)/) { $width_sup = $1; } - if ($line =~ /^LinkState:\.+(.*)/) { $state = $1; } - if ($line =~ /^HoqLife:\.+(.*)/) { $hoq_life = $1; } - if ($line =~ /^VLStallCount:\.+(.*)/) { $vl_stall = $1; } - if ($line =~ /^PhysLinkState:\.+(.*)/) { $phy_link_state = $1; } - } - my $rem_port = $hr->{rem_port}; - my $rem_lid = $hr->{rem_lid}; - my $rem_speed_sup = ""; - my $rem_speed_enable = ""; - my $rem_width_sup = ""; - my $rem_width_enable = ""; - if ($rem_lid ne "" && $rem_port ne "") { - $data = - `smpquery $extra_smpquery_params portinfo 
$rem_lid $rem_port`; - if ($data eq "") { - printf( - "ERROR: failed to get portinfo for $switch port $port\n" - ); - } - my @lines = split("\n", $data); - foreach my $line (@lines) { - if ($line =~ /^LinkSpeedEnabled:\.+(.*)/) { - $rem_speed_enable = $1; - } - if ($line =~ /^LinkSpeedSupported:\.+(.*)/) { - $rem_speed_sup = $1; - } - if ($line =~ /^LinkWidthEnabled:\.+(.*)/) { - $rem_width_enable = $1; - } - if ($line =~ /^LinkWidthSupported:\.+(.*)/) { - $rem_width_sup = $1; - } - } - } - my $capabilities = ""; - if ($print_extended_cap) { - $capabilities = sprintf("(%3s %s %6s / %8s [%s/%s][%s/%s])", - $width, $speed, $state, $phy_link_state, $width_enable, - $width_sup, $speed_enable, $speed_sup); - } else { - $capabilities = sprintf("(%3s %s %6s / %8s)", - $width, $speed, $state, $phy_link_state); - } - if ($print_add_switch) { - $port_timeouts = - sprintf(" (HOQ:%s VL_Stall:%s)", $hoq_life, $vl_stall); - } - if (!$only_down_links || ($only_down_links && $state eq "Down")) { - my $width_msg = ""; - my $speed_msg = ""; - if ($rem_width_enable ne "" && $rem_width_sup ne "") { - if ( $width_enable =~ /12X/ - && $rem_width_enable =~ /12X/ - && $width !~ /12X/) - { - $width_msg = "Could be 12X"; - } else { - if ( $width_enable =~ /8X/ - && $rem_width_enable =~ /8X/ - && $width !~ /8X/) - { - $width_msg = "Could be 8X"; - } else { - if ( $width_enable =~ /4X/ - && $rem_width_enable =~ /4X/ - && $width !~ /4X/) - { - $width_msg = "Could be 4X"; - } - } - } - } - if ($rem_speed_enable ne "" && $rem_speed_sup ne "") { - if ( $speed_enable =~ /10\.0/ - && $rem_speed_enable =~ /10\.0/ - && $speed !~ /10\.0/) - { - $speed_msg = "Could be 10.0 Gbps"; - } else { - if ( $speed_enable =~ /5\.0/ - && $rem_speed_enable =~ /5\.0/ - && $speed !~ /5\.0/) - { - $speed_msg = "Could be 5.0 Gbps"; - } - } - } - - if ($line_mode) { - my $line_begin = sprintf("%18s \"%30s\"%s", - $switch, $hr->{loc_desc}, $pkt_life_prompt); - my $ext_guid = sprintf("%18s", $hr->{rem_guid}); - if 
($print_port_guids && $hr->{rem_port_guid} ne "") { - $ext_guid = sprintf("0x%016s", $hr->{rem_port_guid}); - } - push( - @output_lines, - sprintf( -"%s %6s %4s[%2s] ==%s%s==> %18s %6s %4s[%2s] \"%s\" ( %s %s)\n", - $line_begin, $hr->{loc_sw_lid}, - $port, $hr->{loc_ext_port}, - $capabilities, $port_timeouts, - $ext_guid, $hr->{rem_lid}, - $hr->{rem_port}, $hr->{rem_ext_port}, - $hr->{rem_desc}, $width_msg, - $speed_msg - ) - ); - } else { - push( - @output_lines, - sprintf( -" %6s %4s[%2s] ==%s%s==> %6s %4s[%2s] \"%s\" ( %s %s)\n", - $hr->{loc_sw_lid}, $port, - $hr->{loc_ext_port}, $capabilities, - $port_timeouts, $hr->{rem_lid}, - $hr->{rem_port}, $hr->{rem_ext_port}, - $hr->{rem_desc}, $width_msg, - $speed_msg - ) - ); - } - $print_switch = "yes"; - } - } - if ($print_switch eq "yes") { - foreach my $line (@output_lines) { print $line; } - } - } - if ($single_switch && $switch_found ne "yes") { - printf("Switch \"%s\" not found.\n", $single_switch); - } -} -main; - diff --git a/infiniband-diags/src/iblinkinfo.c b/infiniband-diags/src/iblinkinfo.c new file mode 100644 index 0000000..7cb8fba --- /dev/null +++ b/infiniband-diags/src/iblinkinfo.c @@ -0,0 +1,423 @@ +/* + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +char *argv0 = "iblinkinfotest"; +static FILE *f; + +static char *node_name_map_file = NULL; +static nn_map_t *node_name_map = NULL; + +static int timeout_ms = 500; + +static int down_links_only = 0; +static int line_mode = 0; +static int add_sw_settings = 0; +static int print_port_guids = 0; +static int old_output = 0; + +static unsigned int +get_max(unsigned int num) +{ + unsigned int v = num; // 32-bit word to find the log base 2 of + unsigned r = 0; // r will be lg(v) + + while (v >>= 1) // unroll for more speed... + { + r++; + } + + return (1 << r); +} + +static char * +get_linkspeed_str(int link_speed) +{ + return (ibnd_linkspeed_str(link_speed, old_output)); +} + +void +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) +{ + int max_speed = 0; + + int max_width = get_max(port->info.link_width_supported + & port->remoteport->info.link_width_supported); + if ((max_width & port->info.link_width_active) == 0) { + // we are not at the max supported width + // print what we could be at. 
+ snprintf(width_msg, msg_size, "Could be %s", + ibnd_linkwidth_str(max_width)); + } + + max_speed = get_max(port->info.link_speed_supported + & port->remoteport->info.link_speed_supported); + if ((max_speed & port->info.link_speed_active) == 0) { + // we are not at the max supported speed + // print what we could be at. + snprintf(speed_msg, msg_size, "Could be %s", + get_linkspeed_str(max_speed)); + } +} + +void +print_port(ibnd_node_t *node, ibnd_port_t *port) +{ + static char remote_guid_str[256]; + static char remote_str[256]; + static char link_str[256]; + static char width_msg[256]; + static char speed_msg[256]; + static char ext_port_str[256]; + static char loc_sma_lid[16]; + + if (!port) + return; + + remote_guid_str[0] = '\0'; + remote_str[0] = '\0'; + link_str[0] = '\0'; + width_msg[0] = '\0'; + speed_msg[0] = '\0'; + + snprintf(loc_sma_lid, 16, "%d", node->smalid); + if (port->remoteport) { + static char remote_name_buf[256]; + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); + + if (port->remoteport->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); + else + ext_port_str[0] = '\0'; + + get_msg(width_msg, speed_msg, 256, port); + if (line_mode) { + if (print_port_guids) { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->guid); + } else { + snprintf(remote_guid_str, 256, + "0x%016lx ", + port->remoteport->node->info.nodeguid); + } + } + + snprintf(remote_str, 256, + "%s%6d %4d[%2s] \"%s\" ( %s %s)\n", + remote_guid_str, + port->remoteport->info.lid ? 
+ port->remoteport->info.lid : + port->remoteport->node->smalid, + port->remoteport->portnum, + ext_port_str, + remap_node_name(node_name_map, + port->remoteport->node->info.nodeguid, + remote_name_buf), + width_msg, + speed_msg + ); + } else { + snprintf(remote_str, 256, + "%19s%6s %4s[%2s] \"\" ( )\n", "", "", "", ""); + if (old_output) { + loc_sma_lid[0] = '\0'; + } + } + + + if (add_sw_settings) { + snprintf(link_str, 256, + "(%3s %s %6s / %8s) (HOQ:%d VL_Stall:%d)", + ibnd_linkwidth_str(port->info.link_width_active), + get_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state), + port->info.hoq_lifetime, + port->info.vl_stall_count + ); + } else { + snprintf(link_str, 256, + "(%3s %s %6s / %8s)", + ibnd_linkwidth_str(port->info.link_width_active), + get_linkspeed_str(port->info.link_speed_active), + ibnd_linkstate_str(port->info.link_state), + ibnd_physstate_str(port->info.phys_state) + ); + } + + if (port->ext_portnum) + snprintf(ext_port_str, 256, "%d", port->ext_portnum); + else + ext_port_str[0] = '\0'; + + if (line_mode) { + static char name_buf[256]; + char *node_name = ""; + + if (old_output && (!port->remoteport)) { + node_name = ""; + } else { + strncpy(name_buf, node->nodedesc, 256); + node_name = remap_node_name(node_name_map, + node->info.nodeguid, + name_buf); + } + + printf("0x%016lx \"%30s\" %6s %4d[%2s] ==%s==> %s", + node->info.nodeguid, + node_name, + loc_sma_lid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } else { + printf(" %6s %4d[%2s] ==%s==> %s", + loc_sma_lid, port->portnum, + ext_port_str, + link_str, + remote_str + ); + } +} + +void +print_switch(ibnd_node_t *node, void *user_data) +{ + int i = 0; + + if (!line_mode) { + char name_buf[256]; + strncpy(name_buf, node->nodedesc, 256); + printf("Switch 0x%016lx %s:\n", + node->info.nodeguid, + remap_node_name(node_name_map, + node->info.nodeguid, + name_buf)); + } + + for (i = 1; i <= 
node->info.numports; i++) { + ibnd_port_t *port = node->ports[i]; + if (!port) + continue; + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { + print_port(node, port); + } + } +} + +void +usage(void) +{ + fprintf(stderr, + "Usage: %s [-hclp -S -D -C -P ]\n" + " Report link speed and connection for each port of each switch which is active\n" + " -h This help message\n" + " -S output only the node specified by guid\n" + " -D print only node specified by \n" + " -f specify node to start \"from\"\n" + " -n Number of hops to include away from specified node\n" + " -d print only down links\n" + " -l (line mode) print all information for each link on each line\n" + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" + + + " -t timeout for any single fabric query\n" + " -s show progress during scan\n" + " --node-name-map use specified node name map\n" + + " -C use selected Channel Adaptor name for queries\n" + " -P use selected channel adaptor port for queries\n" + " -g print port guids instead of node guids\n" + " --debug print debug messages\n" + " -R (this option is obsolete and does nothing)\n" + , + argv0); + exit(-1); +} + +int +main(int argc, char **argv) +{ + char *ca = 0; + int ca_port = 0; + ibnd_fabric_t *fabric = NULL; + uint64_t guid = 0; + char *dr_path = NULL; + char *from = NULL; + int hops = 0; + ib_portid_t port_id; + + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:R"; + static const struct option long_opts[] = { + { "S", 1, 0, 'S'}, + { "D", 1, 0, 'D'}, + { "num-hops", 1, 0, 'n'}, + { "down-links-only", 0, 0, 'd'}, + { "line-mode", 0, 0, 'l'}, + { "ca-name", 1, 0, 'C'}, + { "ca-port", 1, 0, 'P'}, + { "timeout", 1, 0, 't'}, + { "show", 0, 0, 's'}, + { "print-port-guids", 0, 0, 'g'}, + { "print-additional", 0, 0, 'p'}, + { "help", 0, 0, 'h'}, + { "usage", 0, 0, 'u'}, + { "node-name-map", 1, 0, 1}, + { "debug", 0, 0, 2}, + { "compat", 0, 0, 3}, + { "from", 1, 0, 'f'}, + { "R", 0, 0, 'R'}, + { } + }; + + f = 
stdout; + + argv0 = argv[0]; + + while (1) { + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); + if ( ch == -1 ) + break; + switch(ch) { + case 1: + node_name_map_file = strdup(optarg); + break; + case 2: + ibnd_debug(1); + break; + case 3: + old_output = 1; + break; + case 'f': + from = strdup(optarg); + break; + case 'C': + ca = strdup(optarg); + break; + case 'P': + ca_port = strtoul(optarg, 0, 0); + break; + case 'D': + dr_path = strdup(optarg); + break; + case 'n': + hops = (int)strtol(optarg, NULL, 0); + break; + case 'd': + down_links_only = 1; + break; + case 'l': + line_mode = 1; + break; + case 't': + timeout_ms = strtoul(optarg, 0, 0); + break; + case 's': + ibnd_show_progress(1); + break; + case 'g': + print_port_guids = 1; + break; + case 'S': + guid = (uint64_t)strtoull(optarg, 0, 0); + break; + case 'p': + add_sw_settings = 1; + break; + case 'R': + /* GNDN */ + break; + default: + usage(); + break; + } + } + argc -= optind; + argv += optind; + + if (argc && !(f = fopen(argv[0], "w"))) + fprintf(stderr, "can't open file %s for writing", argv[0]); + + node_name_map = open_node_name_map(node_name_map_file); + + if (from) { + /* only scan part of the fabric */ + str2drpath(&(port_id.drpath), from, 0, 0); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + guid = 0; + } else { + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } + } + + if (guid) { + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); + print_switch(sw, NULL); + } else if (dr_path) { + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); + print_switch(sw, NULL); + } else { + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); + } + + ibnd_destroy_fabric(fabric); + + close_node_name_map(node_name_map); + exit(0); +} -- 1.5.4.5 From weiny2 at llnl.gov Thu Dec 11 16:20:47 2008 
From: weiny2 at llnl.gov (Ira Weiny)
Date: Thu, 11 Dec 2008 16:20:47 -0800
Subject: [ofa-general] [PATCH V2 3/3] Convert ibnetdiscover to use new ibnetdisc library.
Message-ID: <20081211162047.0d574bda.weiny2@llnl.gov>

>From dda1f79a9ed86ed1aba9db0fa4498005e5d95040 Mon Sep 17 00:00:00 2001
From: Ira Weiny
Date: Tue, 2 Dec 2008 16:29:29 -0800
Subject: [PATCH] Convert ibnetdiscover to use new ibnetdisc library.

Removed -e and -v since they were somewhat redundant with the -d option.
All other functionality is preserved.

Signed-off-by: Ira Weiny
---
 infiniband-diags/Makefile.am          |    4 +-
 infiniband-diags/include/grouping.h   |  113 ----
 infiniband-diags/man/ibnetdiscover.8  |   10 +-
 infiniband-diags/scripts/dump_lfts.sh |    2 +-
 infiniband-diags/scripts/dump_mfts.sh |    2 +-
 infiniband-diags/src/grouping.c       |  787 --------------------------
 infiniband-diags/src/ibnetdiscover.c  |  982 ++++++++++-----------------------
 7 files changed, 310 insertions(+), 1590 deletions(-)
 delete mode 100644 infiniband-diags/include/grouping.h
 delete mode 100644 infiniband-diags/src/grouping.c

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index 274ef89..6362ae1 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -37,9 +37,9 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \
 src_ibaddr_SOURCES = src/ibaddr.c src/ibdiag_common.c
 src_ibaddr_CFLAGS = -Wall $(DBGFLAGS)
-src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/grouping.c src/ibdiag_common.c
+src_ibnetdiscover_SOURCES = src/ibnetdiscover.c src/ibdiag_common.c
 src_ibnetdiscover_CFLAGS = -Wall $(DBGFLAGS)
-src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir)
+src_ibnetdiscover_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -L$(srcdir)/libibnetdisc -libnetdisc
 src_iblinkinfo_pl_SOURCES = src/iblinkinfo.c
 src_iblinkinfo_pl_CFLAGS = -Wall $(DBGFLAGS)
diff --git a/infiniband-diags/include/grouping.h b/infiniband-diags/include/grouping.h
deleted file mode 100644
index
e54efef..0000000 --- a/infiniband-diags/include/grouping.h +++ /dev/null @@ -1,113 +0,0 @@ -/* - * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. - * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- * - */ - -#ifndef _GROUPING_H_ -#define _GROUPING_H_ - -/*========================================================*/ -/* FABRIC SCANNER SPECIFIC DATA */ -/*========================================================*/ - -#define SPINES_MAX_NUM 12 -#define LINES_MAX_NUM 36 - -typedef struct ChassisList ChassisList; -typedef struct AllChassisList AllChassisList; - -struct ChassisList { - ChassisList *next; - uint64_t chassisguid; - int chassisnum; - int chassistype; - int nodecount; /* used for grouping by SystemImageGUID */ - Node *spinenode[SPINES_MAX_NUM + 1]; - Node *linenode[LINES_MAX_NUM + 1]; -}; - -struct AllChassisList { - ChassisList *first; - ChassisList *current; - ChassisList *last; -}; - -/*========================================================*/ -/* CHASSIS RECOGNITION SPECIFIC DATA */ -/*========================================================*/ - -/* Device IDs */ -#define VTR_DEVID_IB_FC_ROUTER 0x5a00 -#define VTR_DEVID_IB_IP_ROUTER 0x5a01 -#define VTR_DEVID_ISR9600_SPINE 0x5a02 -#define VTR_DEVID_ISR9600_LEAF 0x5a03 -#define VTR_DEVID_HCA1 0x5a04 -#define VTR_DEVID_HCA2 0x5a44 -#define VTR_DEVID_HCA3 0x6278 -#define VTR_DEVID_SW_6IB4 0x5a05 -#define VTR_DEVID_ISR9024 0x5a06 -#define VTR_DEVID_ISR9288 0x5a07 -#define VTR_DEVID_SLB24 0x5a09 -#define VTR_DEVID_SFB12 0x5a08 -#define VTR_DEVID_SFB4 0x5a0b -#define VTR_DEVID_ISR9024_12 0x5a0c -#define VTR_DEVID_SLB8 0x5a0d -#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 -#define VTR_DEVID_ISR9024_DDR 0x5a31 -#define VTR_DEVID_SFB12_DDR 0x5a32 -#define VTR_DEVID_SFB4_DDR 0x5a33 -#define VTR_DEVID_SLB24_DDR 0x5a34 -#define VTR_DEVID_SFB2012 0x5a37 -#define VTR_DEVID_SLB2024 0x5a38 -#define VTR_DEVID_ISR2012 0x5a39 -#define VTR_DEVID_SFB2004 0x5a40 -#define VTR_DEVID_ISR2004 0x5a41 -#define VTR_DEVID_SRB2004 0x5a42 - -enum ChassisType { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; -enum ChassisSlot { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; - 
-/*========================================================*/ -/* External interface */ -/*========================================================*/ - -ChassisList *group_nodes(); -char *portmapstring(Port *port); -char *get_chassis_type(unsigned char chassistype); -char *get_chassis_slot(unsigned char chassisslot); -uint64_t get_chassis_guid(unsigned char chassisnum); - -int is_xsigo_guid(uint64_t guid); -int is_xsigo_tca(uint64_t guid); -int is_xsigo_hca(uint64_t guid); - -#endif /* _GROUPING_H_ */ diff --git a/infiniband-diags/man/ibnetdiscover.8 b/infiniband-diags/man/ibnetdiscover.8 index 958efa9..768d392 100644 --- a/infiniband-diags/man/ibnetdiscover.8 +++ b/infiniband-diags/man/ibnetdiscover.8 @@ -5,7 +5,7 @@ ibnetdiscover \- discover InfiniBand topology .SH SYNOPSIS .B ibnetdiscover -[\-d(ebug)] [\-e(rr_show)] [\-v(erbose)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] [\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] +[\-d(ebug)] [\-s(how)] [\-l(ist)] [\-g(rouping)] [\-H(ca_list)] [\-S(witch_list)] [\-R(outer_list)] [\-C ca_name] [\-P ca_port] [\-t(imeout) timeout_ms] [\-V(ersion)] [\--node-name-map ] [\-p(orts)] [\-h(elp)] [] .SH DESCRIPTION .PP @@ -37,7 +37,7 @@ List of connected switches List of connected routers .TP \fB\-s\fR, \fB\-\-show\fR -Show more information +Show progress information during discovery. .TP \fB\-\-node\-name\-map\fR Specify a node name map. The node name map file maps GUIDs to more user friendly @@ -57,15 +57,9 @@ using the util_name -h syntax. # Debugging flags .PP \-d raise the IB debugging level. - May be used several times (-ddd or -d -d -d). -.PP -\-e show send and receive errors (timeouts and others) .PP \-h show the usage message .PP -\-v increase the application verbosity level. - May be used several times (-vv or -v -v -v) -.PP \-V show the version info. 
# Other common flags: diff --git a/infiniband-diags/scripts/dump_lfts.sh b/infiniband-diags/scripts/dump_lfts.sh index ebca705..9d6a986 100755 --- a/infiniband-diags/scripts/dump_lfts.sh +++ b/infiniband-diags/scripts/dump_lfts.sh @@ -22,7 +22,7 @@ done dump_by_dr_path () { -for sw_dr in `ibnetdiscover $ca_info -v \ +for sw_dr in `ibnetdiscover $ca_info -s \ | sed -ne '/^DR path .* switch /s/^DR path \([,|0-9]\+\) ->.*{\([0-9|a-f]\+\)}.*$/\2 \1/p' \ | sort -u \ | awk 'BEGIN {guid=0;} {if ($1 != guid) { guid=$1; print $2; }}'` ; do diff --git a/infiniband-diags/scripts/dump_mfts.sh b/infiniband-diags/scripts/dump_mfts.sh index 39fc5fb..cef6ad3 100755 --- a/infiniband-diags/scripts/dump_mfts.sh +++ b/infiniband-diags/scripts/dump_mfts.sh @@ -22,7 +22,7 @@ done dump_by_dr_path () { -for sw_dr in `ibnetdiscover $ca_info -v \ +for sw_dr in `ibnetdiscover $ca_info -s \ | sed -ne '/^DR path .* switch /s/^DR path \[\(.*\)\].*$/\1/p' \ | sed -e 's/\]\[/,/g' \ | sort -u` ; do diff --git a/infiniband-diags/src/grouping.c b/infiniband-diags/src/grouping.c deleted file mode 100644 index f1a996f..0000000 --- a/infiniband-diags/src/grouping.c +++ /dev/null @@ -1,787 +0,0 @@ -/* - * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. - * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/*========================================================*/ -/* FABRIC SCANNER SPECIFIC DATA */ -/*========================================================*/ - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include - -#include -#include - -#include "ibnetdiscover.h" -#include "grouping.h" - -#define OUT_BUFFER_SIZE 16 - - -extern Node *nodesdist[MAXHOPS+1]; /* last is CA list */ -extern Node *mynode; -extern Port *myport; -extern int maxhops_discovered; - -AllChassisList mylist; - -char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; -char *ChassisSlotStr[4] = { "", "Line", "Spine", "SRBD" }; - - -char *get_chassis_type(unsigned char chassistype) -{ - if (chassistype == UNRESOLVED_CT || chassistype > ISR2004_CT) - return NULL; - return ChassisTypeStr[chassistype]; -} - -char *get_chassis_slot(unsigned char chassisslot) -{ - if (chassisslot == UNRESOLVED_CS || chassisslot > SRBD_CS) - return NULL; - return ChassisSlotStr[chassisslot]; -} - -static struct ChassisList *find_chassisnum(unsigned char chassisnum) -{ - ChassisList *current; - - for (current = mylist.first; current; current = current->next) { - if (current->chassisnum == chassisnum) - return current; - } - - return NULL; -} - 
-static uint64_t topspin_chassisguid(uint64_t guid) -{ - /* Byte 3 in system image GUID is chassis type, and */ - /* Byte 4 is location ID (slot) so just mask off byte 4 */ - return guid & 0xffffffff00ffffffULL; -} - -int is_xsigo_guid(uint64_t guid) -{ - if ((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) - return 1; - else - return 0; -} - -static int is_xsigo_leafone(uint64_t guid) -{ - if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) - return 1; - else - return 0; -} - -int is_xsigo_hca(uint64_t guid) -{ - /* NodeType 2 is HCA */ - if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) - return 1; - else - return 0; -} - -int is_xsigo_tca(uint64_t guid) -{ - /* NodeType 3 is TCA */ - if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) - return 1; - else - return 0; -} - -static int is_xsigo_ca(uint64_t guid) -{ - if (is_xsigo_hca(guid) || is_xsigo_tca(guid)) - return 1; - else - return 0; -} - -static int is_xsigo_switch(uint64_t guid) -{ - if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) - return 1; - else - return 0; -} - -static uint64_t xsigo_chassisguid(Node *node) -{ - if (!is_xsigo_ca(node->sysimgguid)) { - /* Byte 3 is NodeType and byte 4 is PortType */ - /* If NodeType is 1 (switch), PortType is masked */ - if (is_xsigo_switch(node->sysimgguid)) - return node->sysimgguid & 0xffffffff00ffffffULL; - else - return node->sysimgguid; - } else { - /* Is there a peer port ? 
*/ - if (!node->ports->remoteport) - return node->sysimgguid; - - /* If peer port is Leaf 1, use its chassis GUID */ - if (is_xsigo_leafone(node->ports->remoteport->node->sysimgguid)) - return node->ports->remoteport->node->sysimgguid & - 0xffffffff00ffffffULL; - else - return node->sysimgguid; - } -} - -static uint64_t get_chassisguid(Node *node) -{ - if (node->vendid == TS_VENDOR_ID || node->vendid == SS_VENDOR_ID) - return topspin_chassisguid(node->sysimgguid); - else if (node->vendid == XS_VENDOR_ID || is_xsigo_guid(node->sysimgguid)) - return xsigo_chassisguid(node); - else - return node->sysimgguid; -} - -static struct ChassisList *find_chassisguid(Node *node) -{ - ChassisList *current; - uint64_t chguid; - - chguid = get_chassisguid(node); - for (current = mylist.first; current; current = current->next) { - if (current->chassisguid == chguid) - return current; - } - - return NULL; -} - -uint64_t get_chassis_guid(unsigned char chassisnum) -{ - ChassisList *chassis; - - chassis = find_chassisnum(chassisnum); - if (chassis) - return chassis->chassisguid; - else - return 0; -} - -static int is_router(Node *node) -{ - return (node->devid == VTR_DEVID_IB_FC_ROUTER || - node->devid == VTR_DEVID_IB_IP_ROUTER); -} - -static int is_spine_9096(Node *node) -{ - return (node->devid == VTR_DEVID_SFB4 || - node->devid == VTR_DEVID_SFB4_DDR); -} - -static int is_spine_9288(Node *node) -{ - return (node->devid == VTR_DEVID_SFB12 || - node->devid == VTR_DEVID_SFB12_DDR); -} - -static int is_spine_2004(Node *node) -{ - return (node->devid == VTR_DEVID_SFB2004); -} - -static int is_spine_2012(Node *node) -{ - return (node->devid == VTR_DEVID_SFB2012); -} - -static int is_spine(Node *node) -{ - return (is_spine_9096(node) || is_spine_9288(node) || - is_spine_2004(node) || is_spine_2012(node)); -} - -static int is_line_24(Node *node) -{ - return (node->devid == VTR_DEVID_SLB24 || - node->devid == VTR_DEVID_SLB24_DDR || - node->devid == VTR_DEVID_SRB2004); -} - -static int 
is_line_8(Node *node) -{ - return (node->devid == VTR_DEVID_SLB8); -} - -static int is_line_2024(Node *node) -{ - return (node->devid == VTR_DEVID_SLB2024); -} - -static int is_line(Node *node) -{ - return (is_line_24(node) || is_line_8(node) || is_line_2024(node)); -} - -int is_chassis_switch(Node *node) -{ - return (is_spine(node) || is_line(node)); -} - -/* these structs help find Line (Anafa) slot number while using spine portnum */ -int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; -int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; -int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; -int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; - -/* IPR FCR modules connectivity while using sFB4 port as reference */ -int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; - -/* these structs help find Spine (Anafa) slot number while using spine portnum */ -int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; -int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; -int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; -int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; -/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ - -static void get_sfb_slot(Node *node, Port *lineport) -{ - ChassisRecord *ch = node->chrecord; - - ch->chassisslot = SPINE_CS; - if (is_spine_9096(node)) { - ch->chassistype = ISR9096_CT; - ch->slotnum = spine4_slot_2_slb[lineport->portnum]; - ch->anafanum = 
anafa_spine4_slot_2_slb[lineport->portnum]; - } else if (is_spine_9288(node)) { - ch->chassistype = ISR9288_CT; - ch->slotnum = spine12_slot_2_slb[lineport->portnum]; - ch->anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; - } else if (is_spine_2012(node)) { - ch->chassistype = ISR2012_CT; - ch->slotnum = spine12_slot_2_slb[lineport->portnum]; - ch->anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; - } else if (is_spine_2004(node)) { - ch->chassistype = ISR2004_CT; - ch->slotnum = spine4_slot_2_slb[lineport->portnum]; - ch->anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; - } else { - IBPANIC("Unexpected node found: guid 0x%016" PRIx64, node->nodeguid); - } -} - -static void get_router_slot(Node *node, Port *spineport) -{ - ChassisRecord *ch = node->chrecord; - int guessnum = 0; - - if (!ch) { - if (!(node->chrecord = calloc(1, sizeof(ChassisRecord)))) - IBPANIC("out of mem"); - ch = node->chrecord; - } - - ch->chassisslot = SRBD_CS; - if (is_spine_9096(spineport->node)) { - ch->chassistype = ISR9096_CT; - ch->slotnum = line_slot_2_sfb4[spineport->portnum]; - ch->anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; - } else if (is_spine_9288(spineport->node)) { - ch->chassistype = ISR9288_CT; - ch->slotnum = line_slot_2_sfb12[spineport->portnum]; - /* this is a smart guess based on nodeguids order on sFB-12 module */ - guessnum = spineport->node->nodeguid % 4; - /* module 1 <--> remote anafa 3 */ - /* module 2 <--> remote anafa 2 */ - /* module 3 <--> remote anafa 1 */ - ch->anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 3 : 2)); - } else if (is_spine_2012(spineport->node)) { - ch->chassistype = ISR2012_CT; - ch->slotnum = line_slot_2_sfb12[spineport->portnum]; - /* this is a smart guess based on nodeguids order on sFB-12 module */ - guessnum = spineport->node->nodeguid % 4; - // module 1 <--> remote anafa 3 - // module 2 <--> remote anafa 2 - // module 3 <--> remote anafa 1 - ch->anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 
3 : 2)); - } else if (is_spine_2004(spineport->node)) { - ch->chassistype = ISR2004_CT; - ch->slotnum = line_slot_2_sfb4[spineport->portnum]; - ch->anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; - } else { - IBPANIC("Unexpected node found: guid 0x%016" PRIx64, spineport->node->nodeguid); - } -} - -static void get_slb_slot(ChassisRecord *ch, Port *spineport) -{ - ch->chassisslot = LINE_CS; - if (is_spine_9096(spineport->node)) { - ch->chassistype = ISR9096_CT; - ch->slotnum = line_slot_2_sfb4[spineport->portnum]; - ch->anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; - } else if (is_spine_9288(spineport->node)) { - ch->chassistype = ISR9288_CT; - ch->slotnum = line_slot_2_sfb12[spineport->portnum]; - ch->anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; - } else if (is_spine_2012(spineport->node)) { - ch->chassistype = ISR2012_CT; - ch->slotnum = line_slot_2_sfb12[spineport->portnum]; - ch->anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; - } else if (is_spine_2004(spineport->node)) { - ch->chassistype = ISR2004_CT; - ch->slotnum = line_slot_2_sfb4[spineport->portnum]; - ch->anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; - } else { - IBPANIC("Unexpected node found: guid 0x%016" PRIx64, spineport->node->nodeguid); - } -} - -/* - This function called for every Voltaire node in fabric - It could be optimized so, but time overhead is very small - and its only diag.util -*/ -static void fill_chassis_record(Node *node) -{ - Port *port; - Node *remnode = 0; - ChassisRecord *ch = 0; - - if (node->chrecord) /* somehow this node has already been passed */ - return; - - if (!(node->chrecord = calloc(1, sizeof(ChassisRecord)))) - IBPANIC("out of mem"); - - ch = node->chrecord; - - /* node is router only in case of using unique lid */ - /* (which is lid of chassis router port) */ - /* in such case node->ports is actually a requested port... 
*/ - if (is_router(node) && is_spine(node->ports->remoteport->node)) - get_router_slot(node, node->ports->remoteport); - else if (is_spine(node)) { - for (port = node->ports; port; port = port->next) { - if (!port->remoteport) - continue; - remnode = port->remoteport->node; - if (remnode->type != SWITCH_NODE) { - if (!remnode->chrecord) - get_router_slot(remnode, port); - continue; - } - if (!ch->chassistype) - /* we assume here that remoteport belongs to line */ - get_sfb_slot(node, port->remoteport); - - /* we could break here, but need to find if more routers connected */ - } - - } else if (is_line(node)) { - for (port = node->ports; port; port = port->next) { - if (port->portnum > 12) - continue; - if (!port->remoteport) - continue; - /* we assume here that remoteport belongs to spine */ - get_slb_slot(ch, port->remoteport); - break; - } - } - - return; -} - -static int get_line_index(Node *node) -{ - int retval = 3 * (node->chrecord->slotnum - 1) + node->chrecord->anafanum; - - if (retval > LINES_MAX_NUM || retval < 1) - IBPANIC("Internal error"); - return retval; -} - -static int get_spine_index(Node *node) -{ - int retval; - - if (is_spine_9288(node) || is_spine_2012(node)) - retval = 3 * (node->chrecord->slotnum - 1) + node->chrecord->anafanum; - else - retval = node->chrecord->slotnum; - - if (retval > SPINES_MAX_NUM || retval < 1) - IBPANIC("Internal error"); - return retval; -} - -static void insert_line_router(Node *node, ChassisList *chassislist) -{ - int i = get_line_index(node); - - if (chassislist->linenode[i]) - return; /* already filled slot */ - - chassislist->linenode[i] = node; - node->chrecord->chassisnum = chassislist->chassisnum; -} - -static void insert_spine(Node *node, ChassisList *chassislist) -{ - int i = get_spine_index(node); - - if (chassislist->spinenode[i]) - return; /* already filled slot */ - - chassislist->spinenode[i] = node; - node->chrecord->chassisnum = chassislist->chassisnum; -} - -static void 
pass_on_lines_catch_spines(ChassisList *chassislist) -{ - Node *node, *remnode; - Port *port; - int i; - - for (i = 1; i <= LINES_MAX_NUM; i++) { - node = chassislist->linenode[i]; - - if (!(node && is_line(node))) - continue; /* empty slot or router */ - - for (port = node->ports; port; port = port->next) { - if (port->portnum > 12) - continue; - - if (!port->remoteport) - continue; - remnode = port->remoteport->node; - - if (!remnode->chrecord) - continue; /* some error - spine not initialized ? FIXME */ - insert_spine(remnode, chassislist); - } - } -} - -static void pass_on_spines_catch_lines(ChassisList *chassislist) -{ - Node *node, *remnode; - Port *port; - int i; - - for (i = 1; i <= SPINES_MAX_NUM; i++) { - node = chassislist->spinenode[i]; - if (!node) - continue; /* empty slot */ - for (port = node->ports; port; port = port->next) { - if (!port->remoteport) - continue; - remnode = port->remoteport->node; - - if (!remnode->chrecord) - continue; /* some error - line/router not initialized ? FIXME */ - insert_line_router(remnode, chassislist); - } - } -} - -/* - Stupid interpolation algorithm... 
- But nothing to do - have to be compliant with VoltaireSM/NMS -*/ -static void pass_on_spines_interpolate_chguid(ChassisList *chassislist) -{ - Node *node; - int i; - - for (i = 1; i <= SPINES_MAX_NUM; i++) { - node = chassislist->spinenode[i]; - if (!node) - continue; /* skip the empty slots */ - - /* take first guid minus one to be consistent with SM */ - chassislist->chassisguid = node->nodeguid - 1; - break; - } -} - -/* - This function fills chassislist structure with all nodes - in that chassis - chassislist structure = structure of one standalone chassis -*/ -static void build_chassis(Node *node, ChassisList *chassislist) -{ - Node *remnode = 0; - Port *port = 0; - - /* we get here with node = chassis_spine */ - chassislist->chassistype = node->chrecord->chassistype; - insert_spine(node, chassislist); - - /* loop: pass on all ports of node */ - for (port = node->ports; port; port = port->next) { - if (!port->remoteport) - continue; - remnode = port->remoteport->node; - - if (!remnode->chrecord) - continue; /* some error - line or router not initialized ? FIXME */ - - insert_line_router(remnode, chassislist); - } - - pass_on_lines_catch_spines(chassislist); - /* this pass needed for to catch routers, since routers connected only */ - /* to spines in slot 1 or 4 and we could miss them first time */ - pass_on_spines_catch_lines(chassislist); - - /* additional 2 passes needed for to overcome a problem of pure "in-chassis" */ - /* connectivity - extra pass to ensure that all related chips/modules */ - /* inserted into the chassislist */ - pass_on_lines_catch_spines(chassislist); - pass_on_spines_catch_lines(chassislist); - pass_on_spines_interpolate_chguid(chassislist); -} - -/*========================================================*/ -/* INTERNAL TO EXTERNAL PORT MAPPING */ -/*========================================================*/ - -/* -Description : On ISR9288/9096 external ports indexing - is not matching the internal ( anafa ) port - indexes. 
Use this MAP to translate the data you get from - the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.) - - -Module : sLB-24 - anafa 1 anafa 2 -ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 -int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 -ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 -int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 ------------------------------------------------- - -Module : sLB-8 - anafa 1 anafa 2 -ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 -int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 -ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 -int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 - ------------> - anafa 1 anafa 2 -ext port | - - 5 - - 6 | - - 7 - - 8 -int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 -ext port | - - 1 - - 2 | - - 3 - - 4 -int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 ------------------------------------------------- - -Module : sLB-2024 - -ext port | 13 14 15 16 17 18 19 20 21 22 23 24 -A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 -ext port | 1 2 3 4 5 6 7 8 9 10 11 12 -A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 ---------------------------------------------------- - -*/ - -int int2ext_map_slb24[2][25] = { - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } - }; -int int2ext_map_slb8[2][25] = { - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } - }; -int int2ext_map_slb2024[2][25] = { - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } - }; -/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ - -/* - This function relevant only for line modules/chips - Returns string with external port index -*/ 
-char *portmapstring(Port *port) -{ - static char mapping[OUT_BUFFER_SIZE]; - ChassisRecord *ch = port->node->chrecord; - int portnum = port->portnum; - int chipnum = 0; - int pindex = 0; - Node *node = port->node; - - if (!ch || !is_line(node) || (portnum < 13 || portnum > 24)) - return NULL; - - if (ch->anafanum < 1 || ch->anafanum > 2) - return NULL; - - memset(mapping, 0, sizeof(mapping)); - - chipnum = ch->anafanum - 1; - - if (is_line_24(node)) - pindex = int2ext_map_slb24[chipnum][portnum]; - else if (is_line_2024(node)) - pindex = int2ext_map_slb2024[chipnum][portnum]; - else - pindex = int2ext_map_slb8[chipnum][portnum]; - - sprintf(mapping, "[ext %d]", pindex); - - return mapping; -} - -static void add_chassislist() -{ - if (!(mylist.current = calloc(1, sizeof(ChassisList)))) - IBPANIC("out of mem"); - - if (mylist.first == NULL) { - mylist.first = mylist.current; - mylist.last = mylist.current; - } else { - mylist.last->next = mylist.current; - mylist.current->next = NULL; - mylist.last = mylist.current; - } -} - -/* - Main grouping function - Algorithm: - 1. pass on every Voltaire node - 2. catch spine chip for every Voltaire node - 2.1 build/interpolate chassis around this chip - 2.2 go to 1. - 3. pass on non Voltaire nodes (SystemImageGUID based grouping) - 4. now group non Voltaire nodes by SystemImageGUID -*/ -ChassisList *group_nodes() -{ - Node *node; - int dist; - int chassisnum = 0; - struct ChassisList *chassis; - - mylist.first = NULL; - mylist.current = NULL; - mylist.last = NULL; - - /* first pass on switches and build for every Voltaire node */ - /* an appropriate chassis record (slotnum and position) */ - /* according to internal connectivity */ - /* not very efficient but clear code so... 
*/ - for (dist = 0; dist <= maxhops_discovered; dist++) { - for (node = nodesdist[dist]; node; node = node->dnext) { - if (node->vendid == VTR_VENDOR_ID) - fill_chassis_record(node); - } - } - - /* separate every Voltaire chassis from each other and build linked list of them */ - /* algorithm: catch spine and find all surrounding nodes */ - for (dist = 0; dist <= maxhops_discovered; dist++) { - for (node = nodesdist[dist]; node; node = node->dnext) { - if (node->vendid != VTR_VENDOR_ID) - continue; - if (!node->chrecord || node->chrecord->chassisnum || !is_spine(node)) - continue; - add_chassislist(); - mylist.current->chassisnum = ++chassisnum; - build_chassis(node, mylist.current); - } - } - - /* now make pass on nodes for chassis which are not Voltaire */ - /* grouped by common SystemImageGUID */ - for (dist = 0; dist <= maxhops_discovered; dist++) { - for (node = nodesdist[dist]; node; node = node->dnext) { - if (node->vendid == VTR_VENDOR_ID) - continue; - if (node->sysimgguid) { - chassis = find_chassisguid(node); - if (chassis) - chassis->nodecount++; - else { - /* Possible new chassis */ - add_chassislist(); - mylist.current->chassisguid = get_chassisguid(node); - mylist.current->nodecount = 1; - } - } - } - } - - /* now, make another pass to see which nodes are part of chassis */ - /* (defined as chassis->nodecount > 1) */ - for (dist = 0; dist <= MAXHOPS; ) { - for (node = nodesdist[dist]; node; node = node->dnext) { - if (node->vendid == VTR_VENDOR_ID) - continue; - if (node->sysimgguid) { - chassis = find_chassisguid(node); - if (chassis && chassis->nodecount > 1) { - if (!chassis->chassisnum) - chassis->chassisnum = ++chassisnum; - if (!node->chrecord) { - if (!(node->chrecord = calloc(1, sizeof(ChassisRecord)))) - IBPANIC("out of mem"); - node->chrecord->chassisnum = chassis->chassisnum; - } - } - } - } - if (dist == maxhops_discovered) - dist = MAXHOPS; /* skip to CAs */ - else - dist++; - } - - return (mylist.first); -} diff --git 
a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c index 2cfaa8a..3fba414 100644 --- a/infiniband-diags/src/ibnetdiscover.c +++ b/infiniband-diags/src/ibnetdiscover.c @@ -1,6 +1,7 @@ /* * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -47,483 +48,106 @@ #include #include -#include -#include -#include #include +#include +#include -#include "ibnetdiscover.h" -#include "grouping.h" #include "ibdiag_common.h" -static char *node_type_str[] = { - "???", - "ca", - "switch", - "router", - "iwarp rnic" -}; - -static char *linkwidth_str[] = { - "??", - "1x", - "4x", - "??", - "8x", - "??", - "??", - "??", - "12x" -}; - -static char *linkspeed_str[] = { - "???", - "SDR", - "DDR", - "???", - "QDR" -}; - -static int timeout = 2000; /* ms */ -static int dumplevel = 0; +static int debug; static int verbose; -static FILE *f; +#define LIST_CA_NODE (1 << IBND_CA_NODE) +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) char *argv0 = "ibnetdiscover"; +static FILE *f; static char *node_name_map_file = NULL; static nn_map_t *node_name_map = NULL; -Node *nodesdist[MAXHOPS+1]; /* last is Ca list */ -Node *mynode; -int maxhops_discovered = 0; - -struct ChassisList *chassis = NULL; - -static char * -get_linkwidth_str(int linkwidth) -{ - if (linkwidth > 8) - return linkwidth_str[0]; - else - return linkwidth_str[linkwidth]; -} - -static char * -get_linkspeed_str(int linkspeed) -{ - if (linkspeed > 4) - return linkspeed_str[0]; - else - return linkspeed_str[linkspeed]; -} - -static inline const char* -node_type_str2(Node *node) -{ - switch(node->type) { - case SWITCH_NODE: return "SW"; - case CA_NODE: return "CA"; - case ROUTER_NODE: 
return "RT"; - } - return "??"; -} - -void -decode_port_info(void *pi, Port *port) -{ - mad_decode_field(pi, IB_PORT_LID_F, &port->lid); - mad_decode_field(pi, IB_PORT_LMC_F, &port->lmc); - mad_decode_field(pi, IB_PORT_STATE_F, &port->state); - mad_decode_field(pi, IB_PORT_PHYS_STATE_F, &port->physstate); - mad_decode_field(pi, IB_PORT_LINK_WIDTH_ACTIVE_F, &port->linkwidth); - mad_decode_field(pi, IB_PORT_LINK_SPEED_ACTIVE_F, &port->linkspeed); -} +static int timeout_ms = 2000; -int -get_port(Port *port, int portnum, ib_portid_t *portid) -{ - char portinfo[64]; - void *pi = portinfo; - - port->portnum = portnum; - - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout)) - return -1; - decode_port_info(pi, port); - - DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s", - portid2str(portid), portnum, port->lid, port->state, port->physstate, get_linkwidth_str(port->linkwidth), get_linkspeed_str(port->linkspeed)); - return 1; -} -/* - * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. 
- */ -int -get_node(Node *node, Port *port, ib_portid_t *portid) -{ - char portinfo[64]; - char switchinfo[64]; - void *pi = portinfo, *ni = node->nodeinfo, *nd = node->nodedesc; - void *si = switchinfo; - - if (!smp_query(ni, portid, IB_ATTR_NODE_INFO, 0, timeout)) - return -1; - - mad_decode_field(ni, IB_NODE_GUID_F, &node->nodeguid); - mad_decode_field(ni, IB_NODE_TYPE_F, &node->type); - mad_decode_field(ni, IB_NODE_NPORTS_F, &node->numports); - mad_decode_field(ni, IB_NODE_DEVID_F, &node->devid); - mad_decode_field(ni, IB_NODE_VENDORID_F, &node->vendid); - mad_decode_field(ni, IB_NODE_SYSTEM_GUID_F, &node->sysimgguid); - mad_decode_field(ni, IB_NODE_PORT_GUID_F, &node->portguid); - mad_decode_field(ni, IB_NODE_LOCAL_PORT_F, &node->localport); - port->portnum = node->localport; - port->portguid = node->portguid; - - if (!smp_query(nd, portid, IB_ATTR_NODE_DESC, 0, timeout)) - return -1; - - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, 0, timeout)) - return -1; - decode_port_info(pi, port); - - if (node->type != SWITCH_NODE) - return 0; - - node->smalid = port->lid; - node->smalmc = port->lmc; - - /* after we have the sma information find out the real PortInfo for this port */ - if (!smp_query(pi, portid, IB_ATTR_PORT_INFO, node->localport, timeout)) - return -1; - decode_port_info(pi, port); - - if (!smp_query(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout)) - node->smaenhsp0 = 0; /* assume base SP0 */ - else - mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->smaenhsp0); - - DEBUG("portid %s: got switch node %" PRIx64 " '%s'", - portid2str(portid), node->nodeguid, node->nodedesc); - return 1; -} - -static int -extend_dpath(ib_dr_path_t *path, int nextport) -{ - if (path->cnt+2 >= sizeof(path->p)) - return -1; - ++path->cnt; - if (path->cnt > maxhops_discovered) - maxhops_discovered = path->cnt; - path->p[path->cnt] = nextport; - return path->cnt; -} - -static void -dump_endnode(ib_portid_t *path, char *prompt, Node *node, Port *port) -{ - if (!dumplevel) - 
return; - - fprintf(f, "%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", - portid2str(path), prompt, - (node->type <= IB_NODE_MAX ? node_type_str[node->type] : "???"), - node->nodeguid, node->type == SWITCH_NODE ? 0 : port->portnum, - port->lid, port->lid + (1 << port->lmc) - 1, - clean_nodedesc(node->nodedesc)); -} - -#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) -#define HTSZ 137 - -static Node *nodestbl[HTSZ]; - -static Node * -find_node(Node *new) -{ - int hash = HASHGUID(new->nodeguid) % HTSZ; - Node *node; - - for (node = nodestbl[hash]; node; node = node->htnext) - if (node->nodeguid == new->nodeguid) - return node; - - return NULL; -} - -static Node * -create_node(Node *temp, ib_portid_t *path, int dist) -{ - Node *node; - int hash = HASHGUID(temp->nodeguid) % HTSZ; - - node = malloc(sizeof(*node)); - if (!node) - return NULL; - - memcpy(node, temp, sizeof(*node)); - node->dist = dist; - node->path = *path; - - node->htnext = nodestbl[hash]; - nodestbl[hash] = node; - - if (node->type != SWITCH_NODE) - dist = MAXHOPS; /* special Ca list */ - - node->dnext = nodesdist[dist]; - nodesdist[dist] = node; - - return node; -} - -static Port * -find_port(Node *node, Port *port) -{ - Port *old; - - for (old = node->ports; old; old = old->next) - if (old->portnum == port->portnum) - return old; - - return NULL; -} - -static Port * -create_port(Node *node, Port *temp) -{ - Port *port; - - port = malloc(sizeof(*port)); - if (!port) - return NULL; - - memcpy(port, temp, sizeof(*port)); - port->node = node; - port->next = node->ports; - node->ports = port; - - return port; -} - -static void -link_ports(Node *node, Port *port, Node *remotenode, Port *remoteport) -{ - DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u", - node->nodeguid, node, port, port->portnum, - remotenode->nodeguid, remotenode, remoteport, remoteport->portnum); - if (port->remoteport) - port->remoteport->remoteport = NULL; 
- if (remoteport->remoteport) - remoteport->remoteport->remoteport = NULL; - port->remoteport = remoteport; - remoteport->remoteport = port; -} - -static int -handle_port(Node *node, Port *port, ib_portid_t *path, int portnum, int dist) -{ - Node node_buf; - Port port_buf; - Node *remotenode, *oldnode; - Port *remoteport, *oldport; - - memset(&node_buf, 0, sizeof(node_buf)); - memset(&port_buf, 0, sizeof(port_buf)); - - DEBUG("handle node %p port %p:%d dist %d", node, port, portnum, dist); - if (port->physstate != 5) /* LinkUp */ - return -1; - - if (extend_dpath(&path->drpath, portnum) < 0) - return -1; - - if (get_node(&node_buf, &port_buf, path) < 0) { - IBWARN("NodeInfo on %s failed, skipping port", - portid2str(path)); - path->drpath.cnt--; /* restore path */ - return -1; - } - - oldnode = find_node(&node_buf); - if (oldnode) - remotenode = oldnode; - else if (!(remotenode = create_node(&node_buf, path, dist + 1))) - IBERROR("no memory"); - - oldport = find_port(remotenode, &port_buf); - if (oldport) { - remoteport = oldport; - if (node != remotenode || port != remoteport) - IBWARN("port moving..."); - } else if (!(remoteport = create_port(remotenode, &port_buf))) - IBERROR("no memory"); - - dump_endnode(path, oldnode ? "known remote" : "new remote", - remotenode, remoteport); - - link_ports(node, port, remotenode, remoteport); - - path->drpath.cnt--; /* restore path */ - return 0; -} - -/* - * Return 1 if found, 0 if not, -1 on errors. 
- */ -static int -discover(ib_portid_t *from) -{ - Node node_buf; - Port port_buf; - Node *node; - Port *port; - int i; - int dist = 0; - ib_portid_t *path; - - DEBUG("from %s", portid2str(from)); - - memset(&node_buf, 0, sizeof(node_buf)); - memset(&port_buf, 0, sizeof(port_buf)); - - if (get_node(&node_buf, &port_buf, from) < 0) { - IBWARN("can't reach node %s", portid2str(from)); - return -1; - } - - node = create_node(&node_buf, from, 0); - if (!node) - IBERROR("out of memory"); - - mynode = node; - - port = create_port(node, &port_buf); - if (!port) - IBERROR("out of memory"); - - if (node->type != SWITCH_NODE && - handle_port(node, port, from, node->localport, 0) < 0) - return 0; - - for (dist = 0; dist < MAXHOPS; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - path = &node->path; - - DEBUG("dist %d node %p", dist, node); - dump_endnode(path, "processing", node, port); - - for (i = 1; i <= node->numports; i++) { - if (i == node->localport) - continue; - - if (get_port(&port_buf, i, path) < 0) { - IBWARN("can't reach node %s port %d", portid2str(path), i); - continue; - } - - port = find_port(node, &port_buf); - if (port) - continue; - - port = create_port(node, &port_buf); - if (!port) - IBERROR("out of memory"); - - /* If switch, set port GUID to node GUID */ - if (node->type == SWITCH_NODE) - port->portguid = node->portguid; - - handle_port(node, port, path, i, dist); - } - } - } - - return 0; -} - char * -node_name(Node *node) +node_name(ibnd_node_t *node) { static char buf[256]; - switch(node->type) { - case SWITCH_NODE: - sprintf(buf, "\"%s", "S"); - break; - case CA_NODE: + switch(node->info.type) { + case IBND_CA_NODE: sprintf(buf, "\"%s", "H"); break; - case ROUTER_NODE: + case IBND_SWITCH_NODE: + sprintf(buf, "\"%s", "S"); + break; + case IBND_ROUTER_NODE: sprintf(buf, "\"%s", "R"); break; default: sprintf(buf, "\"%s", "?"); break; } - sprintf(buf+2, "-%016" PRIx64 "\"", node->nodeguid); + sprintf(buf+2, "-%016" PRIx64 
"\"", node->info.nodeguid); return buf; } void -list_node(Node *node) +list_node(ibnd_node_t *node, void *user_data) { - char *node_type; - char *nodename = remap_node_name(node_name_map, node->nodeguid, + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, node->nodedesc); - switch(node->type) { - case SWITCH_NODE: - node_type = "Switch"; - break; - case CA_NODE: - node_type = "Ca"; - break; - case ROUTER_NODE: - node_type = "Router"; - break; - default: - node_type = "???"; - break; - } fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", - node_type, - node->nodeguid, node->numports, node->devid, node->vendid, + ibnd_node_type_str(node), + node->info.nodeguid, node->info.numports, node->info.devid, + node->info.vendid, nodename); free(nodename); } void -out_ids(Node *node, int group, char *chname) +list_nodes(ibnd_fabric_t *fabric, int list) +{ + if (list & LIST_CA_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); + } + if (list & LIST_SWITCH_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); + } + if (list & LIST_ROUTER_NODE) { + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); + } +} + +void +out_ids(ibnd_node_t *node, int group, char *chname) { - fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->vendid, node->devid); - if (node->sysimgguid) - fprintf(f, "sysimgguid=0x%" PRIx64, node->sysimgguid); - if (group - && node->chrecord && node->chrecord->chassisnum) { - fprintf(f, "\t\t# Chassis %d", node->chrecord->chassisnum); + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); + if (node->info.sysimgguid) + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); + if (group && node->chassis && node->chassis->chassisnum) { + fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum); if (chname) - fprintf(f, " (%s)", chname); - if (is_xsigo_tca(node->nodeguid) && node->ports->remoteport) - fprintf(f, " slot %d", 
node->ports->remoteport->portnum); + fprintf(f, " (%s)", clean_nodedesc(chname)); + if (ibnd_is_xsigo_tca(node->info.nodeguid) + && node->ports[1] + && node->ports[1]->remoteport) + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); } fprintf(f, "\n"); } + uint64_t -out_chassis(int chassisnum) +out_chassis(ibnd_fabric_t *fabric, int chassisnum) { uint64_t guid; fprintf(f, "\nChassis %d", chassisnum); - guid = get_chassis_guid(chassisnum); + guid = ibnd_get_chassis_guid(fabric, chassisnum); if (guid) fprintf(f, " (guid 0x%" PRIx64 ")", guid); fprintf(f, "\n"); @@ -531,54 +155,49 @@ out_chassis(int chassisnum) } void -out_switch(Node *node, int group, char *chname) +out_switch(ibnd_node_t *node, int group, char *chname) { char *str; + char str2[256]; char *nodename = NULL; out_ids(node, group, chname); - fprintf(f, "switchguid=0x%" PRIx64, node->nodeguid); - fprintf(f, "(%" PRIx64 ")", node->portguid); - /* Currently, only if Voltaire chassis */ - if (group - && node->chrecord && node->chrecord->chassisnum - && node->vendid == VTR_VENDOR_ID) { - str = get_chassis_type(node->chrecord->chassistype); + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); + if (group) { + str = ibnd_get_chassis_type(node); if (str) fprintf(f, "%s ", str); - str = get_chassis_slot(node->chrecord->chassisslot); + str = ibnd_get_chassis_slot_str(node, str2, 256); if (str) - fprintf(f, "%s ", str); - fprintf(f, "%d Chip %d", node->chrecord->slotnum, node->chrecord->anafanum); + fprintf(f, "%s", str); } - nodename = remap_node_name(node_name_map, node->nodeguid, + nodename = remap_node_name(node_name_map, node->info.nodeguid, node->nodedesc); fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", - node->numports, node_name(node), + node->info.numports, node_name(node), nodename, - node->smaenhsp0 ? "enhanced" : "base", + node->sw_info.smaenhsp0 ? 
"enhanced" : "base", node->smalid, node->smalmc); free(nodename); } void -out_ca(Node *node, int group, char *chname) +out_ca(ibnd_node_t *node, int group, char *chname) { char *node_type; char *node_type2; - char *nodename = remap_node_name(node_name_map, node->nodeguid, - node->nodedesc); out_ids(node, group, chname); - switch(node->type) { - case CA_NODE: + switch(node->info.type) { + case IBND_CA_NODE: node_type = "ca"; node_type2 = "Ca"; break; - case ROUTER_NODE: + case IBND_ROUTER_NODE: node_type = "rt"; node_type2 = "Rt"; break; @@ -588,37 +207,37 @@ out_ca(Node *node, int group, char *chname) break; } - fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->nodeguid); + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); fprintf(f, "%s\t%d %s\t\t# \"%s\"", - node_type2, node->numports, node_name(node), - nodename); - if (group && is_xsigo_hca(node->nodeguid)) + node_type2, node->info.numports, node_name(node), + clean_nodedesc(node->nodedesc)); + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) fprintf(f, " (scp)"); fprintf(f, "\n"); - - free(nodename); } +#define OUT_BUFFER_SIZE 16 static char * -out_ext_port(Port *port, int group) +out_ext_port(ibnd_port_t *port, int group) { - char *str = NULL; + static char mapping[OUT_BUFFER_SIZE]; - /* Currently, only if Voltaire chassis */ - if (group - && port->node->chrecord && port->node->vendid == VTR_VENDOR_ID) - str = portmapstring(port); + if (group && port->ext_portnum != 0) { + snprintf(mapping, OUT_BUFFER_SIZE, + "[ext %d]", port->ext_portnum); + return (mapping); + } - return (str); + return (NULL); } void -out_switch_port(Port *port, int group) +out_switch_port(ibnd_port_t *port, int group) { char *ext_port_str = NULL; char *rem_nodename = NULL; - DEBUG("port %p:%d remoteport %p", port, port->portnum, port->remoteport); + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); fprintf(f, "[%d]", port->portnum); ext_port_str = out_ext_port(port, group); @@ -626,7 
+245,7 @@ out_switch_port(Port *port, int group) fprintf(f, "%s", ext_port_str); rem_nodename = remap_node_name(node_name_map, - port->remoteport->node->nodeguid, + port->remoteport->node->info.nodeguid, port->remoteport->node->nodedesc); ext_port_str = out_ext_port(port->remoteport, group); @@ -634,17 +253,17 @@ out_switch_port(Port *port, int group) node_name(port->remoteport->node), port->remoteport->portnum, ext_port_str ? ext_port_str : ""); - if (port->remoteport->node->type != SWITCH_NODE) - fprintf(f, "(%" PRIx64 ") ", port->remoteport->portguid); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); fprintf(f, "\t\t# \"%s\" lid %d %s%s", rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); - if (is_xsigo_tca(port->remoteport->portguid)) + if (ibnd_is_xsigo_tca(port->remoteport->guid)) fprintf(f, " slot %d", port->portnum); - else if (is_xsigo_hca(port->remoteport->portguid)) + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) fprintf(f, " (scp)"); fprintf(f, "\n"); @@ -652,278 +271,290 @@ out_switch_port(Port *port, int group) } void -out_ca_port(Port *port, int group) +out_ca_port(ibnd_port_t *port, int group) { char *str = NULL; char *rem_nodename = NULL; fprintf(f, "[%d]", port->portnum); - if (port->node->type != SWITCH_NODE) - fprintf(f, "(%" PRIx64 ") ", port->portguid); + if (port->node->info.type != IBND_SWITCH_NODE) + fprintf(f, "(%" PRIx64 ") ", port->guid); fprintf(f, "\t%s[%d]", node_name(port->remoteport->node), port->remoteport->portnum); str = out_ext_port(port->remoteport, group); if (str) fprintf(f, "%s", str); - 
if (port->remoteport->node->type != SWITCH_NODE) - fprintf(f, " (%" PRIx64 ") ", port->remoteport->portguid); + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); rem_nodename = remap_node_name(node_name_map, - port->remoteport->node->nodeguid, + port->remoteport->node->info.nodeguid, port->remoteport->node->nodedesc); fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", - port->lid, port->lmc, rem_nodename, - port->remoteport->node->type == SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->lid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); + port->info.lid, port->info.lmc, rem_nodename, + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); free(rem_nodename); } +struct iter_user_data { + int group; + int skip_chassis_nodes; +}; + +static void +switch_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("SWITCH: node %p\n", node); + + /* skip chassis based switches if flagged */ + if (data->skip_chassis_nodes && node->chassis && node->chassis->chassisnum) + return; + + out_switch(node, data->group, NULL); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_switch_port(port, data->group); + } +} + +static void +ca_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("CA: node %p\n", node); + /* Now, skip chassis based CAs */ + if (data->group && node->chassis && node->chassis->chassisnum) + return; + out_ca(node, data->group, NULL); + + for (p = 1; p <= node->info.numports; p++) { + port = 
node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, data->group); + } +} + +static void +router_iter_func(ibnd_node_t *node, void *iter_user_data) +{ + ibnd_port_t *port; + int p = 0; + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; + + DEBUG("RT: node %p\n", node); + /* Now, skip chassis based RTs */ + if (data->group && node->chassis && + node->chassis->chassisnum) + return; + out_ca(node, data->group, NULL); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, data->group); + } +} + int -dump_topology(int listtype, int group) +dump_topology(int group, ibnd_fabric_t *fabric) { - Node *node; - Port *port; - int i = 0, dist = 0; + ibnd_node_t *node; + ibnd_port_t *port; + int i = 0, p = 0; time_t t = time(0); uint64_t chguid; char *chname = NULL; + struct iter_user_data iter_user_data; - if (!listtype) { - fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); - fprintf(f, "# Max of %d hops discovered\n", maxhops_discovered); - fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", mynode->nodeguid, mynode->portguid); - } + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); /* Make pass on switches */ - if (group && !listtype) { - ChassisList *ch = NULL; + if (group) { + ibnd_chassis_t *ch = NULL; /* Chassis based switches first */ - for (ch = chassis; ch; ch = ch->next) { + for (ch = fabric->chassis; ch; ch = ch->next) { int n = 0; if (!ch->chassisnum) continue; - chguid = out_chassis(ch->chassisnum); - if (chname) - free(chname); + chguid = out_chassis(fabric, ch->chassisnum); + chname = NULL; - if (is_xsigo_guid(chguid)) { - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { - if 
(!node->chrecord || - !node->chrecord->chassisnum) +/** + * Will this work for Xsigo? + */ + if (ibnd_is_xsigo_guid(chguid)) { + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); + } + } + +#if 0 +/** + * vs. this? + * I don't want to expose the nodesdist array to the end user. + */ + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { + if (!node->chassis || + !node->chassis->chassisnum) continue; - if (node->chrecord->chassisnum != ch->chassisnum) + if (node->chassis->chassisnum != ch->chassisnum) continue; - if (is_xsigo_hca(node->nodeguid)) { - chname = remap_node_name(node_name_map, - node->nodeguid, - node->nodedesc); - fprintf(f, "Hostname: %s\n", chname); + if (ibnd_is_xsigo_hca(node->nodeguid)) { + chname = node->nodedesc; + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); } } +#endif } fprintf(f, "\n# Spine Nodes"); - for (n = 1; n <= (SPINES_MAX_NUM+1); n++) { + for (n = 1; n <= SPINES_MAX_NUM; n++) { if (ch->spinenode[n]) { out_switch(ch->spinenode[n], group, chname); - for (port = ch->spinenode[n]->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { + port = ch->spinenode[n]->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); + } } } fprintf(f, "\n# Line Nodes"); - for (n = 1; n <= (LINES_MAX_NUM+1); n++) { + for (n = 1; n <= LINES_MAX_NUM; n++) { if (ch->linenode[n]) { out_switch(ch->linenode[n], group, chname); - for (port = ch->linenode[n]->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { + port = ch->linenode[n]->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); + } } } fprintf(f, "\n# Chassis Switches"); - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; 
node; node = node->dnext) { - - /* Non Voltaire chassis */ - if (node->vendid == VTR_VENDOR_ID) - continue; - if (!node->chrecord || - !node->chrecord->chassisnum) - continue; - - if (node->chrecord->chassisnum != ch->chassisnum) - continue; - + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (node->info.type == IBND_SWITCH_NODE) { out_switch(node, group, chname); - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) out_switch_port(port, group); - + } } - } fprintf(f, "\n# Chassis CAs"); - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { - if (!node->chrecord || - !node->chrecord->chassisnum) - continue; - - if (node->chrecord->chassisnum != ch->chassisnum) - continue; - - out_ca(node, group, chname); - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) - out_ca_port(port, group); - + for (node = ch->nodes; node; + node = node->next_chassis_node) { + if (node->info.type == IBND_CA_NODE) { + out_ca(node, group, chname); + for (p = 1; p <= node->info.numports; p++) { + port = node->ports[p]; + if (port && port->remoteport) + out_ca_port(port, group); + } + } } } - } else { - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - DEBUG("SWITCH: dist %d node %p", dist, node); - if (!listtype) - out_switch(node, group, chname); - else { - if (listtype & LIST_SWITCH_NODE) - list_node(node); - continue; - } + } else { /* !group */ + iter_user_data.group = group; + iter_user_data.skip_chassis_nodes = 0; - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) - out_switch_port(port, group); - } - } + ibnd_iter_nodes_type(fabric, switch_iter_func, + IBND_SWITCH_NODE, &iter_user_data); } - if (chname) - free(chname); chname = NULL; - if (group && !listtype) { + if (group) { + iter_user_data.group = 
group; + iter_user_data.skip_chassis_nodes = 1; fprintf(f, "\nNon-Chassis Nodes\n"); - - for (dist = 0; dist <= maxhops_discovered; dist++) { - - for (node = nodesdist[dist]; node; node = node->dnext) { - - DEBUG("SWITCH: dist %d node %p", dist, node); - /* Now, skip chassis based switches */ - if (node->chrecord && - node->chrecord->chassisnum) - continue; - out_switch(node, group, chname); - - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) - out_switch_port(port, group); - } - - } + ibnd_iter_nodes_type(fabric, switch_iter_func, + IBND_SWITCH_NODE, &iter_user_data); } - /* Make pass on CAs */ - for (node = nodesdist[MAXHOPS]; node; node = node->dnext) { - - DEBUG("CA: dist %d node %p", dist, node); - if (!listtype) { - /* Now, skip chassis based CAs */ - if (group && node->chrecord && - node->chrecord->chassisnum) - continue; - out_ca(node, group, chname); - } else { - if (((listtype & LIST_CA_NODE) && (node->type == CA_NODE)) || - ((listtype & LIST_ROUTER_NODE) && (node->type == ROUTER_NODE))) - list_node(node); - continue; - } + iter_user_data.group = group; + iter_user_data.skip_chassis_nodes = 0; - for (port = node->ports; port; port = port->next, i++) - if (port->remoteport) - out_ca_port(port, group); - } + /* Make pass on CAs */ + ibnd_iter_nodes_type(fabric, ca_iter_func, IBND_CA_NODE, + &iter_user_data); - if (chname) - free(chname); + /* make pass on routers */ + ibnd_iter_nodes_type(fabric, router_iter_func, IBND_ROUTER_NODE, + &iter_user_data); return i; } -void dump_ports_report () + +void dump_ports_report (ibnd_node_t *node, void *user_data) { - int b, n = 0, p; - Node *node; - Port *port; - - // If switch and LID == 0, search of other switch ports with - // valid LID and assign it to all ports of that switch - for (b = 0; b <= MAXHOPS; b++) - for (node = nodesdist[b]; node; node = node->dnext) - if (node->type == SWITCH_NODE) { - int swlid = 0; - for (p = 0, port = node->ports; - p < node->numports && port && 
!swlid; - port = port->next) - if (port->lid != 0) - swlid = port->lid; - for (p = 0, port = node->ports; - p < node->numports && port; - port = port->next) - port->lid = swlid; - } + int p = 0; + ibnd_port_t *port = NULL; + + /* for each port */ + for (p = node->info.numports, port = node->ports[p]; + p > 0; + port = node->ports[--p]) { + if (port == NULL) + continue; - for (b = 0; b <= MAXHOPS; b++) - for (node = nodesdist[b]; node; node = node->dnext) { - for (p = 0, port = node->ports; - p < node->numports && port; - p++, port = port->next) { - fprintf(stdout, - "%2s %5d %2d 0x%016" PRIx64 " %s %s", - node_type_str2(port->node), port->lid, - port->portnum, - port->portguid, - get_linkwidth_str(port->linkwidth), - get_linkspeed_str(port->linkspeed)); - if (port->remoteport) - fprintf(stdout, - " - %2s %5d %2d 0x%016" PRIx64 - " ( '%s' - '%s' )\n", - node_type_str2(port->remoteport->node), - port->remoteport->lid, - port->remoteport->portnum, - port->remoteport->portguid, - port->node->nodedesc, - port->remoteport->node->nodedesc); - else - fprintf(stdout, "%36s'%s'\n", "", - port->node->nodedesc); - } - n++; - } + fprintf(stdout, + "%2s %5d %2d 0x%016" PRIx64 " %s %s", + ibnd_node_type_str_short(node), + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, + port->portnum, + port->guid, + ibnd_linkwidth_str(port->info.link_width_active), + ibnd_linkspeed_str(port->info.link_speed_active, 0)); + if (port->remoteport) + fprintf(stdout, + " - %2s %5d %2d 0x%016" PRIx64 + " ( '%s' - '%s' )\n", + ibnd_node_type_str_short(port->remoteport->node), + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
+ port->remoteport->node->smalid : port->remoteport->info.lid, + port->remoteport->portnum, + port->remoteport->guid, + port->node->nodedesc, + port->remoteport->node->nodedesc); + else + fprintf(stdout, "%36s'%s'\n", "", + port->node->nodedesc); + } } void usage(void) { - fprintf(stderr, "Usage: %s [-d(ebug)] -e(rr_show) -v(erbose) -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", argv0); fprintf(stderr, " --node-name-map specify a node name map file\n"); @@ -933,20 +564,18 @@ usage(void) int main(int argc, char **argv) { - int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; - ib_portid_t my_portid = {0}; - int udebug = 0, list = 0; + int list = 0; char *ca = 0; int ca_port = 0; int group = 0; int ports_report = 0; + ibnd_fabric_t *fabric = NULL; static char const str_opts[] = "C:P:t:devslgHSRpVhu"; static const struct option long_opts[] = { { "C", 1, 0, 'C'}, { "P", 1, 0, 'P'}, { "debug", 0, 0, 'd'}, - { "err_show", 0, 0, 'e'}, { "verbose", 0, 0, 'v'}, { "show", 0, 0, 's'}, { "list", 0, 0, 'l'}, @@ -982,23 +611,17 @@ main(int argc, char **argv) ca_port = strtoul(optarg, 0, 0); break; case 'd': - ibdebug++; - madrpc_show_errors(1); - umad_debug(udebug); - udebug++; + debug = 1; + ibnd_debug(1); break; case 't': - timeout = strtoul(optarg, 0, 0); + timeout_ms = strtoul(optarg, 0, 0); break; case 'v': verbose++; - dumplevel++; break; case 's': - dumplevel = 1; - break; - case 'e': - madrpc_show_errors(1); + ibnd_show_progress(1); break; case 'l': list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; @@ -1007,13 +630,13 @@ main(int argc, char **argv) group = 1; break; case 'S': - list = LIST_SWITCH_NODE; + list |= LIST_SWITCH_NODE; break; case 'H': - list = LIST_CA_NODE; + list |= 
LIST_CA_NODE; break; case 'R': - list = LIST_ROUTER_NODE; + list |= LIST_ROUTER_NODE; break; case 'V': fprintf(stderr, "%s %s\n", argv0, get_build_version() ); @@ -1030,22 +653,25 @@ main(int argc, char **argv) argv += optind; if (argc && !(f = fopen(argv[0], "w"))) - IBERROR("can't open file %s for writing", argv[0]); + fprintf(stderr, "can't open file %s for writing", argv[0]); - madrpc_init(ca, ca_port, mgmt_classes, 2); node_name_map = open_node_name_map(node_name_map_file); - if (discover(&my_portid) < 0) - IBERROR("discover"); - - if (group) - chassis = group_nodes(); + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { + fprintf(stderr, "discover failed\n"); + exit(1); + } if (ports_report) - dump_ports_report(); + ibnd_iter_nodes(fabric, + dump_ports_report, + NULL); + else if (list) + list_nodes(fabric, list); else - dump_topology(list, group); + dump_topology(group, fabric); + ibnd_destroy_fabric(fabric); close_node_name_map(node_name_map); exit(0); } -- 1.5.4.5 From vlad at lists.openfabrics.org Fri Dec 12 03:25:58 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 12 Dec 2008 03:25:58 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081212-0200 daily build status Message-ID: <20081212112558.CBC03E60DED@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on 
x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From cap at nsc.liu.se Fri Dec 12 03:26:54 2008 From: cap at nsc.liu.se (Peter Kjellstrom) Date: Fri, 12 Dec 2008 12:26:54 +0100 Subject: [ofa-general] Infiniband performance In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com> Message-ID: <200812121226.58394.cap@nsc.liu.se> On Thursday 11 December 2008, Gilad Shainer wrote: > On the maximum BW you are correct - IB is capable for 16Gb/s data rate. You > are seeing 12Gb/s due to the host chipset bandwidth limitation. Using a few more words, IB DDR on PCI-express 8x gen1 does not reach its full potential (12/16 Gbps sounds about right). 
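[Note on the 12/16 figure, not part of the original message: a minimal sketch of the arithmetic behind the 16 Gbps ceiling. It assumes a 4x-wide IB link, the commonly published per-lane signalling rates (5 Gbaud for IB DDR, 2.5 GT/s for PCI-express gen1, 5 GT/s for gen2) and 8b/10b encoding on all of them. The helper names are made up for illustration, and PCIe packet headers and flow control are not modelled, so real throughput sits below these ceilings.]

```c
#include <assert.h>
#include <stdio.h>

/*
 * Ceiling on payload bits for an 8b/10b-encoded link, in Gbit/s:
 * every 10 bits on the wire carry 8 bits of data.
 * Assumption: packet/protocol overhead is ignored entirely.
 */
static double data_rate_8b10b(int lanes, double gbaud_per_lane)
{
	assert(lanes > 0 && gbaud_per_lane > 0.0);
	return lanes * gbaud_per_lane * 8.0 / 10.0;
}

/* Print the ceilings for the links named in the thread (widths assumed). */
static void print_link_ceilings(void)
{
	printf("IB 4x DDR     : %.0f Gbit/s\n", data_rate_8b10b(4, 5.0));
	printf("PCIe gen1 x8  : %.0f Gbit/s\n", data_rate_8b10b(8, 2.5));
	printf("PCIe gen2 x8  : %.0f Gbit/s\n", data_rate_8b10b(8, 5.0));
	printf("PCIe gen1 x16 : %.0f Gbit/s\n", data_rate_8b10b(16, 2.5));
}
```

Calling print_link_ceilings() from a small main() prints the four ceilings. Under these assumptions a gen1 x8 slot has no headroom over the IB link even before PCIe packet overhead is paid, which would be consistent with measured rates falling short of 16 Gbps.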
To max out IB DDR you'd need PCI-express 8x gen2 or 16x gen1. The former is
available (for some combinations of HCA and host chipset) and yields a
performance quite close to 16 Gbps.

/Peter

> Gilad.

From nicolas.morey-chaisemartin at ext.bull.net Fri Dec 12 06:07:11 2008
From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin)
Date: Fri, 12 Dec 2008 15:07:11 +0100
Subject: [ofa-general] ipoib device not loading?
In-Reply-To: <49418262.3070801@ncsa.uiuc.edu>
References: <49418262.3070801@ncsa.uiuc.edu>
Message-ID: <4942700F.7080401@ext.bull.net>

Jeremy Enos wrote:
>
> [root at host OFED-1.4-20081209-0926]# service openibd status
>
> HCA driver loaded
>
>
> The following OFED modules are loaded:
>
> ib_ipath
> mlx4_core
> mlx4_ib
> ib_mthca
> ib_uverbs
> ib_umad
> ib_sa
> ib_cm
> ib_mad
> ib_core
> iw_cxgb3
>
> [root at host OFED-1.4-20081209-0926]# ifup ib0
> ib_ipoib device ib0 does not seem to be present, delaying initialization.
>
>
You need the ib_ipoib module to be able to use ipoib.
Did you put the right option (--with_ipoib_mod or something like this) when
running ./configure for ofa_kernel?
Try to run modprobe ib_ipoib.
If it works, add ipoib to your /etc/infiniband/openib.conf so it'll be
loaded by openibd.
If it doesn't, check syslog, dmesg and other logs to see if it was an error
or simply that the file is missing.
If it's missing, recompile your ofa_kernel with the ipoib module.

Nicolas From Wayne.Glanfield at uk.renaultf1.com Fri Dec 12 09:04:12 2008 From: Wayne.Glanfield at uk.renaultf1.com (Glanfield, Wayne) Date: Fri, 12 Dec 2008 17:04:12 +0000 Subject: [ofa-general] MPI_Test: ibv_poll_cq(): bad status 12 Message-ID: Not sure if this is the correct forum, but we are experiencing problems with IB when running a commercial CFD code which is causing jobs to crash with the following errors. Could someone explain what is the likely cause of these and how we can minimise their occurrence. Thanks Wayne starccm+: Rank 0:172: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:172: MPI_Test: self cfd-cnsl-0230 peer cfd-cnsl-0144 (rank: 219) starccm+: Rank 0:172: MPI_Test: error message: transport retry exceeded error Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate'], 'Neo.Error': 'Error', 'Processor': 172, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'}Synchronizing parallel nodes (attempt 0) starccm+: Rank 0:71: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:68: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:71: MPI_Test: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 92) starccm+: Rank 0:71: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:68: MPI_Test: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 93) starccm+: Rank 0:68: MPI_Test: error message: transport retry exceeded error Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'SegregatedFlowSolver::iterationUpdate', 'AMGLinearSolver::solve'], 'Neo.Error': 'Error', 'Processor': 71, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'} Synchronizing parallel nodes (attempt 0) starccm+: Rank 0:68: MPI_Gather: ibv_poll_cq(): bad status 5 starccm+: Rank 0:68: MPI_Gather: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 93) starccm+: Rank 0:68: MPI_Gather: error message: work 
request flushed error starccm+: Rank 0:71: MPI_Gather: ibv_poll_cq(): bad status 12 starccm+: Rank 0:71: MPI_Gather: self cfd-cnsl-0196 peer cfd-cnsl-0214 (rank: 91) starccm+: Rank 0:71: MPI_Gather: error message: transport retry exceeded error /apps/CFD/CD-ADAPCO/Linux/starccm+3.04.008/star/bin/starenv: line 961: 5745 Segmentation fault "$@" starccm+: Rank 0:118: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:46: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:42: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:118: MPI_Test: self cfd-cnsl-0408 peer cfd-cnsl-0452 (rank: 229) starccm+: Rank 0:118: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:42: MPI_Test: self cfd-cnsl-0271 peer cfd-cnsl-0452 (rank: 229) starccm+: Rank 0:42: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:46: MPI_Test: self cfd-cnsl-0271 peer cfd-cnsl-0452 (rank: 228) starccm+: Rank 0:46: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:86: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:87: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:93: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:244: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:244: MPI_Test: self cfd-cnsl-0342 peer cfd-cnsl-0257 (rank: 26) starccm+: Rank 0:244: MPI_Test: error message: transport retry exceeded error Error: {'In': ['Machine::main', 'SimulationIterator::startIterating', 'SteadySolver::step', 'RsTurbSolver::iterationUpdate'], 'Neo.Error': 'Error', 'Processor': 244, 'Severity': 'EXCEPTION', 'message': 'MPI Error : MPI_Test: Internal MPI error'} Synchronizing parallel nodes (attempt 0) starccm+: Rank 0:26: MPI_Cancel: ibv_poll_cq(): bad status 12 starccm+: Rank 0:26: MPI_Cancel: self cfd-cnsl-0257 peer cfd-cnsl-0342 (rank: 244) starccm+: Rank 0:26: MPI_Cancel: error message: transport retry exceeded error starccm+: Rank 0:244: MPI_Cancel: ibv_poll_cq(): bad status 5 starccm+: Rank 0:244: 
MPI_Cancel: self cfd-cnsl-0342 peer cfd-cnsl-0257 (rank: 26) starccm+: Rank 0:244: MPI_Cancel: error message: work request flushed error starccm+: Rank 0:244: MPI_Cancel: MPI BUG: no requests done /apps/CFD/CD-ADAPCO/Linux/starccm+3.04.008/star/bin/starenv: line 961: 5729 Segmentation fault "$@" MPI Application rank 244 exited before MPI_Finalize() with status 139 hung starccm+: Rank 0:58: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:57: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:57: MPI_Test: self cfd-cnsl-0401 peer cfd-cnsl-0448 (rank: 40) starccm+: Rank 0:57: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:58: MPI_Test: self cfd-cnsl-0401 peer cfd-cnsl-0448 (rank: 42) starccm+: Rank 0:58: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:72: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:72: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0277 (rank: 1) starccm+: Rank 0:72: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:74: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:74: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0277 (rank: 1) starccm+: Rank 0:74: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:75: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:75: MPI_Test: self cfd-cnsl-0371 peer cfd-cnsl-0448 (rank: 40) starccm+: Rank 0:75: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:26: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:29: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:29: MPI_Test: self cfd-cnsl-0349 peer cfd-cnsl-0418 (rank: 252) starccm+: Rank 0:29: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:26: MPI_Test: self cfd-cnsl-0349 peer cfd-cnsl-0418 (rank: 254) starccm+: Rank 0:26: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:134: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:129: MPI_Test: ibv_poll_cq(): 
bad status 12 starccm+: Rank 0:135: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:131: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:130: MPI_Test: ibv_poll_cq(): bad status 12 starccm+: Rank 0:134: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 250) starccm+: Rank 0:134: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:131: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 255) starccm+: Rank 0:131: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:130: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 254) starccm+: Rank 0:130: MPI_Test: error message: transport retry exceeded error starccm+: Rank 0:129: MPI_Test: self cfd-cnsl-0386 peer cfd-cnsl-0418 (rank: 254) starccm+: Rank 0:129: MPI_Test: error message: transport retry exceeded error --------------------------------------------------------------------- For further information on the Renault F1 Team visit our web site at www.renaultf1.com. Renault F1 Team Limited Registered in England no. 1806337 Registered Office: 16 Old Bailey London EC4M 7EG WARNING: please ensure that you have adequate virus protection in place before you open or detach any documents attached to this email. This e-mail may constitute privileged information. If you are not the intended recipient, you have received this confidential email and any attachments transmitted with it in error and you must not disclose copy, circulate or in any other way use or rely on this information. E-mails to and from the Renault F1 Team are monitored for operational reasons and in accordance with lawful business practices. The contents of this email are those of the individual and do not necessarily represent the views of the company. Please note that this e-mail has been created in the knowledge that Internet e-mail is not a 100% secure communications medium. We advise that you understand and observe this lack of security when e-mailing us. 
If you have received this email in error please forward to: is.helpdesk at uk.renaultf1.com quoting the sender, then delete the message and any attached documents --------------------------------------------------------------------- From rdreier at cisco.com Fri Dec 12 10:19:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 12 Dec 2008 10:19:23 -0800 Subject: [ofa-general] Infiniband performance In-Reply-To: <200812121226.58394.cap@nsc.liu.se> (Peter Kjellstrom's message of "Fri, 12 Dec 2008 12:26:54 +0100") References: <9FA59C95FFCBB34EA5E42C1A8573784F017A4199@mtiexch01.mti.com> <200812121226.58394.cap@nsc.liu.se> Message-ID: > To max out IB DDR you'd need PCI-express 8x gen2 or 16x gen1. The former is > available (for some combinations of HCA and host chipset) and yields a > performance quite close to 16 Gbps. Actually both alternatives are available. - R. From jenos at ncsa.uiuc.edu Fri Dec 12 11:29:17 2008 From: jenos at ncsa.uiuc.edu (Jeremy Enos) Date: Fri, 12 Dec 2008 13:29:17 -0600 Subject: [ofa-general] ipoib device not loading? In-Reply-To: <4942700F.7080401@ext.bull.net> References: <49418262.3070801@ncsa.uiuc.edu> <4942700F.7080401@ext.bull.net> Message-ID: <4942BB8D.4030800@ncsa.uiuc.edu> Ah.. thanks. (when I installed, I just told it to install all) [root at host jenos]# modprobe ib_ipoib FATAL: Error inserting ib_ipoib (/lib/modules/2.6.27.5-41.fc9.x86_64/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko): Unknown symbol in module, or unknown parameter (see dmesg) Then dmesg shows this line: ib_ipoib: Unknown symbol icmpv6_send I have IPv6 disabled on this host for other reasons. Perhaps that's causing the problem? If so, is there a way to build w/o IPv6 requirements? 
thx-

Jeremy

Nicolas Morey Chaisemartin wrote:
> Jeremy Enos wrote:
>>
>> [root at host OFED-1.4-20081209-0926]# service openibd status
>>
>> HCA driver loaded
>>
>>
>> The following OFED modules are loaded:
>>
>> ib_ipath
>> mlx4_core
>> mlx4_ib
>> ib_mthca
>> ib_uverbs
>> ib_umad
>> ib_sa
>> ib_cm
>> ib_mad
>> ib_core
>> iw_cxgb3
>>
>> [root at host OFED-1.4-20081209-0926]# ifup ib0
>> ib_ipoib device ib0 does not seem to be present, delaying
>> initialization.
>>
>>
> You need the ib_ipoib module to be able to use ipoib.
> Did you put the right option (--with_ipoib_mod or something like this)
> when running ./configure for ofa_kernel?
> Try to run modprobe ib_ipoib.
> If it works, add ipoib to your /etc/infiniband/openib.conf so
> it'll be loaded by openibd.
> If it doesn't, check syslog, dmesg and other logs to see if it was
> an error or simply that the file is missing.
> If it's missing, recompile your ofa_kernel with the ipoib module.
>
> Nicolas
>

From rdreier at cisco.com Fri Dec 12 11:33:57 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 12 Dec 2008 11:33:57 -0800
Subject: [ofa-general] [PATCH] cma_zero_addr
In-Reply-To: <1228639669.3833.10.camel@alst60.voltaire.com> (Aleksey Senin's message of "Sun, 07 Dec 2008 10:47:49 +0200")
References: <1228222680.14862.13.camel@alst60.voltaire.com> <1228639669.3833.10.camel@alst60.voltaire.com>
Message-ID:

 > PATCHv6 is the latest version. Should we use it?
 > http://lists.openfabrics.org/pipermail/general/2008-December/055727.html
 > And it could be nice if cma_zero_addr patch, will be accepted too.

OK, thanks. What is the final version of the cma_zero_addr() change? I
do agree it makes sense to use ipv6_addr_any() instead of
duplicating the code. Have you and Sean agreed on it?

 - R.

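[Illustration, not part of the thread: the patch under discussion is not reproduced in this part of the archive, so the following is only a userspace sketch of what a "zero address" test can look like when the IPv6 case is delegated to an existing helper rather than open-coded. ipv6_addr_any() is a kernel helper; the POSIX macro IN6_IS_ADDR_UNSPECIFIED stands in for it here. The function name zero_addr, the AF_INET handling and the treatment of unknown address families are assumptions made for the example.]

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical userspace analogue of a "zero address" check.
 * Returns 1 if addr holds the IPv4 or IPv6 wildcard address, 0 otherwise.
 * Assumption: families other than AF_INET/AF_INET6 are reported as non-zero.
 */
static int zero_addr(const struct sockaddr *addr)
{
	assert(addr != NULL);

	switch (addr->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *in4 =
			(const struct sockaddr_in *)addr;
		return in4->sin_addr.s_addr == htonl(INADDR_ANY);
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 =
			(const struct sockaddr_in6 *)addr;
		/* Stand-in for the kernel's ipv6_addr_any(). */
		return IN6_IS_ADDR_UNSPECIFIED(&in6->sin6_addr) != 0;
	}
	default:
		return 0;
	}
}
```

The point of the sketch is only that the IPv6 branch reuses a library-provided test, so the all-zero comparison lives in one place.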
From sean.hefty at intel.com Fri Dec 12 11:54:21 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 12 Dec 2008 11:54:21 -0800
Subject: [ofa-general] [PATCH] cma_zero_addr
In-Reply-To:
References: <1228222680.14862.13.camel@alst60.voltaire.com> <1228639669.3833.10.camel@alst60.voltaire.com>
Message-ID: <000001c95c93$6d1f8560$1e58180a@amr.corp.intel.com>

> > PATCHv6 is the latest version. Should we use it?
>
> > http://lists.openfabrics.org/pipermail/general/2008-December/055727.html
>
> > And it could be nice if cma_zero_addr patch, will be accepted too.
>
>OK, thanks. What is the final version of the cma_zero_addr() change? I
>do agree it makes sense to use ipv6_addr_any() instead of
>duplicating the code. Have you and Sean agreed on it?

I've acked his patch.

- Sean

From chien.tin.tung at intel.com Fri Dec 12 12:46:13 2008
From: chien.tin.tung at intel.com (Chien Tung)
Date: Fri, 12 Dec 2008 14:46:13 -0600
Subject: [ofa-general] [PATCH 03/10 v2] RDMA/nes: Remove tx_free_list
Message-ID: <20081212204613.GA6760@ctung-MOBL>

From: Faisal Latif

There is no lock protecting tx_free_list thus causing a system crash
when skb_dequeue() is called and the list is empty. Since it did not
give any performance boost under heavy load, removing it to simplify
the code. Replace get_free_pkt() with dev_alloc_skb to allocate
MAX_CM_BUFFER skb for connection establishment/teardown as well as
MPA request/response.

Signed-off-by: Faisal Latif
Signed-off-by: Chien Tung
---

v2 change:
 * remove get_free_pkt() since it is left with dev_alloc_skb().

Roland,

This should apply nicely to your for-next branch. If you see any
other errors with formatting, please let me know.

drivers/infiniband/hw/nes/nes_cm.c | 76 ++---------------------------------- drivers/infiniband/hw/nes/nes_cm.h | 5 +- 2 files changed, 6 insertions(+), 75 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index aa373c5..cb48041 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -94,7 +94,6 @@ static int mini_cm_set(struct nes_cm_core *, u32, u32); static void form_cm_frame(struct sk_buff *, struct nes_cm_node *, void *, u32, void *, u32, u8); -static struct sk_buff *get_free_pkt(struct nes_cm_node *cm_node); static int add_ref_cm_node(struct nes_cm_node *); static int rem_ref_cm_node(struct nes_cm_core *, struct nes_cm_node *); @@ -355,7 +354,6 @@ static void print_core(struct nes_cm_core *core) nes_debug(NES_DBG_CM, "State : %u \n", core->state); - nes_debug(NES_DBG_CM, "Tx Free cnt : %u \n", skb_queue_len(&core->tx_free_list)); nes_debug(NES_DBG_CM, "Listen Nodes : %u \n", atomic_read(&core->listen_node_cnt)); nes_debug(NES_DBG_CM, "Active Nodes : %u \n", atomic_read(&core->node_cnt)); @@ -688,7 +686,7 @@ static int send_syn(struct nes_cm_node *cm_node, u32 sendack, optionssize += 1; if (!skb) - skb = get_free_pkt(cm_node); + skb = dev_alloc_skb(MAX_CM_BUFFER); if (!skb) { nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n"); return -1; @@ -713,7 +711,7 @@ static int send_reset(struct nes_cm_node *cm_node, struct sk_buff *skb) int flags = SET_RST | SET_ACK; if (!skb) - skb = get_free_pkt(cm_node); + skb = dev_alloc_skb(MAX_CM_BUFFER); if (!skb) { nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n"); return -1; @@ -734,7 +732,7 @@ static int send_ack(struct nes_cm_node *cm_node, struct sk_buff *skb) int ret; if (!skb) - skb = get_free_pkt(cm_node); + skb = dev_alloc_skb(MAX_CM_BUFFER); if (!skb) { nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n"); @@ -757,7 +755,7 @@ static int send_fin(struct nes_cm_node *cm_node, struct sk_buff *skb) /* if we didn't get a frame get one */ 
if (!skb) - skb = get_free_pkt(cm_node); + skb = dev_alloc_skb(MAX_CM_BUFFER); if (!skb) { nes_debug(NES_DBG_CM, "Failed to get a Free pkt\n"); @@ -772,59 +770,15 @@ static int send_fin(struct nes_cm_node *cm_node, struct sk_buff *skb) /** - * get_free_pkt - */ -static struct sk_buff *get_free_pkt(struct nes_cm_node *cm_node) -{ - struct sk_buff *skb, *new_skb; - - /* check to see if we need to repopulate the free tx pkt queue */ - if (skb_queue_len(&cm_node->cm_core->tx_free_list) < NES_CM_FREE_PKT_LO_WATERMARK) { - while (skb_queue_len(&cm_node->cm_core->tx_free_list) < - cm_node->cm_core->free_tx_pkt_max) { - /* replace the frame we took, we won't get it back */ - new_skb = dev_alloc_skb(cm_node->cm_core->mtu); - BUG_ON(!new_skb); - /* add a replacement frame to the free tx list head */ - skb_queue_head(&cm_node->cm_core->tx_free_list, new_skb); - } - } - - skb = skb_dequeue(&cm_node->cm_core->tx_free_list); - - return skb; -} - - -/** - * make_hashkey - generate hash key from node tuple - */ -static inline int make_hashkey(u16 loc_port, nes_addr_t loc_addr, u16 rem_port, - nes_addr_t rem_addr) -{ - u32 hashkey = 0; - - hashkey = loc_addr + rem_addr + loc_port + rem_port; - hashkey = (hashkey % NES_CM_HASHTABLE_SIZE); - - return hashkey; -} - - -/** * find_node - find a cm node that matches the reference cm node */ static struct nes_cm_node *find_node(struct nes_cm_core *cm_core, u16 rem_port, nes_addr_t rem_addr, u16 loc_port, nes_addr_t loc_addr) { unsigned long flags; - u32 hashkey; struct list_head *hte; struct nes_cm_node *cm_node; - /* make a hash index key for this packet */ - hashkey = make_hashkey(loc_port, loc_addr, rem_port, rem_addr); - /* get a handle on the hte */ hte = &cm_core->connected_nodes; @@ -892,7 +846,6 @@ static struct nes_cm_listener *find_listener(struct nes_cm_core *cm_core, static int add_hte_node(struct nes_cm_core *cm_core, struct nes_cm_node *cm_node) { unsigned long flags; - u32 hashkey; struct list_head *hte; if (!cm_node || 
!cm_core) @@ -901,11 +854,6 @@ static int add_hte_node(struct nes_cm_core *cm_core, struct nes_cm_node *cm_node nes_debug(NES_DBG_CM, "Adding Node %p to Active Connection HT\n", cm_node); - /* first, make an index into our hash table */ - hashkey = make_hashkey(cm_node->loc_port, cm_node->loc_addr, - cm_node->rem_port, cm_node->rem_addr); - cm_node->hashkey = hashkey; - spin_lock_irqsave(&cm_core->ht_lock, flags); /* get a handle on the hash table element (list head for this slot) */ @@ -2198,10 +2146,7 @@ static int mini_cm_recv_pkt(struct nes_cm_core *cm_core, */ static struct nes_cm_core *nes_cm_alloc_core(void) { - int i; - struct nes_cm_core *cm_core; - struct sk_buff *skb = NULL; /* setup the CM core */ /* alloc top level core control structure */ @@ -2219,19 +2164,6 @@ static struct nes_cm_core *nes_cm_alloc_core(void) atomic_set(&cm_core->events_posted, 0); - /* init the packet lists */ - skb_queue_head_init(&cm_core->tx_free_list); - - for (i = 0; i < NES_CM_DEFAULT_FRAME_CNT; i++) { - skb = dev_alloc_skb(cm_core->mtu); - if (!skb) { - kfree(cm_core); - return NULL; - } - /* add 'raw' skb to free frame list */ - skb_queue_head(&cm_core->tx_free_list, skb); - } - cm_core->api = &nes_cm_api; spin_lock_init(&cm_core->ht_lock); diff --git a/drivers/infiniband/hw/nes/nes_cm.h b/drivers/infiniband/hw/nes/nes_cm.h index 3a20a78..fafa350 100644 --- a/drivers/infiniband/hw/nes/nes_cm.h +++ b/drivers/infiniband/hw/nes/nes_cm.h @@ -165,6 +165,8 @@ struct nes_timer_entry { #define NES_CM_DEF_SEQ2 0x18ed5740 #define NES_CM_DEF_LOCAL_ID2 0xb807 +#define MAX_CM_BUFFER 512 + typedef u32 nes_addr_t; @@ -258,8 +260,6 @@ struct nes_cm_listener { /* per connection node and node state information */ struct nes_cm_node { - u32 hashkey; - nes_addr_t loc_addr, rem_addr; u16 loc_port, rem_port; @@ -357,7 +357,6 @@ struct nes_cm_core { u32 mtu; u32 free_tx_pkt_max; u32 rx_pkt_posted; - struct sk_buff_head tx_free_list; atomic_t ht_node_cnt; struct list_head connected_nodes; /* 
struct list_head hashtable[NES_CM_HASHTABLE_SIZE]; */
-- 
1.5.3.3

From devel at morey-chaisemartin.com Fri Dec 12 14:37:34 2008
From: devel at morey-chaisemartin.com (Nicolas Morey-Chaisemartin)
Date: Fri, 12 Dec 2008 23:37:34 +0100
Subject: [ofa-general] ipoib device not loading?
In-Reply-To: <4942BB8D.4030800@ncsa.uiuc.edu>
References: <49418262.3070801@ncsa.uiuc.edu> <4942700F.7080401@ext.bull.net> <4942BB8D.4030800@ncsa.uiuc.edu>
Message-ID: <4942E7AE.3090104@morey-chaisemartin.com>

Yes there is, I had the same problem last month.
I'm just not sure which option you need. I'm not at work so I can't
check what I did.
Anyway, CONFIG_IPV6 and CONFIG_IPV6_MODULE are used in an #ifdef in
drivers/infiniband/ulp/ipoib_cm.h.
I guess you could set some flag to force them undefined, but there should
be, and probably is, a cleaner way.

Nicolas

Jeremy Enos wrote:
> Ah.. thanks. (when I installed, I just told it to install all)
>
> [root at host jenos]# modprobe ib_ipoib
> FATAL: Error inserting ib_ipoib
> (/lib/modules/2.6.27.5-41.fc9.x86_64/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko):
> Unknown symbol in module, or unknown parameter (see dmesg)
>
> Then dmesg shows this line:
> ib_ipoib: Unknown symbol icmpv6_send
>
> I have IPv6 disabled on this host for other reasons. Perhaps that's
> causing the problem? If so, is there a way to build w/o IPv6
> requirements?
> thx-
>
> Jeremy
>
> Nicolas Morey Chaisemartin wrote:
>> Jeremy Enos wrote:
>>>
>>> [root at host OFED-1.4-20081209-0926]# service openibd status
>>>
>>> HCA driver loaded
>>>
>>>
>>> The following OFED modules are loaded:
>>>
>>> ib_ipath
>>> mlx4_core
>>> mlx4_ib
>>> ib_mthca
>>> ib_uverbs
>>> ib_umad
>>> ib_sa
>>> ib_cm
>>> ib_mad
>>> ib_core
>>> iw_cxgb3
>>>
>>> [root at host OFED-1.4-20081209-0926]# ifup ib0
>>> ib_ipoib device ib0 does not seem to be present, delaying
>>> initialization.
>>>
>>>
>> You need the ib_ipoib module to be able to use ipoib.
>> Did you put the right option (--with_ipoib_mod or something like
>> this) when running ./configure for ofa_kernel?
>> Try to run modprobe ib_ipoib.
>> If it works, add ipoib to your /etc/infiniband/openib.conf so
>> it'll be loaded by openibd.
>> If it doesn't, check syslog, dmesg and other logs to see if it was
>> an error or simply that the file is missing.
>> If it's missing, recompile your ofa_kernel with the ipoib module.
>>
>> Nicolas
>>

From sean.hefty at intel.com Fri Dec 12 16:38:11 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 12 Dec 2008 16:38:11 -0800
Subject: [ofa-general] porting IB management code to Windows
In-Reply-To: <20081211203649.GO31451@obsidianresearch.com>
References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081211203649.GO31451@obsidianresearch.com>
Message-ID: <000401c95cbb$141c5290$1e58180a@amr.corp.intel.com>

>Just to chime in here with some past experience.. Is there any way it
>would be acceptable to use gcc (or even the Intel compiler) as the
>mandatory Windows C compiler? That would save everyone a lot of
>ongoing hassle. MS does not maintain the C compiler portion of VC++
>and it is very old standards wise, half your changes in this patch are
>due to it not supporting C99.

I installed the Intel compiler (version 11.0.066) and tried using that within
the WDK build environment to build just sminfo. The good news is that sminfo
did build within the WDK environment and run. The bad news is that every change
to sminfo.c that was posted was still needed by the Intel compiler, plus it
required a couple of other changes as well.
:( I didn't spend any time looking into the compile issues, so I don't know if changing the build environment would eliminate some of the changes. I also did not try using gcc on Windows. (Btw, I think we can fix the const issue.) I would like to avoid the other changes, but it's not looking like it will happen. >So, really what you are proposing is to abandon all modern C >constructs in the official source tree :| Some of this is actually >harmful run-time wise (like removing const on the static variables) >and harmful maintenance wise (removing C99 named initializers) What I'm really proposing is that the IB management utilities package support both Linux and Windows. The alternative is to have independent packages with separate source code bases. And unless there's a way to eliminate the changes, they'll be there. I just don't know whether there is one yet. Btw, Arlin can provide more details on the other required changes. We only have a few of the diags ported at this time (i.e. the easiest ones to port). - Sean From weiny2 at llnl.gov Fri Dec 12 17:00:03 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 12 Dec 2008 17:00:03 -0800 Subject: [ofa-general] [PATCH] opensm/opensm/osm_console.c: move reporting of plugins to "status" command. Message-ID: <20081212170003.73e8e2ff.weiny2@llnl.gov> >From 97b0a66b8e7a4ce16e5d7a10f48c08c9663d2d5c Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Fri, 12 Dec 2008 16:57:12 -0800 Subject: [PATCH] opensm/opensm/osm_console.c: move reporting of plugins to "status" command. Since plugins are now generic, it does not make sense to have them printed under the perfmgr status command. 
Signed-off-by: Ira Weiny --- opensm/opensm/osm_console.c | 33 ++++++++++++++++++++------------- 1 files changed, 20 insertions(+), 13 deletions(-) diff --git a/opensm/opensm/osm_console.c b/opensm/opensm/osm_console.c index 5727cea..c6e8e59 100644 --- a/opensm/opensm/osm_console.c +++ b/opensm/opensm/osm_console.c @@ -324,16 +324,31 @@ static char *sa_state_str(osm_sa_state_t state) static void print_status(osm_opensm_t * p_osm, FILE * out) { + cl_list_item_t *item; + if (out) { cl_plock_acquire(&p_osm->lock); - fprintf(out, " OpenSM Version: %s\n", p_osm->osm_version); - fprintf(out, " SM State : %s\n", + fprintf(out, " OpenSM Version : %s\n", p_osm->osm_version); + fprintf(out, " SM State : %s\n", sm_state_str(p_osm->subn.sm_state)); - fprintf(out, " SA State : %s\n", + fprintf(out, " SA State : %s\n", sa_state_str(p_osm->sa.state)); - fprintf(out, " Routing Engine: %s\n", + fprintf(out, " Routing Engine : %s\n", osm_routing_engine_type_str(p_osm-> routing_engine_used)); + + fprintf(out, " Loaded event plugins :"); + if (cl_qlist_head(&p_osm->plugin_list) == + cl_qlist_end(&p_osm->plugin_list)) { + fprintf(out, " "); + } + for (item = cl_qlist_head(&p_osm->plugin_list); + item != cl_qlist_end(&p_osm->plugin_list); + item = cl_qlist_next(item)) + fprintf(out, " %s", + ((osm_epi_plugin_t *)item)->plugin_name); + fprintf(out, "\n"); + #ifdef ENABLE_OSM_PERF_MGR fprintf(out, "\n PerfMgr state/sweep state : %s/%s\n", osm_perfmgr_get_state_str(&(p_osm->perfmgr)), @@ -1128,24 +1143,16 @@ static void perfmgr_parse(char **p_last, osm_opensm_t * p_osm, FILE * out) fprintf(out, "\"%s\" option not found\n", p_cmd); } } else { - cl_list_item_t *item; fprintf(out, "Performance Manager status:\n" "state : %s\n" "sweep state : %s\n" "sweep time : %us\n" - "outstanding queries/max : %d/%u\n" - "loaded event plugin :", + "outstanding queries/max : %d/%u\n", osm_perfmgr_get_state_str(&(p_osm->perfmgr)), osm_perfmgr_get_sweep_state_str(&(p_osm->perfmgr)), 
osm_perfmgr_get_sweep_time_s(&(p_osm->perfmgr)), p_osm->perfmgr.outstanding_queries, p_osm->perfmgr.max_outstanding_queries); - for (item = cl_qlist_head(&p_osm->plugin_list); - item != cl_qlist_end(&p_osm->plugin_list); - item = cl_qlist_next(item)) - fprintf(out, " %s", - ((osm_epi_plugin_t *)item)->plugin_name); - fprintf(out, "\n"); } } #endif /* ENABLE_OSM_PERF_MGR */ -- 1.5.4.5 From jgunthorpe at obsidianresearch.com Fri Dec 12 17:34:03 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Fri, 12 Dec 2008 18:34:03 -0700 Subject: [ofa-general] porting IB management code to Windows In-Reply-To: <000401c95cbb$141c5290$1e58180a@amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081211203649.GO31451@obsidianresearch.com> <000401c95cbb$141c5290$1e58180a@amr.corp.intel.com> Message-ID: <20081213013403.GM30561@obsidianresearch.com> On Fri, Dec 12, 2008 at 04:38:11PM -0800, Sean Hefty wrote: > I installed the Intel compiler (version 11.0.066) and tried using that within > the WDK build environment to build just sminfo. The good news is that sminfo > did build within the WDK environment and run. The bad news is that every change > to sminfo.c that was posted was still needed by the Intel compiler, plus it > required a couple of other changes as well. :( Well, I'm not sure what is up with the const thing since that's bog standard C89 even, but the structure initializers and the Intel compiler are a matter of using the GCC extension vs C99: - [SMINFO_MASTER] "SMINFO_MASTER", + [SMINFO_MASTER] = "SMINFO_MASTER", And similarly anything using the 'field: value' should be '.field = value' for C99. (make sure C99 support is turned on too!) Compiling with -std=c99 on Linux will catch several of these issues, and purging them aggressively is generally a good idea. 
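To illustrate the two C99 spellings mentioned above, here is a minimal sketch. The enum mirrors the SMINFO_* names from the sminfo.c patch, but struct sm_desc, default_sm and state_name() are hypothetical names made up for this example; they are not taken from infiniband-diags.

```c
/* Hypothetical state enum, modeled on the SMINFO_* names in the patch. */
enum {
	SMINFO_NOTACT,
	SMINFO_DISCOVER,
	SMINFO_STANDBY,
	SMINFO_MASTER,
	SMINFO_STATE_LAST
};

/*
 * C99 designated array initializers: each index is followed by '='.
 * The older GCC-only spelling omits the '=' ([SMINFO_MASTER] "...").
 */
static const char *const statestr[] = {
	[SMINFO_NOTACT] = "SMINFO_NOTACT",
	[SMINFO_DISCOVER] = "SMINFO_DISCOVER",
	[SMINFO_STANDBY] = "SMINFO_STANDBY",
	[SMINFO_MASTER] = "SMINFO_MASTER",
};

/* Hypothetical struct, used only to show the '.field = value' form. */
struct sm_desc {
	int prio;
	int state;
};

/* C99 spelling; the older GCC-only form would be 'prio: 0, state: ...'. */
static const struct sm_desc default_sm = {
	.prio = 0,
	.state = SMINFO_STANDBY,
};

/* Bounds-checked lookup, same idea as the STATESTR() macro in sminfo.c. */
static const char *state_name(unsigned int s)
{
	return s < SMINFO_STATE_LAST ? statestr[s] : "???";
}
```

With GCC, building this kind of code with -std=c99 -pedantic should accept the forms above and diagnose the obsolete ones, though whether a given Windows compiler accepts designated initializers at all still has to be checked against that compiler.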
Actually -std=c99 -D_XOPEN_SOURCE=600 is a good option set for portable code written to modern standards (i.e. SUSv3 and C99) as it purges some of the uncommon and BSD calls from the library headers and tends to keep things cleaner. (Though sometimes you need to use _GNU_SOURCE on some files to access some special functions :() But it is all kind of moot if you are attempting to compile without some POSIX API emulation layer for Windows (SFU, cygwin, etc). That makes things extra hard, and I'm not sure it is worthwhile for this particular application. :) I.e. if you are willing to use MS's SFU you get gcc and a POSIX compatibility library as part of the SDK install and a lot of problems go away. Jason From vlad at lists.openfabrics.org Sat Dec 13 03:17:48 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 13 Dec 2008 03:17:48 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081213-0200 daily build status Message-ID: <20081213111748.AD719E60C8D@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 
Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From sashak at voltaire.com Sat Dec 13 05:50:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 15:50:51 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <493E3AE5.5000604@gmail.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> Message-ID: <20081213135051.GP15622@sashak.voltaire.com> On 11:31 Tue 09 Dec , Eli Dorfman wrote: > Sasha Khapyorsky wrote: > > Hi Eli, > > > > On 11:41 Mon 08 Dec , Eli Dorfman wrote: > >> report IB traps to plugin > >> > >> Signed-off-by: Eli Dorfman > >> --- > >> opensm/opensm/osm_inform.c | 4 +++- > >> 1 files changed, 3 insertions(+), 1 deletions(-) > >> > >> diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c > >> index f3c8ed7..bb16e3a 100644 > >> --- a/opensm/opensm/osm_inform.c > >> +++ b/opensm/opensm/osm_inform.c > >> @@ -565,7 +565,8 @@ osm_report_notice(IN osm_log_t * const p_log, > >> } > >> > >> /* an official Event 
information log */ > >> - if (ib_notice_is_generic(p_ntc)) > >> + if (ib_notice_is_generic(p_ntc)) { > >> + osm_opensm_report_event(p_subn->p_osm, OSM_EVENT_ID_TRAP, p_ntc); > >> OSM_LOG(p_log, OSM_LOG_INFO, > >> "Reporting Generic Notice type:%u num:%u (%s)" > >> " from LID:%u GID:%s\n", > >> @@ -575,6 +576,7 @@ osm_report_notice(IN osm_log_t * const p_log, > >> cl_ntoh16(p_ntc->issuer_lid), > >> inet_ntop(AF_INET6, p_ntc->issuer_gid.raw, gid_str, > >> sizeof gid_str)); > >> + } > > > > Did you mean to have it osm_report_notice()? Actually it is where OpenSM > > sends notices, not where OpenSM gets traps. Trap receiver processor is > > located in osm_trap_rcv.c. > > Yes that's what i meant. > When OpenSM receives traps it calls osm_report_notice(). > It is also call for OpenSM initiated traps (e.g. GID IN/OUT and MC CREATE/DELETE). Ok. I see your point. Then why it should be limited by generic notice types? Also wouldn't it be better to call plugin report callback after notice was actually processed (eg. at end of this function)? Sasha From sashak at voltaire.com Sat Dec 13 10:04:58 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 20:04:58 +0200 Subject: [ofa-general] [ANNOUNCE] management tarballs release Message-ID: <20081213180458.GQ15622@sashak.voltaire.com> Hi, There is a new release of the management (OpenSM and infiniband diagnostics) tarballs available in: http://www.openfabrics.org/downloads/management/ md5sum: f83686006d5313b816a5264d41736559 libibcommon-1.2.0.tar.gz 97df518730e43afc2934c9d62bbfcde2 libibumad-1.3.0.tar.gz e27deeaa2c5409ddda32ff6d19ac9420 libibmad-1.3.0.tar.gz 5c67b1735a8dd9632e2cde648294f583 opensm-3.3.0.tar.gz a4aa5e4cfcbcae9d6e67a27b5153c880 infiniband-diags-1.5.0.tar.gz All component versions are from recent master branch. Full change log is below (actually the only change is versions update). 
Sasha Sasha Khapyorsky (1): management: bump all package versions From sashak at voltaire.com Sat Dec 13 10:06:32 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 20:06:32 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_console.c: move reporting of plugins to "status" command. In-Reply-To: <20081212170003.73e8e2ff.weiny2@llnl.gov> References: <20081212170003.73e8e2ff.weiny2@llnl.gov> Message-ID: <20081213180632.GR15622@sashak.voltaire.com> On 17:00 Fri 12 Dec , Ira Weiny wrote: > From 97b0a66b8e7a4ce16e5d7a10f48c08c9663d2d5c Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Fri, 12 Dec 2008 16:57:12 -0800 > Subject: [PATCH] opensm/opensm/osm_console.c: move reporting of plugins to "status" command. > > Since plugins are now generic, it does not make sense to have them printed > under the perfmgr status command. > > Signed-off-by: Ira Weiny Applied. Thanks. Sasha From sashak at voltaire.com Sat Dec 13 10:59:02 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 20:59:02 +0200 Subject: [ofa-general] Re: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry In-Reply-To: <008d01c95af5$0870d730$19528590$@com> References: <008d01c95af5$0870d730$19528590$@com> Message-ID: <20081213185852.GS15622@sashak.voltaire.com> Hi Bob, On 12:28 Wed 10 Dec , Robert Pearson wrote: > > Here is a revised mesh patch #5 that incorporates changes based on your > comments. There is no patch. Did you forget to include it? 
Sasha From sashak at voltaire.com Sat Dec 13 11:51:51 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 21:51:51 +0200 Subject: [ofa-general] Re: [PATCH] [3 of 10] [REVISED] mesh analysis - node and link structures In-Reply-To: <007301c95af2$d4fff720$7effe560$@com> References: <007301c95af2$d4fff720$7effe560$@com> Message-ID: <20081213195151.GT15622@sashak.voltaire.com> On 12:12 Wed 10 Dec , Robert Pearson wrote: > Sasha, > > > > Here is a revised mesh patch #3 that incorporates changes based on your > comments. This patch adds calls to osm_mesh_node_create/delete(), but not the functions themselves, so OpenSM fails to link due to unresolved symbols. Sasha From sashak at voltaire.com Sat Dec 13 12:30:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:30:14 +0200 Subject: [ofa-general] Re: porting IB management code to Windows In-Reply-To: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> Message-ID: <20081213203014.GU15622@sashak.voltaire.com> Hi Sean, On 11:18 Thu 11 Dec , Sean Hefty wrote: > > We've started porting the IB management code (IB-diags at this point) to > Windows. My strong preference is to avoid branching the code and instead keep a > single source code tree. Is there any objection to accepting changes against > the management tree to allow the code to run on both Linux and Windows? Basically I have no objections to porting changes, and I would also prefer to keep a single code base. However, I would prefer to minimize the amount of needed changes and would really prefer not to accept a lot of limitations on using modern C. I will comment inline in the patch example below. > (We can > figure out the logistics of build related files later. I'm most concerned about > the code itself.) > > The patch below gives an example of the changes needed to make this happen. > Most are a result of compiler differences. 
> > - Sean > > --- infiniband-diags-1.4.2\src\sminfo.c 2008-10-19 11:34:42.000000000 -0700 > +++ scm\winof\branches\winverbs\tools\infiniband_diags\src\sminfo.c > 2008-12-10 15:06:01.096000000 -0800 > @@ -37,12 +37,19 @@ > > #include > #include > + > +#if defined(_WIN32) || defined(_WIN64) > +#include > +#include > +#include "..\..\..\..\etc\user\getopt.c" > +#include "..\ibdiag_common.c" > +#else > #include > #include > #include > #include > +#endif Could such ugly header mess be eliminated? I'm not familiar with windows environment, but would expect that headers like exist there (although I may be wrong about it). Of course some header file may be missing, this is not so bad - you could add one somewhere under WinOF tree in the include path, then something like: winof/include/path/getopt.h: #ifndef WINOF_GETOPT_H #define WINOF_GETOPT_H #include "..\..\..\..\etc\user\getopt.c" #endif could resolve the problem. And similar with another header files (also AFAIK WinOF is not using autotools, so file config.h could be also good place for various wrappers). > -#include > #include > #include > > @@ -72,13 +79,13 @@ enum { > }; > > char *statestr[] = { > - [SMINFO_NOTACT] "SMINFO_NOTACT", > - [SMINFO_DISCOVER] "SMINFO_DISCOVER", > - [SMINFO_STANDBY] "SMINFO_STANDBY", > - [SMINFO_MASTER] "SMINFO_MASTER", > + "SMINFO_NOTACT", > + "SMINFO_DISCOVER", > + "SMINFO_STANDBY", > + "SMINFO_MASTER", > }; Could VC++ understand C99 like initializations (maybe with using some flags)? I would really prefer to use something like this. > > -#define STATESTR(s) (((uint)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???") > +#define STATESTR(s) (((unsigned int)(s)) < SMINFO_STATE_LAST ? 
statestr[s] : > "???") > > int > main(int argc, char **argv) > @@ -88,7 +95,7 @@ main(int argc, char **argv) > ib_portid_t portid = {0}; > int timeout = 0; /* use default */ > uint8_t *p; > - uint act = 0; > + unsigned int act = 0; All 'uint' -> 'unsigned int' conversions seem fine to me (I think we need to do this even w/out connection to the WinOF porting issue). > int prio = 0, state = SMINFO_STANDBY; > uint64_t guid = 0, key = 0; > extern int ibdebug; > @@ -97,8 +104,8 @@ main(int argc, char **argv) > char *ca = 0; > int ca_port = 0; > > - static char const str_opts[] = "C:P:t:s:p:a:deDGVhu"; > - static const struct option long_opts[] = { > + static char str_opts[] = "C:P:t:s:p:a:deDGVhu"; > + static struct option long_opts[] = { I saw in your other email that the 'const' issue could be solved (worst case it could be masked in WinOF config.h - #define const ). Right? > { "C", 1, 0, 'C'}, > { "P", 1, 0, 'P'}, > { "debug", 0, 0, 'd'}, > @@ -112,7 +119,7 @@ main(int argc, char **argv) > { "timeout", 1, 0, 't'}, > { "help", 0, 0, 'h'}, > { "usage", 0, 0, 'u'}, > - { } > + { 0 } Could VC be taught with some flags to understand {}? 
Basically we could accept such a change, but it will be hard to remember to follow this rule on the Linux side :) > }; > > argv0 = argv[0]; > @@ -188,7 +195,7 @@ main(int argc, char **argv) > > if (mod) { > if (!(p = smp_set(sminfo, &portid, IB_ATTR_SMINFO, mod, > timeout))) > - IBERROR("query"); > + IBERROR("set"); This is fine (and I guess it is not related to the porting issue :)) Sasha > } else > if (!(p = smp_query(sminfo, &portid, IB_ATTR_SMINFO, 0, > timeout))) > IBERROR("query"); > > From sashak at voltaire.com Sat Dec 13 12:34:23 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:34:23 +0200 Subject: [ofa-general] porting IB management code to Windows In-Reply-To: <20081213013403.GM30561@obsidianresearch.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081211203649.GO31451@obsidianresearch.com> <000401c95cbb$141c5290$1e58180a@amr.corp.intel.com> <20081213013403.GM30561@obsidianresearch.com> Message-ID: <20081213203423.GV15622@sashak.voltaire.com> Hi Jason, On 18:34 Fri 12 Dec , Jason Gunthorpe wrote: > On Fri, Dec 12, 2008 at 04:38:11PM -0800, Sean Hefty wrote: > > > I installed the Intel compiler (version 11.0.066) and tried using that within > > the WDK build environment to build just sminfo. The good news is that sminfo > > did build within the WDK environment and run. The bad news is that every change > > to sminfo.c that was posted was still needed by the Intel compiler, plus it > > required a couple of other changes as well. :( > > Well, I'm not sure what is up with the const thing since that's bog > standard C89 even, but the structure initializers and the Intel > compiler are a matter of using the GCC extension vs C99: > > - [SMINFO_MASTER] "SMINFO_MASTER", > + [SMINFO_MASTER] = "SMINFO_MASTER", > > And similarly anything using the 'field: value' should be '.field = > value' for C99. (make sure C99 support is turned on too!) 
> > Compiling with -std=c99 on Linux will catch several of these issues, > and purging them agressively is generally a good idea. I agree, it would be a reasonable requirement. > Actually -std=c99 -D_XOPEN_SOURCE=600 is good option set for portable > code written to modern standards (ie SUSv3 and C99) as it purges some > of the uncommon and BSD calls from the library headers and tends to > keep things cleaner. (Though sometimes you need to use _GNU_SOURCE on > some files to access some special functions :() > > But it is all kind of moot if you are attempting to compile without > some POSIX API emulation layer for Windows (SFU, cygwin, etc). > That makes things extra hard, and I'm not sure it is worthwhile for > this particular application. :) > > Ie if you are willing to use MS's SFU you get gcc and a POSIX > compatibility library as part of the SDK install and alot of > problems go away. This could be an interesting option too. In particular I remember a lot of compatibility issues with using pthread library in OpenSM. Thanks for a good ideas. Sasha From sashak at voltaire.com Sat Dec 13 12:37:53 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:37:53 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sm.c: fix MC group creation in race condition Message-ID: <20081213203753.GW15622@sashak.voltaire.com> In case of a race condition when MC group was deleted during creation and it is detected in osm_sm_mcgrp_join() don't create new group - it will be empty and invalid anyway, just return an error - similar to port join/leave race condition handling. 
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_sm.c | 34 +++++++++------------------------- 1 files changed, 9 insertions(+), 25 deletions(-) diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c index efebf4a..649ff2a 100644 --- a/opensm/opensm/osm_sm.c +++ b/opensm/opensm/osm_sm.c @@ -535,36 +535,20 @@ osm_sm_mcgrp_join(IN osm_sm_t * const p_sm, * If this multicast group does not already exist, create it. */ p_mgrp = osm_get_mgrp_by_mlid(p_sm->p_subn, mlid); - if (!p_mgrp) { - OSM_LOG(p_sm->p_log, OSM_LOG_VERBOSE, - "Creating group, MLID 0x%X\n", cl_ntoh16(mlid)); - - p_mgrp = osm_mgrp_new(mlid); - if (p_mgrp == NULL) { - CL_PLOCK_RELEASE(p_sm->p_lock); - OSM_LOG(p_sm->p_log, OSM_LOG_ERROR, "ERR 2E06: " - "Unable to allocate multicast group object\n"); - status = IB_INSUFFICIENT_MEMORY; - goto Exit; - } - - p_sm->p_subn->mgroups[cl_ntoh16(mlid) - IB_LID_MCAST_START_HO] = p_mgrp; - } else { + if (!p_mgrp || !osm_mgrp_is_guid(p_mgrp, port_guid)) { /* - * The group already exists. If the port is not a + * The group removed or the port is not a * member of the group, then fail immediately. * This can happen since the spinlock is released briefly * before the SA calls this function. 
*/ - if (!osm_mgrp_is_guid(p_mgrp, port_guid)) { - CL_PLOCK_RELEASE(p_sm->p_lock); - OSM_LOG(p_sm->p_log, OSM_LOG_ERROR, "ERR 2E12: " - "Port 0x%016" PRIx64 - " not in mcast group 0x%X\n", - cl_ntoh64(port_guid), cl_ntoh16(mlid)); - status = IB_NOT_FOUND; - goto Exit; - } + CL_PLOCK_RELEASE(p_sm->p_lock); + OSM_LOG(p_sm->p_log, OSM_LOG_ERROR, "ERR 2E12: " + "MC group with mlid 0x%x doesn't exist or " + "port 0x%016" PRIx64 " is not in the group.\n", + cl_ntoh16(mlid), cl_ntoh64(port_guid)); + status = IB_NOT_FOUND; + goto Exit; } /* -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:38:52 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:38:52 +0200 Subject: [ofa-general] [PATCH] opensm/osm_sa_mcmember_record: improve __cleanup_mgrp() Message-ID: <20081213203852.GX15622@sashak.voltaire.com> Improve __cleanup_mgrp() function it is called from context where mgrp pointer is known and additional by mlid resolving is not needed. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_sa_mcmember_record.c | 26 ++++++++++---------------- 1 files changed, 10 insertions(+), 16 deletions(-) diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 99aee1b..4561808 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -142,16 +142,13 @@ static ib_net16_t __get_new_mlid(osm_sa_t *sa, ib_net16_t requested_mlid) we silently drop it. Since it was an intermediate group no need to re-route it. 
**********************************************************************/ -static void __cleanup_mgrp(IN osm_sa_t * sa, IN ib_net16_t const mlid) +static void __cleanup_mgrp(IN osm_sa_t * sa, osm_mgrp_t *mgrp) { - osm_mgrp_t *p_mgrp = osm_get_mgrp_by_mlid(sa->p_subn, mlid); - /* Remove MGRP only if osm_mcm_port_t count is 0 and not a well known group */ - if (p_mgrp && cl_is_qmap_empty(&p_mgrp->mcm_port_tbl) && - p_mgrp->well_known == FALSE) { - sa->p_subn->mgroups[cl_ntoh16(mlid) - IB_LID_MCAST_START_HO] = NULL; - osm_mgrp_delete(p_mgrp); + if (cl_is_qmap_empty(&mgrp->mcm_port_tbl) && !mgrp->well_known) { + sa->p_subn->mgroups[cl_ntoh16(mgrp->mlid) - IB_LID_MCAST_START_HO] = NULL; + osm_mgrp_delete(mgrp); } } @@ -1273,7 +1270,7 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) || !__validate_port_caps(sa->p_log, p_mgrp, p_physp) || !(join_state != 0)) { /* since we might have created the new group we need to cleanup */ - __cleanup_mgrp(sa, mlid); + __cleanup_mgrp(sa, p_mgrp); CL_PLOCK_RELEASE(sa->p_lock); @@ -1312,7 +1309,7 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) if (status != IB_SUCCESS) { /* we fail to add the port so we might need to delete the group */ - __cleanup_mgrp(sa, mlid); + __cleanup_mgrp(sa, p_mgrp); CL_PLOCK_RELEASE(sa->p_lock); @@ -1345,13 +1342,10 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) CL_PLOCK_EXCL_ACQUIRE(sa->p_lock); /* the request for routing failed so we need to remove the port */ - p_mgrp = osm_get_mgrp_by_mlid(sa->p_subn, mlid); - if (p_mgrp != NULL) { - osm_mgrp_delete_port(sa->p_subn, sa->p_log, p_mgrp, - p_recvd_mcmember_rec->port_gid. - unicast.interface_id); - __cleanup_mgrp(sa, mlid); - } + osm_mgrp_delete_port(sa->p_subn, sa->p_log, p_mgrp, + p_recvd_mcmember_rec->port_gid. 
+ unicast.interface_id); + __cleanup_mgrp(sa, p_mgrp); CL_PLOCK_RELEASE(sa->p_lock); osm_sa_send_error(sa, p_madw, IB_SA_MAD_STATUS_NO_RESOURCES); goto Exit; -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:40:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:40:24 +0200 Subject: [ofa-general] [PATCH] opensm/multicast: remove some unused parameters. Message-ID: <20081213204024.GY15622@sashak.voltaire.com> Remove some unused in multicast routing processing parameters - req_type and port_guid. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_base.h | 17 ------------- opensm/include/opensm/osm_multicast.h | 10 ------- opensm/include/opensm/osm_sm.h | 7 +---- opensm/opensm/osm_mcast_mgr.c | 31 ++++++------------------ opensm/opensm/osm_sa_mcmember_record.c | 8 +---- opensm/opensm/osm_sm.c | 41 +++++-------------------------- 6 files changed, 18 insertions(+), 96 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 54df41e..7f485ff 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -828,23 +828,6 @@ typedef enum _osm_sm_signal { } osm_sm_signal_t; /***********/ -/****d* OpenSM/osm_mcast_req_type_t -* NAME -* osm_mcast_req_type_t -* -* DESCRIPTION -* Enumerates the possible signals used by the OSM_MCAST_REQUEST -* -* SYNOPSIS -*/ -typedef enum _osm_mcast_req_type { - OSM_MCAST_REQ_TYPE_CREATE, - OSM_MCAST_REQ_TYPE_JOIN, - OSM_MCAST_REQ_TYPE_LEAVE, - OSM_MCAST_REQ_TYPE_SUBNET_CHANGE -} osm_mcast_req_type_t; -/***********/ - /****s* OpenSM: Base/MAX_GUID_FILE_LINE_LENGTH * NAME * MAX_GUID_FILE_LINE_LENGTH diff --git a/opensm/include/opensm/osm_multicast.h b/opensm/include/opensm/osm_multicast.h index bd219d1..a871306 100644 --- a/opensm/include/opensm/osm_multicast.h +++ b/opensm/include/opensm/osm_multicast.h @@ -96,8 +96,6 @@ BEGIN_C_DECLS typedef struct osm_mcast_mgr_ctxt { cl_list_item_t list_item; ib_net16_t mlid; - 
osm_mcast_req_type_t req_type; - ib_net64_t port_guid; } osm_mcast_mgr_ctxt_t; /* * FIELDS @@ -106,14 +104,6 @@ typedef struct osm_mcast_mgr_ctxt { * The network ordered LID of this Multicast Group * (must be >= 0xC000). * -* req_type -* The type of the request that caused this call -* (multicast create/join/leave). -* -* port_guid -* The port guid of the port that is being added/removed from -* the multicast group due to this call. -* * SEE ALSO *********/ diff --git a/opensm/include/opensm/osm_sm.h b/opensm/include/opensm/osm_sm.h index ebe3dc3..cc8321d 100644 --- a/opensm/include/opensm/osm_sm.h +++ b/opensm/include/opensm/osm_sm.h @@ -539,8 +539,7 @@ osm_resp_send(IN osm_sm_t * sm, ib_api_status_t osm_sm_mcgrp_join(IN osm_sm_t * const p_sm, IN const ib_net16_t mlid, - IN const ib_net64_t port_guid, - IN osm_mcast_req_type_t req_type); + IN const ib_net64_t port_guid); /* * PARAMETERS * p_sm @@ -552,10 +551,6 @@ osm_sm_mcgrp_join(IN osm_sm_t * const p_sm, * port_guid * [in] Port GUID to add to the group. * -* req_type -* [in] Type of the MC request that caused this join -* (MC create/join). -* * RETURN VALUES * None * diff --git a/opensm/opensm/osm_mcast_mgr.c b/opensm/opensm/osm_mcast_mgr.c index 2f9cb5e..e42be7b 100644 --- a/opensm/opensm/osm_mcast_mgr.c +++ b/opensm/opensm/osm_mcast_mgr.c @@ -68,9 +68,6 @@ typedef struct osm_mcast_work_obj { static osm_mcast_work_obj_t *__osm_mcast_work_obj_new(IN const osm_port_t * const p_port) { - /* - TO DO - get these objects from a lockpool. 
- */ osm_mcast_work_obj_t *p_obj; /* @@ -895,7 +892,7 @@ osm_mcast_mgr_set_table(osm_sm_t * sm, /********************************************************************** **********************************************************************/ -static void __osm_mcast_mgr_clear(osm_sm_t * sm, IN osm_mgrp_t * const p_mgrp) +static void __osm_mcast_mgr_clear(osm_sm_t * sm, uint16_t mlid) { osm_switch_t *p_sw; cl_qmap_t *p_sw_tbl; @@ -911,7 +908,7 @@ static void __osm_mcast_mgr_clear(osm_sm_t * sm, IN osm_mgrp_t * const p_mgrp) p_sw = (osm_switch_t *) cl_qmap_head(p_sw_tbl); while (p_sw != (osm_switch_t *) cl_qmap_end(p_sw_tbl)) { p_mcast_tbl = osm_switch_get_mcast_tbl_ptr(p_sw); - osm_mcast_tbl_clear_mlid(p_mcast_tbl, cl_ntoh16(p_mgrp->mlid)); + osm_mcast_tbl_clear_mlid(p_mcast_tbl, mlid); p_sw = (osm_switch_t *) cl_qmap_next(&p_sw->map_item); } @@ -1046,10 +1043,7 @@ Exit: lock must already be held on entry **********************************************************************/ static ib_api_status_t -osm_mcast_mgr_process_tree(osm_sm_t * sm, - IN osm_mgrp_t * const p_mgrp, - IN osm_mcast_req_type_t req_type, - ib_net64_t port_guid) +osm_mcast_mgr_process_tree(osm_sm_t * sm, IN osm_mgrp_t * const p_mgrp) { ib_api_status_t status = IB_SUCCESS; ib_net16_t mlid; @@ -1075,7 +1069,7 @@ osm_mcast_mgr_process_tree(osm_sm_t * sm, the spanning tree which sets the mcast table bits for each port in the group. */ - __osm_mcast_mgr_clear(sm, p_mgrp); + __osm_mcast_mgr_clear(sm, cl_ntoh16(mlid)); if (!p_mgrp->full_members) goto Exit; @@ -1098,16 +1092,13 @@ Exit: NOTE : The lock should be held externally! 
**********************************************************************/ static ib_api_status_t -mcast_mgr_process_mgrp(osm_sm_t * sm, - IN osm_mgrp_t * const p_mgrp, - IN osm_mcast_req_type_t req_type, - IN ib_net64_t port_guid) +mcast_mgr_process_mgrp(osm_sm_t * sm, IN osm_mgrp_t * const p_mgrp) { ib_api_status_t status; OSM_LOG_ENTER(sm->p_log); - status = osm_mcast_mgr_process_tree(sm, p_mgrp, req_type, port_guid); + status = osm_mcast_mgr_process_tree(sm, p_mgrp); if (status != IB_SUCCESS) { OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0A19: " "Unable to create spanning tree (%s)\n", @@ -1162,9 +1153,7 @@ osm_signal_t osm_mcast_mgr_process(osm_sm_t * sm) */ p_mgrp = sm->p_subn->mgroups[i]; if (p_mgrp) - mcast_mgr_process_mgrp(sm, p_mgrp, - OSM_MCAST_REQ_TYPE_SUBNET_CHANGE, - 0); + mcast_mgr_process_mgrp(sm, p_mgrp); } /* @@ -1206,8 +1195,6 @@ osm_signal_t osm_mcast_mgr_process_mgroups(osm_sm_t * sm) ib_net16_t mlid; osm_signal_t ret, signal = OSM_SIGNAL_DONE; osm_mcast_mgr_ctxt_t *ctx; - osm_mcast_req_type_t req_type; - ib_net64_t port_guid; OSM_LOG_ENTER(sm->p_log); @@ -1216,8 +1203,6 @@ osm_signal_t osm_mcast_mgr_process_mgroups(osm_sm_t * sm) while (!cl_is_qlist_empty(p_list)) { ctx = (osm_mcast_mgr_ctxt_t *) cl_qlist_remove_head(p_list); - req_type = ctx->req_type; - port_guid = ctx->port_guid; /* nice copy no warning on size diff */ memcpy(&mlid, &ctx->mlid, sizeof(mlid)); @@ -1244,7 +1229,7 @@ osm_signal_t osm_mcast_mgr_process_mgroups(osm_sm_t * sm) OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "Processing mgrp with lid:0x%X change id:%u\n", cl_ntoh16(mlid), p_mgrp->last_change_id); - mcast_mgr_process_mgrp(sm, p_mgrp, req_type, port_guid); + mcast_mgr_process_mgrp(sm, p_mgrp); } /* diff --git a/opensm/opensm/osm_sa_mcmember_record.c b/opensm/opensm/osm_sa_mcmember_record.c index 4561808..b586942 100644 --- a/opensm/opensm/osm_sa_mcmember_record.c +++ b/opensm/opensm/osm_sa_mcmember_record.c @@ -1123,7 +1123,6 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * 
const p_madw) osm_physp_t *p_physp; osm_physp_t *p_request_physp; uint8_t is_new_group; /* TRUE = there is a need to create a group */ - osm_mcast_req_type_t req_type; uint8_t join_state; OSM_LOG_ENTER(sa->p_log); @@ -1235,12 +1234,9 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) /* copy the MGID to the result */ mcmember_rec.mgid = p_mgrp->mcmember_rec.mgid; is_new_group = 1; - req_type = OSM_MCAST_REQ_TYPE_CREATE; - } else { + } else /* no need for a new group */ is_new_group = 0; - req_type = OSM_MCAST_REQ_TYPE_JOIN; - } CL_ASSERT(p_mgrp); mlid = p_mgrp->mlid; @@ -1331,7 +1327,7 @@ __osm_mcmr_rcv_join_mgrp(IN osm_sa_t * sa, IN osm_madw_t * const p_madw) /* do the actual routing (actually schedule the update) */ status = osm_sm_mcgrp_join(sa->sm, mlid, p_recvd_mcmember_rec->port_gid.unicast. - interface_id, req_type); + interface_id); if (status != IB_SUCCESS) { OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 1B14: " diff --git a/opensm/opensm/osm_sm.c b/opensm/opensm/osm_sm.c index 649ff2a..d1d8863 100644 --- a/opensm/opensm/osm_sm.c +++ b/opensm/opensm/osm_sm.c @@ -449,9 +449,7 @@ Exit: **********************************************************************/ static ib_api_status_t __osm_sm_mgrp_process(IN osm_sm_t * const p_sm, - IN osm_mgrp_t * const p_mgrp, - IN const ib_net64_t port_guid, - IN osm_mcast_req_type_t req_type) + IN osm_mgrp_t * const p_mgrp) { osm_mcast_mgr_ctxt_t *ctx; @@ -464,8 +462,6 @@ __osm_sm_mgrp_process(IN osm_sm_t * const p_sm, return IB_ERROR; memset(ctx, 0, sizeof(*ctx)); ctx->mlid = p_mgrp->mlid; - ctx->req_type = req_type; - ctx->port_guid = port_guid; cl_spinlock_acquire(&p_sm->mgrp_lock); cl_qlist_insert_tail(&p_sm->mgrp_list, &ctx->list_item); @@ -478,33 +474,10 @@ __osm_sm_mgrp_process(IN osm_sm_t * const p_sm, /********************************************************************** **********************************************************************/ -static ib_api_status_t -__osm_sm_mgrp_connect(IN 
osm_sm_t * const p_sm, - IN osm_mgrp_t * const p_mgrp, - IN const ib_net64_t port_guid, - IN osm_mcast_req_type_t req_type) -{ - return __osm_sm_mgrp_process(p_sm, p_mgrp, port_guid, req_type); -} - -/********************************************************************** - **********************************************************************/ -static void -__osm_sm_mgrp_disconnect(IN osm_sm_t * const p_sm, - IN osm_mgrp_t * const p_mgrp, - IN const ib_net64_t port_guid) -{ - __osm_sm_mgrp_process(p_sm, p_mgrp, port_guid, - OSM_MCAST_REQ_TYPE_LEAVE); -} - -/********************************************************************** - **********************************************************************/ ib_api_status_t osm_sm_mcgrp_join(IN osm_sm_t * const p_sm, IN const ib_net16_t mlid, - IN const ib_net64_t port_guid, - IN osm_mcast_req_type_t req_type) + IN const ib_net64_t port_guid) { osm_mgrp_t *p_mgrp; osm_port_t *p_port; @@ -579,12 +552,12 @@ osm_sm_mcgrp_join(IN osm_sm_t * const p_sm, goto Exit; } - status = __osm_sm_mgrp_connect(p_sm, p_mgrp, port_guid, req_type); + status = __osm_sm_mgrp_process(p_sm, p_mgrp); CL_PLOCK_RELEASE(p_sm->p_lock); Exit: OSM_LOG_EXIT(p_sm->p_log); - return (status); + return status; } /********************************************************************** @@ -595,7 +568,7 @@ osm_sm_mcgrp_leave(IN osm_sm_t * const p_sm, { osm_mgrp_t *p_mgrp; osm_port_t *p_port; - ib_api_status_t status = IB_SUCCESS; + ib_api_status_t status; OSM_LOG_ENTER(p_sm->p_log); @@ -635,12 +608,12 @@ osm_sm_mcgrp_leave(IN osm_sm_t * const p_sm, */ osm_port_remove_mgrp(p_port, mlid); - __osm_sm_mgrp_disconnect(p_sm, p_mgrp, port_guid); + status = __osm_sm_mgrp_process(p_sm, p_mgrp); CL_PLOCK_RELEASE(p_sm->p_lock); Exit: OSM_LOG_EXIT(p_sm->p_log); - return (status); + return status; } void osm_set_sm_priority(osm_sm_t *sm, uint8_t priority) -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:43:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) 
Date: Sat, 13 Dec 2008 22:43:03 +0200 Subject: [ofa-general] [PATCH] opensm/osm_subnet: consolidate some duplicated code Message-ID: <20081213204303.GZ15622@sashak.voltaire.com> Consolidate some duplicated code in osm_get_*_by_mad_addr() functions. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 2 +- opensm/opensm/osm_subnet.c | 118 +++++++++++------------------------- 2 files changed, 36 insertions(+), 84 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index d97d5f4..fe456d5 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -741,7 +741,7 @@ struct osm_mgrp; ib_api_status_t osm_get_gid_by_mad_addr(IN struct osm_log *p_log, IN const osm_subn_t * p_subn, - IN const struct osm_mad_addr *p_mad_addr, + IN struct osm_mad_addr *p_mad_addr, OUT ib_gid_t * p_gid); /* * PARAMETERS diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index c41962d..9136021 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -210,58 +210,12 @@ osm_subn_init(IN osm_subn_t * const p_subn, /********************************************************************** **********************************************************************/ -ib_api_status_t -osm_get_gid_by_mad_addr(IN osm_log_t * p_log, - IN const osm_subn_t * p_subn, - IN const osm_mad_addr_t * p_mad_addr, - OUT ib_gid_t * p_gid) -{ - const cl_ptr_vector_t *p_tbl; - const osm_port_t *p_port = NULL; - - if (p_gid == NULL) { - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7505: " - "Provided output GID is NULL\n"); - return (IB_INVALID_PARAMETER); - } - - /* Find the port gid of the request in the subnet */ - p_tbl = &p_subn->port_lid_tbl; - - CL_ASSERT(cl_ptr_vector_get_size(p_tbl) < 0x10000); - - if ((uint16_t) cl_ptr_vector_get_size(p_tbl) > - cl_ntoh16(p_mad_addr->dest_lid)) { - p_port = - cl_ptr_vector_get(p_tbl, cl_ntoh16(p_mad_addr->dest_lid)); - if (p_port == NULL) { - OSM_LOG(p_log, 
OSM_LOG_DEBUG, - "Did not find any port with LID: %u\n", - cl_ntoh16(p_mad_addr->dest_lid)); - return (IB_INVALID_PARAMETER); - } - p_gid->unicast.interface_id = p_port->p_physp->port_guid; - p_gid->unicast.prefix = p_subn->opt.subnet_prefix; - } else { - /* The dest_lid is not in the subnet table - this is an error */ - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7501: " - "LID is out of range: %u\n", - cl_ntoh16(p_mad_addr->dest_lid)); - return (IB_INVALID_PARAMETER); - } - - return (IB_SUCCESS); -} - -/********************************************************************** - **********************************************************************/ -osm_physp_t *osm_get_physp_by_mad_addr(IN osm_log_t * p_log, - IN const osm_subn_t * p_subn, - IN osm_mad_addr_t * p_mad_addr) +osm_port_t *osm_get_port_by_mad_addr(IN osm_log_t * p_log, + IN const osm_subn_t * p_subn, + IN osm_mad_addr_t * p_mad_addr) { const cl_ptr_vector_t *p_port_lid_tbl; osm_port_t *p_port = NULL; - osm_physp_t *p_physp = NULL; /* Find the port gid of the request in the subnet */ p_port_lid_tbl = &p_subn->port_lid_tbl; @@ -273,53 +227,51 @@ osm_physp_t *osm_get_physp_by_mad_addr(IN osm_log_t * p_log, p_port = cl_ptr_vector_get(p_port_lid_tbl, cl_ntoh16(p_mad_addr->dest_lid)); - if (p_port == NULL) { - /* The port is not in the port_lid table - this is an error */ - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7502: " - "Cannot locate port object by lid: %u\n", - cl_ntoh16(p_mad_addr->dest_lid)); - - goto Exit; - } - p_physp = p_port->p_physp; } else { /* The dest_lid is not in the subnet table - this is an error */ - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7503: " + OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7504: " "Lid is out of range: %u\n", cl_ntoh16(p_mad_addr->dest_lid)); } -Exit: - return p_physp; + return p_port; } -/********************************************************************** - **********************************************************************/ -osm_port_t *osm_get_port_by_mad_addr(IN osm_log_t * p_log, - IN 
const osm_subn_t * p_subn, - IN osm_mad_addr_t * p_mad_addr) +ib_api_status_t +osm_get_gid_by_mad_addr(IN osm_log_t * p_log, + IN const osm_subn_t * p_subn, + IN osm_mad_addr_t * p_mad_addr, + OUT ib_gid_t * p_gid) { - const cl_ptr_vector_t *p_port_lid_tbl; - osm_port_t *p_port = NULL; + const osm_port_t *p_port = NULL; - /* Find the port gid of the request in the subnet */ - p_port_lid_tbl = &p_subn->port_lid_tbl; + if (p_gid == NULL) { + OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7505: " + "Provided output GID is NULL\n"); + return (IB_INVALID_PARAMETER); + } - CL_ASSERT(cl_ptr_vector_get_size(p_port_lid_tbl) < 0x10000); + p_port = osm_get_port_by_mad_addr(p_log, p_subn, p_mad_addr); + if (!p_port) + return IB_INVALID_PARAMETER; - if ((uint16_t) cl_ptr_vector_get_size(p_port_lid_tbl) > - cl_ntoh16(p_mad_addr->dest_lid)) { - p_port = - cl_ptr_vector_get(p_port_lid_tbl, - cl_ntoh16(p_mad_addr->dest_lid)); - } else { - /* The dest_lid is not in the subnet table - this is an error */ - OSM_LOG(p_log, OSM_LOG_ERROR, "ERR 7504: " - "Lid is out of range: %u\n", - cl_ntoh16(p_mad_addr->dest_lid)); - } + p_gid->unicast.interface_id = p_port->p_physp->port_guid; + p_gid->unicast.prefix = p_subn->opt.subnet_prefix; - return p_port; + return IB_SUCCESS; +} + +osm_physp_t *osm_get_physp_by_mad_addr(IN osm_log_t * p_log, + IN const osm_subn_t * p_subn, + IN osm_mad_addr_t * p_mad_addr) +{ + osm_port_t *p_port = NULL; + + p_port = osm_get_port_by_mad_addr(p_log, p_subn, p_mad_addr); + if (!p_port) + return NULL; + + return p_port->p_physp; } /********************************************************************** -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:45:49 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:45:49 +0200 Subject: [ofa-general] [PATCH] infiniband-diabs/saquery: unify SA queries processors Message-ID: <20081213204549.GA15622@sashak.voltaire.com> Unify single SA queries processors, rename it print_*() -> query_*(). 
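The unification described above amounts to a table of commands that all share one handler signature, plus a lookup by query type for callers that have no query name. The following is a minimal standalone sketch of that pattern, not the saquery code itself: the type codes, the stub handler bodies and the run_query() helper are hypothetical, exact-match name lookup is an assumption, and the osm_bind_handle_t argument is left out so that the example builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct query_cmd;

/* One signature for every query processor (bind handle omitted here). */
typedef int (*query_handler_t)(const struct query_cmd *q,
			       int argc, char *argv[]);

struct query_cmd {
	const char *name, *alias;
	unsigned query_type;
	const char *usage;
	query_handler_t handler;
};

/* Hypothetical type codes; the real tool uses IB_MAD_ATTR_* values. */
enum { Q_NODE = 1, Q_PORTINFO = 2, Q_LINK = 3 };

/* Stub handlers: each returns a distinct value so dispatch is visible. */
static int query_node_records(const struct query_cmd *q,
			      int argc, char *argv[])
{
	(void)q; (void)argc; (void)argv;
	return 10;
}

static int query_link_records(const struct query_cmd *q,
			      int argc, char *argv[])
{
	(void)q; (void)argv;
	return 30 + argc;	/* pretend the arguments matter */
}

static const struct query_cmd query_cmds[] = {
	{"NodeRecord", "NR", Q_NODE, NULL, query_node_records},
	{"PortInfoRecord", "PIR", Q_PORTINFO, NULL, NULL}, /* no handler yet */
	{"LinkRecord", "LR", Q_LINK,
	 "[[from_lid]/[from_port]] [[to_lid]/[to_port]]", query_link_records},
	{0}
};

/* Look a command up by full name or alias (exact match in this sketch). */
static const struct query_cmd *find_query(const char *name)
{
	const struct query_cmd *q;

	for (q = query_cmds; q->name; q++)
		if (!strcmp(name, q->name) ||
		    (q->alias && !strcmp(name, q->alias)))
			return q;
	return NULL;
}

/* Look a command up by its numeric query type. */
static const struct query_cmd *find_query_by_type(unsigned type)
{
	const struct query_cmd *q;

	for (q = query_cmds; q->name; q++)
		if (q->query_type == type)
			return q;
	return NULL;
}

/* Dispatch: use q if given, else resolve by type; -1 if nothing to call. */
static int run_query(const struct query_cmd *q, unsigned type,
		     int argc, char *argv[])
{
	if (!q)
		q = find_query_by_type(type);
	if (!q || !q->handler)
		return -1;
	return q->handler(q, argc, argv);
}
```

With this layout, adding a query means adding one table row. A call such as run_query(NULL, Q_NODE, 0, NULL) resolves the row by type, which is the same fallback the patch adds to the default case in main().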
Signed-off-by: Sasha Khapyorsky --- infiniband-diags/src/saquery.c | 128 +++++++++++++++++++++++++--------------- 1 files changed, 80 insertions(+), 48 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index 11a573f..aca9bd7 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -987,7 +987,8 @@ static ib_api_status_t get_print_class_port_info(osm_bind_handle_t h) return (status); } -static ib_api_status_t print_path_records(osm_bind_handle_t h) +static int query_path_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_net16_t attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); ib_api_status_t status; @@ -1073,7 +1074,32 @@ static ib_api_status_t print_multicast_group_records(osm_bind_handle_t h) return (status); } -static ib_api_status_t print_service_records(osm_bind_handle_t h) +static int query_class_port_info(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) +{ + return get_print_class_port_info(h); +} + +static int query_node_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) +{ + return print_node_records(h); +} + +static int query_portinfo_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) +{ + return print_portinfo_records(h); +} + +static int query_mcmember_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) +{ + return print_multicast_member_records(h); +} + +static int query_service_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_net16_t attr_offset = ib_get_attr_offset(sizeof(ib_service_record_t)); @@ -1088,7 +1114,8 @@ static ib_api_status_t print_service_records(osm_bind_handle_t h) return (status); } -static ib_api_status_t print_inform_info_records(osm_bind_handle_t h) +static int query_informinfo_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { 
ib_net16_t attr_offset = ib_get_attr_offset(sizeof(ib_inform_info_record_t)); @@ -1104,8 +1131,8 @@ static ib_api_status_t print_inform_info_records(osm_bind_handle_t h) return (status); } -static ib_api_status_t -print_link_records(osm_bind_handle_t h, int argc, char *argv[]) +static int query_link_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_link_record_t lr; ib_net64_t comp_mask = 0; @@ -1148,9 +1175,8 @@ print_link_records(osm_bind_handle_t h, int argc, char *argv[]) return status; } -static int -print_sl2vl_records(const struct query_cmd *q, osm_bind_handle_t h, - int argc, char *argv[]) +static int query_sl2vl_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_slvl_table_record_t slvl; ib_net64_t comp_mask = 0; @@ -1186,9 +1212,8 @@ print_sl2vl_records(const struct query_cmd *q, osm_bind_handle_t h, return status; } -static int -print_vlarb_records(const struct query_cmd *q, osm_bind_handle_t h, - int argc, char *argv[]) +static int query_vlarb_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_vl_arb_table_record_t vlarb; ib_net64_t comp_mask = 0; @@ -1224,9 +1249,8 @@ print_vlarb_records(const struct query_cmd *q, osm_bind_handle_t h, return status; } -static int -print_pkey_tbl_records(const struct query_cmd *q, osm_bind_handle_t h, - int argc, char *argv[]) +static int query_pkey_tbl_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_pkey_table_record_t pktr; ib_net64_t comp_mask = 0; @@ -1262,9 +1286,8 @@ print_pkey_tbl_records(const struct query_cmd *q, osm_bind_handle_t h, return status; } -static int -print_lft_records(const struct query_cmd *q, osm_bind_handle_t h, - int argc, char *argv[]) +static int query_lft_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_lft_record_t lftr; ib_net64_t comp_mask = 0; @@ -1296,9 +1319,8 @@ print_lft_records(const struct query_cmd 
*q, osm_bind_handle_t h, return status; } -static int -print_mft_records(const struct query_cmd *q, osm_bind_handle_t h, - int argc, char *argv[]) +static int query_mft_records(const struct query_cmd *q, + osm_bind_handle_t h, int argc, char *argv[]) { ib_mft_record_t mftr; ib_net64_t comp_mask = 0; @@ -1412,25 +1434,32 @@ static void clean_up(void) } static const struct query_cmd query_cmds[] = { - {"ClassPortInfo", "CPI", IB_MAD_ATTR_CLASS_PORT_INFO,}, - {"NodeRecord", "NR", IB_MAD_ATTR_NODE_RECORD,}, - {"PortInfoRecord", "PIR", IB_MAD_ATTR_PORTINFO_RECORD,}, + {"ClassPortInfo", "CPI", IB_MAD_ATTR_CLASS_PORT_INFO, + NULL, query_class_port_info}, + {"NodeRecord", "NR", IB_MAD_ATTR_NODE_RECORD, + NULL, query_node_records}, + {"PortInfoRecord", "PIR", IB_MAD_ATTR_PORTINFO_RECORD, + NULL, query_portinfo_records}, {"SL2VLTableRecord", "SL2VL", IB_MAD_ATTR_SLVL_RECORD, - "[[lid]/[in_port]/[out_port]]", print_sl2vl_records}, + "[[lid]/[in_port]/[out_port]]", query_sl2vl_records}, {"PKeyTableRecord", "PKTR", IB_MAD_ATTR_PKEY_TBL_RECORD, - "[[lid]/[port]/[block]]", print_pkey_tbl_records}, + "[[lid]/[port]/[block]]", query_pkey_tbl_records}, {"VLArbitrationTableRecord", "VLAR", IB_MAD_ATTR_VLARB_RECORD, - "[[lid]/[port]/[block]]", print_vlarb_records}, - {"InformInfoRecord", "IIR", IB_MAD_ATTR_INFORM_INFO_RECORD,}, + "[[lid]/[port]/[block]]", query_vlarb_records}, + {"InformInfoRecord", "IIR", IB_MAD_ATTR_INFORM_INFO_RECORD, + NULL, query_informinfo_records}, {"LinkRecord", "LR", IB_MAD_ATTR_LINK_RECORD, - "[[from_lid]/[from_port]] [[to_lid]/[to_port]]",}, - {"ServiceRecord", "SR", IB_MAD_ATTR_SERVICE_RECORD,}, - {"PathRecord", "PR", IB_MAD_ATTR_PATH_RECORD,}, - {"MCMemberRecord", "MCMR", IB_MAD_ATTR_MCMEMBER_RECORD,}, + "[[from_lid]/[from_port]] [[to_lid]/[to_port]]", query_link_records}, + {"ServiceRecord", "SR", IB_MAD_ATTR_SERVICE_RECORD, + NULL, query_service_records}, + {"PathRecord", "PR", IB_MAD_ATTR_PATH_RECORD, + NULL, query_path_records}, + {"MCMemberRecord", 
"MCMR", IB_MAD_ATTR_MCMEMBER_RECORD, + NULL, query_mcmember_records}, {"LFTRecord", "LFTR", IB_MAD_ATTR_LFT_RECORD, - "[[lid]/[block]]", print_lft_records}, + "[[lid]/[block]]", query_lft_records}, {"MFTRecord", "MFTR", IB_MAD_ATTR_MFT_RECORD, - "[[mlid]/[position]/[block]]", print_mft_records}, + "[[mlid]/[position]/[block]]", query_mft_records}, {0} }; @@ -1447,6 +1476,17 @@ static const struct query_cmd *find_query(const char *name) return NULL; } +static const struct query_cmd *find_query_by_type(ib_net16_t type) +{ + const struct query_cmd *q; + + for (q = query_cmds; q->name; q++) + if (q->query_type == type) + return q; + + return NULL; +} + static void usage(void) { const struct query_cmd *q; @@ -1741,7 +1781,7 @@ int main(int argc, char **argv) (ib_gid_t *) & src_addr.s6_addr, (ib_gid_t *) & dst_addr.s6_addr); } else { - status = print_path_records(h); + status = query_path_records(q, h, 0, NULL); } break; case IB_MAD_ATTR_CLASS_PORT_INFO: @@ -1756,22 +1796,14 @@ int main(int argc, char **argv) else status = print_multicast_group_records(h); break; - case IB_MAD_ATTR_SERVICE_RECORD: - status = print_service_records(h); - break; - case IB_MAD_ATTR_INFORM_INFO_RECORD: - status = print_inform_info_records(h); - break; - case IB_MAD_ATTR_LINK_RECORD: - status = print_link_records(h, argc, argv); - break; default: - if (q && q->handler) - status = q->handler(q, h, argc, argv); - else { - fprintf(stderr, "Unknown query type %d\n", query_type); + if ((!q && !(q = find_query_by_type(query_type))) + || !q->handler) { + fprintf(stderr, "Unknown query type %d\n", + ntohs(query_type)); status = IB_UNKNOWN_ERROR; - } + } else + status = q->handler(q, h, argc, argv); break; } -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:46:31 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:46:31 +0200 Subject: [ofa-general] [PATCH] infiniband-diags/saquery: separate queries and commands In-Reply-To: 
<20081213204549.GA15622@sashak.voltaire.com> References: <20081213204549.GA15622@sashak.voltaire.com> Message-ID: <20081213204631.GB15622@sashak.voltaire.com> This new control level 'command' will be used to preserve useful backward compatible usage (controlled by command line options) where complex SA queries are used. Single queries (controlled by query names) will be performed by query id using SAQUERY_CMD_QUERY command. Such separation will help us to extend existing functionality and to preserve saquery backward compatibility. Also rename print_portinfo_records() to print_issm_records() b/c it is what this function does (controlled by "command"). query_portinfo_records() will be implemented later. Signed-off-by: Sasha Khapyorsky --- infiniband-diags/src/saquery.c | 55 +++++++++++++++++++++++---------------- 1 files changed, 32 insertions(+), 23 deletions(-) diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index aca9bd7..e4175c2 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -1002,7 +1002,7 @@ static int query_path_records(const struct query_cmd *q, return (status); } -static ib_api_status_t print_portinfo_records(osm_bind_handle_t h) +static ib_api_status_t print_issm_records(osm_bind_handle_t h) { ib_api_status_t status; @@ -1089,7 +1089,7 @@ static int query_node_records(const struct query_cmd *q, static int query_portinfo_records(const struct query_cmd *q, osm_bind_handle_t h, int argc, char *argv[]) { - return print_portinfo_records(h); + return print_issm_records(h); } static int query_mcmember_records(const struct query_cmd *q, @@ -1541,11 +1541,21 @@ static void usage(void) exit(-1); } +enum saquery_command { + SAQUERY_CMD_QUERY, + SAQUERY_CMD_NODE_RECORD, + SAQUERY_CMD_PATH_RECORD, + SAQUERY_CMD_CLASS_PORT_INFO, + SAQUERY_CMD_ISSM, + SAQUERY_CMD_MCGROUPS, + SAQUERY_CMD_MCMEMBERS, +}; + int main(int argc, char **argv) { int ch = 0; - int members = 0; osm_bind_handle_t h; + enum 
saquery_command command = SAQUERY_CMD_QUERY; const struct query_cmd *q = NULL; char *src = NULL, *dst = NULL; char *sgid = NULL, *dgid = NULL; @@ -1602,7 +1612,7 @@ int main(int argc, char **argv) if (*ch) dst = strdup(ch); free(opt); - query_type = IB_MAD_ATTR_PATH_RECORD; + command = SAQUERY_CMD_PATH_RECORD; break; } case 2: @@ -1620,7 +1630,7 @@ int main(int argc, char **argv) usage(); } free(opt); - query_type = IB_MAD_ATTR_PATH_RECORD; + command = SAQUERY_CMD_PATH_RECORD; break; } case 3: @@ -1635,7 +1645,7 @@ int main(int argc, char **argv) smkey = cl_hton64(strtoull(optarg, NULL, 0)); break; case 'p': - query_type = IB_MAD_ATTR_PATH_RECORD; + command = SAQUERY_CMD_PATH_RECORD; break; case 'V': fprintf(stderr, "%s %s\n", argv0, get_build_version()); @@ -1644,7 +1654,7 @@ int main(int argc, char **argv) node_print_desc = ALL_DESC; break; case 'c': - query_type = IB_MAD_ATTR_CLASS_PORT_INFO; + command = SAQUERY_CMD_CLASS_PORT_INFO; break; case 'S': query_type = IB_MAD_ATTR_SERVICE_RECORD; @@ -1653,7 +1663,7 @@ int main(int argc, char **argv) query_type = IB_MAD_ATTR_INFORM_INFO_RECORD; break; case 'N': - query_type = IB_MAD_ATTR_NODE_RECORD; + command = SAQUERY_CMD_NODE_RECORD; break; case 'L': node_print_desc = LID_ONLY; @@ -1671,14 +1681,13 @@ int main(int argc, char **argv) node_print_desc = NAME_OF_GUID; break; case 's': - query_type = IB_MAD_ATTR_PORTINFO_RECORD; + command = SAQUERY_CMD_ISSM; break; case 'g': - query_type = IB_MAD_ATTR_MCMEMBER_RECORD; + command = SAQUERY_CMD_MCGROUPS; break; case 'm': - query_type = IB_MAD_ATTR_MCMEMBER_RECORD; - members = 1; + command = SAQUERY_CMD_MCMEMBERS; break; case 'x': query_type = IB_MAD_ATTR_LINK_RECORD; @@ -1751,11 +1760,11 @@ int main(int argc, char **argv) h = get_bind_handle(); node_name_map = open_node_name_map(node_name_map_file); - switch (query_type) { - case IB_MAD_ATTR_NODE_RECORD: + switch (command) { + case SAQUERY_CMD_NODE_RECORD: status = print_node_records(h); break; - case 
IB_MAD_ATTR_PATH_RECORD: + case SAQUERY_CMD_PATH_RECORD: if (src && dst) { src_lid = get_lid(h, src); dst_lid = get_lid(h, dst); @@ -1784,17 +1793,17 @@ int main(int argc, char **argv) status = query_path_records(q, h, 0, NULL); } break; - case IB_MAD_ATTR_CLASS_PORT_INFO: + case SAQUERY_CMD_CLASS_PORT_INFO: status = get_print_class_port_info(h); break; - case IB_MAD_ATTR_PORTINFO_RECORD: - status = print_portinfo_records(h); + case SAQUERY_CMD_ISSM: + status = print_issm_records(h); + break; + case SAQUERY_CMD_MCGROUPS: + status = print_multicast_group_records(h); break; - case IB_MAD_ATTR_MCMEMBER_RECORD: - if (members) - status = print_multicast_member_records(h); - else - status = print_multicast_group_records(h); + case SAQUERY_CMD_MCMEMBERS: + status = print_multicast_member_records(h); break; default: if ((!q && !(q = find_query_by_type(query_type))) -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:47:27 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:47:27 +0200 Subject: [ofa-general] [PATCH] infiniband-diags/saquery: PortInfoRecord query In-Reply-To: <20081213204631.GB15622@sashak.voltaire.com> References: <20081213204549.GA15622@sashak.voltaire.com> <20081213204631.GB15622@sashak.voltaire.com> Message-ID: <20081213204727.GC15622@sashak.voltaire.com> PortInfoRecord query implementation. Alias is "PIR" (case insensitive), usage is [lid]/[port], all is optional. Signed-off-by: Sasha Khapyorsky --- infiniband-diags/man/saquery.8 | 2 +- infiniband-diags/src/saquery.c | 67 ++++++++++++++++++++++++++++++++++++++- 2 files changed, 66 insertions(+), 3 deletions(-) diff --git a/infiniband-diags/man/saquery.8 b/infiniband-diags/man/saquery.8 index 5c75c21..82a5fed 100644 --- a/infiniband-diags/man/saquery.8 +++ b/infiniband-diags/man/saquery.8 @@ -105,7 +105,7 @@ for node name map file format. 
Only used with the \fB\-O\fR and \fB\-U\fR optio Supported query names (and aliases): ClassPortInfo (CPI) NodeRecord (NR) - PortInfoRecord (PIR) + PortInfoRecord (PIR) [[lid]/[port]] SL2VLTableRecord (SL2VL) [[lid]/[in_port]/[out_port]] PKeyTableRecord (PKTR) [[lid]/[port]/[block]] VLArbitrationTableRecord (VLAR) [[lid]/[port]/[block]] diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c index e4175c2..1cc4aca 100644 --- a/infiniband-diags/src/saquery.c +++ b/infiniband-diags/src/saquery.c @@ -49,6 +49,7 @@ #define _GNU_SOURCE #include +#include #include #include #include @@ -102,6 +103,20 @@ int requested_lid_flag = 0; ib_net64_t requested_guid = 0; int requested_guid_flag = 0; +static void format_buf(char *in, char *out, unsigned size) +{ + unsigned i; + + for (i = 0; i < size - 3 && *in; i++) { + *out++ = *in; + if (*in++ == '\n' && *in) { + *out++ = '\t'; + *out++ = '\t'; + } + } + *out = '\0'; +} + /** * Call back for the various record requests. */ @@ -297,6 +312,25 @@ static void dump_portinfo_record(void *data) ); } +static void dump_one_portinfo_record(void *data) +{ + char buf[2048], buf2[4096]; + ib_portinfo_record_t *pir = data; + ib_port_info_t *pi = &pir->port_info; + + mad_dump_portinfo(buf, sizeof(buf), pi, sizeof(*pi)); + + format_buf(buf, buf2, sizeof(buf2)); + + printf("PortInfoRecord dump:\n" + "\tRID:\n" + "\t\tEndPortLid..............%u\n" + "\t\tPortNum.................0x%x\n" + "\t\tReserved................0x%x\n" + "\tPortInfo dump:\n\t\t%s", + cl_ntoh16(pir->lid), pir->port_num, pir->resv, buf2); +} + static void dump_multicast_group_record(void *data) { char gid_str[INET6_ADDRSTRLEN]; @@ -1089,7 +1123,36 @@ static int query_node_records(const struct query_cmd *q, static int query_portinfo_records(const struct query_cmd *q, osm_bind_handle_t h, int argc, char *argv[]) { - return print_issm_records(h); + ib_portinfo_record_t pir; + ib_net64_t comp_mask = 0; + int lid = 0, port = -1; + ib_api_status_t status; + + if 
(argc > 0) + parse_lid_and_ports(h, argv[0], &lid, &port, NULL); + + memset(&pir, 0, sizeof(pir)); + + if (lid > 0) { + pir.lid = cl_hton16(lid); + comp_mask |= IB_PIR_COMPMASK_LID; + } + if (port >= 0) { + pir.port_num = cl_hton16(port); + comp_mask |= IB_PIR_COMPMASK_PORTNUM; + } + + status = get_any_records(h, IB_MAD_ATTR_PORTINFO_RECORD, 0, + comp_mask, &pir, + ib_get_attr_offset(sizeof(pir)), 0); + + if (status != IB_SUCCESS) + return status; + + dump_results(&result, dump_one_portinfo_record); + return_mad(); + + return 0; } static int query_mcmember_records(const struct query_cmd *q, @@ -1439,7 +1502,7 @@ static const struct query_cmd query_cmds[] = { {"NodeRecord", "NR", IB_MAD_ATTR_NODE_RECORD, NULL, query_node_records}, {"PortInfoRecord", "PIR", IB_MAD_ATTR_PORTINFO_RECORD, - NULL, query_portinfo_records}, + "[[lid]/[port]]", query_portinfo_records}, {"SL2VLTableRecord", "SL2VL", IB_MAD_ATTR_SLVL_RECORD, "[[lid]/[in_port]/[out_port]]", query_sl2vl_records}, {"PKeyTableRecord", "PKTR", IB_MAD_ATTR_PKEY_TBL_RECORD, -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 13 12:55:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 13 Dec 2008 22:55:40 +0200 Subject: [ofa-general] [PATCH] infinabd-diags: convert type uint -> unsigned int Message-ID: <20081213205540.GD15622@sashak.voltaire.com> Convert uint type to unsigned int in accordance with C99 definitions.
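The STATESTR() macro touched by this patch relies on the cast to an unsigned type for its bounds check. Below is a minimal standalone sketch of that idiom, assuming hypothetical state names rather than the sminfo.c ones; the state_name() wrapper is added only so that the macro can be exercised from a function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical state codes, mirroring the shape of the sminfo.c enum. */
enum {
	STATE_NOTACT,
	STATE_DISCOVER,
	STATE_STANDBY,
	STATE_MASTER,
	STATE_LAST,
};

static const char *state_names[] = {
	"NOTACT", "DISCOVER", "STANDBY", "MASTER",
};

/*
 * Casting to unsigned int turns a negative value into a very large one,
 * so a single comparison rejects both negative and too-large indexes.
 */
#define STATE_NAME(s) \
	(((unsigned int)(s)) < (unsigned int)STATE_LAST ? \
	 state_names[(s)] : "???")

/* Function wrapper around the macro, for callers holding a plain int. */
static const char *state_name(int s)
{
	return STATE_NAME(s);
}
```

The standard spelling `unsigned int` (or plain `unsigned`) keeps this idiom portable, whereas `uint` is a non-standard typedef that not every C library provides.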
Signed-off-by: Sasha Khapyorsky --- infiniband-diags/src/ibping.c | 2 +- infiniband-diags/src/sminfo.c | 4 ++-- infiniband-diags/src/smpquery.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/infiniband-diags/src/ibping.c b/infiniband-diags/src/ibping.c index ceb6dd4..4fd2dcb 100644 --- a/infiniband-diags/src/ibping.c +++ b/infiniband-diags/src/ibping.c @@ -194,7 +194,7 @@ main(int argc, char **argv) int timeout = 0, udebug = 0, server = 0, flood = 0; int oui = IB_OPENIB_OUI; uint64_t rtt; - uint count = ~0; + unsigned count = ~0; extern int ibdebug; char *err; char *ca = 0; diff --git a/infiniband-diags/src/sminfo.c b/infiniband-diags/src/sminfo.c index fdd3071..c811057 100644 --- a/infiniband-diags/src/sminfo.c +++ b/infiniband-diags/src/sminfo.c @@ -78,7 +78,7 @@ char *statestr[] = { [SMINFO_MASTER] "SMINFO_MASTER", }; -#define STATESTR(s) (((uint)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???") +#define STATESTR(s) (((unsigned)(s)) < SMINFO_STATE_LAST ? statestr[s] : "???") int main(int argc, char **argv) @@ -88,7 +88,7 @@ main(int argc, char **argv) ib_portid_t portid = {0}; int timeout = 0; /* use default */ uint8_t *p; - uint act = 0; + unsigned act = 0; int prio = 0, state = SMINFO_STANDBY; uint64_t guid = 0, key = 0; extern int ibdebug; diff --git a/infiniband-diags/src/smpquery.c b/infiniband-diags/src/smpquery.c index 0e55afb..ed8ec83 100644 --- a/infiniband-diags/src/smpquery.c +++ b/infiniband-diags/src/smpquery.c @@ -178,7 +178,7 @@ pkey_table(ib_portid_t *dest, char **argv, int argc) uint8_t data[IB_SMP_DATA_SIZE]; uint32_t i, j, k; uint16_t *p; - uint mod; + unsigned mod; int n, t, phy_ports; int portnum = 0; @@ -355,7 +355,7 @@ guid_info(ib_portid_t *dest, char **argv, int argc) uint8_t data[IB_SMP_DATA_SIZE]; uint32_t i, j, k; uint64_t *p; - uint mod; + unsigned mod; int n; /* Get the guid capacity */ -- 1.6.0.4.766.g6fc4a From tziporet at dev.mellanox.co.il Sun Dec 14 01:37:53 2008 From: tziporet at dev.mellanox.co.il 
(Tziporet Koren)
Date: Sun, 14 Dec 2008 11:37:53 +0200
Subject: [ofa-general] mlx4 support for fast_reg_mr
In-Reply-To:
References: <4940D38E.2000700@mellanox.co.il>
Message-ID: <4944D3F1.6050900@mellanox.co.il>

James Lentini wrote:
>
> To be clear, I'm referring to the fast_reg_mr APIs
> [ib_alloc_fast_reg_mr(), ib_alloc_fast_reg_page_list(), and the
> ib_send_wr's fast_reg type], not the FMR APIs
> [ib_alloc_fmr(),ib_map_phys_fmr(), etc].
>
FRWR was first implemented in 2.6.27; however, we have fixed several bugs since then, and I am not sure whether Roland pushed those fixes to the stable version of 2.6.27 or only to 2.6.28.

> Are the fast_reg_mr APIs supported with mlx4 and FW 2.5.0?
>
They are supported in the kernel as above and in OFED 1.4.

Regarding FW support:

Features that are enabled with FW 2.5.0 only:
- Send with invalidate and Local invalidate send queue work requests.
- Resize CQ support.

Features that are enabled with FW 2.6.0 only:
- Fast register MR send queue work requests.
- Local DMA L_Key.
- Raw Ethertype QP support (one QP per port) -- receive only.

FW 2.6.0 is expected to be released this week.

Tziporet

From dorfman.eli at gmail.com Sun Dec 14 03:34:12 2008
From: dorfman.eli at gmail.com (Eli Dorfman)
Date: Sun, 14 Dec 2008 13:34:12 +0200
Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin
In-Reply-To: <20081213135051.GP15622@sashak.voltaire.com>
References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com>
Message-ID: <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com>

>> > Did you mean to have it osm_report_notice()? Actually it is where OpenSM
>> > sends notices, not where OpenSM gets traps. Trap receiver processor is
>> > located in osm_trap_rcv.c.
>>
>> Yes that's what i meant.
>> When OpenSM receives traps it calls osm_report_notice().
>> It is also call for OpenSM initiated traps (e.g. GID IN/OUT and MC CREATE/DELETE).
>
> Ok.
I see your point. Then why it should be limited by generic notice
> types?

No special reason. At the moment we handle only generic traps.
We may want to report vendor-specific events with other event IDs.

> Also wouldn't it be better to call plugin report callback after
> notice was actually processed (eg. at end of this function)?

There is no correlation between reporting an event and whether it was
already forwarded.

Eli.

From vlad at dev.mellanox.co.il Sun Dec 14 05:17:03 2008
From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky)
Date: Sun, 14 Dec 2008 15:17:03 +0200
Subject: [ofa-general] Test modules for ib_core
In-Reply-To: <494114B5.7000703@ext.bull.net>
References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> <494114B5.7000703@ext.bull.net>
Message-ID: <4945074F.1050500@dev.mellanox.co.il>

Celine Bourde wrote:
> I am trying krping on 2.6.27 kernel and OFED1.4 (10 December 2008).
> Module compilation is ok, but modprobe failed.
>
> [root at twin krping]# modprobe rdma_krping
> FATAL: Error inserting rdma_krping
> (/lib/modules/2.6.27/extra/rdma_krping.ko): Unknown symbol in module,
> or unknown parameter (see dmesg)
>
> dmesg output :
>
> rdma_krping: disagrees about version of symbol ib_create_cq
> rdma_krping: Unknown symbol ib_create_cq
> rdma_krping: disagrees about version of symbol
> ib_alloc_fast_reg_page_list
> rdma_krping: Unknown symbol ib_alloc_fast_reg_page_list
> rdma_krping: disagrees about version of symbol rdma_resolve_addr
> ...
>
> Any Advice ?
>
> Céline Bourde
>
>

Hi Celine,

Copy /usr/src/ofa_kernel/Module.symvers to the krping directory, then run make and insmod.
Regards, Vladimir From ishai at mellanox.co.il Sun Dec 14 01:29:28 2008 From: ishai at mellanox.co.il (Ishai Rabinovitz) Date: Sun, 14 Dec 2008 11:29:28 +0200 Subject: [Fwd: [ofa-general] Re: porting IB management code to Windows] In-Reply-To: <4944C949.1050108@dev.mellanox.co.il> References: <4944C949.1050108@dev.mellanox.co.il> Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD013358E7@mtlexch01.mtl.com> Adding ofw at lists.openfabrics.org to the discussion. > -------- Original Message -------- > Subject: [ofa-general] Re: porting IB management code to Windows > Date: Sat, 13 Dec 2008 22:30:14 +0200 > From: Sasha Khapyorsky > To: Sean Hefty > CC: general at lists.openfabrics.org > References: <000201c95bc5$510162a0$1e58180a at amr.corp.intel.com> > > Hi Sean, > > On 11:18 Thu 11 Dec , Sean Hefty wrote: > > > > We've started porting the IB management code (IB-diags at this point) > to > > Windows. My strong preference is to avoid branching the code and > instead keep a > > single source code tree. Is there any objection to accepting changes > against > > the management tree to allow the code to run on both Linux and > Windows? > > Basically I have no objections against porting changes. And I also > would > prefer to keep a single code base. > > However, I would prefer to minimize amount of needed changes and would > really prefer to not get a lot of limitations in using modern C. I will > comment inline in the patch example below. > > > (We can > > figure out the logistics of build related files later. I'm most > concerned about > > the code itself.) > > > > The patch below gives an example of the changes needed to make this > happen. > > Most are a result of compiler differences. 
> > > > - Sean > > > > --- infiniband-diags-1.4.2\src\sminfo.c 2008-10-19 11:34:42.000000000 > -0700 > > +++ scm\winof\branches\winverbs\tools\infiniband_diags\src\sminfo.c > > 2008-12-10 15:06:01.096000000 -0800 > > @@ -37,12 +37,19 @@ > > > > #include > > #include > > + > > +#if defined(_WIN32) || defined(_WIN64) > > +#include > > +#include > > +#include "..\..\..\..\etc\user\getopt.c" > > +#include "..\ibdiag_common.c" > > +#else > > #include > > #include > > #include > > #include > > +#endif > > Could such ugly header mess be eliminated? > > I'm not familiar with windows environment, but would expect that > headers > like exist there (although I may be wrong about it). Of > course some header file may be missing, this is not so bad - you could > add one somewhere under WinOF tree in the include path, then something > like: > > winof/include/path/getopt.h: > > #ifndef WINOF_GETOPT_H > #define WINOF_GETOPT_H > > #include "..\..\..\..\etc\user\getopt.c" > > #endif > > could resolve the problem. And similar with another header files (also > AFAIK WinOF is not using autotools, so file config.h could be also good > place for various wrappers). > > > -#include > > #include > > #include > > > > @@ -72,13 +79,13 @@ enum { > > }; > > > > char *statestr[] = { > > - [SMINFO_NOTACT] "SMINFO_NOTACT", > > - [SMINFO_DISCOVER] "SMINFO_DISCOVER", > > - [SMINFO_STANDBY] "SMINFO_STANDBY", > > - [SMINFO_MASTER] "SMINFO_MASTER", > > + "SMINFO_NOTACT", > > + "SMINFO_DISCOVER", > > + "SMINFO_STANDBY", > > + "SMINFO_MASTER", > > }; > > Could VC++ understand C99 like initializations (maybe with using some > flags)? I would really prefer to use something like this. > > > > > -#define STATESTR(s) (((uint)(s)) < SMINFO_STATE_LAST ? statestr[s] > : "???") > > +#define STATESTR(s) (((unsigned int)(s)) < SMINFO_STATE_LAST ? 
> statestr[s] : > > "???") > > > > int > > main(int argc, char **argv) > > @@ -88,7 +95,7 @@ main(int argc, char **argv) > > ib_portid_t portid = {0}; > > int timeout = 0; /* use default */ > > uint8_t *p; > > - uint act = 0; > > + unsigned int act = 0; > > All 'uint' -> 'unsigned int' conversions seem fine for me (I think we > need to do this even w/out connection to WinOF porting issue). > > > int prio = 0, state = SMINFO_STANDBY; > > uint64_t guid = 0, key = 0; > > extern int ibdebug; > > @@ -97,8 +104,8 @@ main(int argc, char **argv) > > char *ca = 0; > > int ca_port = 0; > > > > - static char const str_opts[] = "C:P:t:s:p:a:deDGVhu"; > > - static const struct option long_opts[] = { > > + static char str_opts[] = "C:P:t:s:p:a:deDGVhu"; > > + static struct option long_opts[] = { > > I saw in your another email that 'const' issue could be solved (worst > case it could be masked in WinOF config.h - #define const ). Right? > > > { "C", 1, 0, 'C'}, > > { "P", 1, 0, 'P'}, > > { "debug", 0, 0, 'd'}, > > @@ -112,7 +119,7 @@ main(int argc, char **argv) > > { "timeout", 1, 0, 't'}, > > { "help", 0, 0, 'h'}, > > { "usage", 0, 0, 'u'}, > > - { } > > + { 0 } > > Could VC be learned with some flags to understand {}? 
Basically we > could > accept such a change, but it will be hard to remember to follow this rule > on > linux side :) > > > }; > > > > argv0 = argv[0]; > > @@ -188,7 +195,7 @@ main(int argc, char **argv) > > > > if (mod) { > > if (!(p = smp_set(sminfo, &portid, IB_ATTR_SMINFO, mod, > > timeout))) > > - IBERROR("query"); > > + IBERROR("set"); > > This is fine (and guess is not related to porting issue :)) > > Sasha > > > } else > > if (!(p = smp_query(sminfo, &portid, IB_ATTR_SMINFO, 0, > > timeout))) > > IBERROR("query"); > > > > From jackm at dev.mellanox.co.il Sun Dec 14 08:14:16 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 14 Dec 2008 18:14:16 +0200 Subject: [ofa-general] [PATCH] mlx4: Adjust ownership bit properly in resize_cq when copying over CQEs Message-ID: <200812141814.17091.jackm@dev.mellanox.co.il> mlx4: Adjust ownership bit properly in resize_cq when copying over CQEs During resize_cq, when copying over unpolled CQEs from the old CQE buffer to the new buffer, the ownership bit must be set appropriately for the new buffer, or the ownership bit in the new buffer gets corrupted. Signed-off-by: Jack Morgenstein --- Roland, Please queue this one up for 2.6.28. I don't think any kernel apps use resize_cq yet, but you might consider posting this for 2.6.27 "last stable". The main user is MPI, which does resize_cq in userspace, and not dynamically (so it does not hit this bug). I'll post the equivalent patch for libmlx4.
- Jack Index: infiniband/drivers/infiniband/hw/mlx4/cq.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/cq.c 2008-12-14 17:42:26.000000000 +0200 +++ infiniband/drivers/infiniband/hw/mlx4/cq.c 2008-12-14 17:47:51.000000000 +0200 @@ -325,15 +325,17 @@ static int mlx4_ib_get_outstanding_cqes( static void mlx4_ib_cq_resize_copy_cqes(struct mlx4_ib_cq *cq) { - struct mlx4_cqe *cqe; + struct mlx4_cqe *cqe, *new_cqe; int i; i = cq->mcq.cons_index; cqe = get_cqe(cq, i & cq->ibcq.cqe); while ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) != MLX4_CQE_OPCODE_RESIZE) { - memcpy(get_cqe_from_buf(&cq->resize_buf->buf, - (i + 1) & cq->resize_buf->cqe), - get_cqe(cq, i & cq->ibcq.cqe), sizeof(struct mlx4_cqe)); + new_cqe = get_cqe_from_buf(&cq->resize_buf->buf, + (i + 1) & cq->resize_buf->cqe); + memcpy(new_cqe, get_cqe(cq, i & cq->ibcq.cqe), sizeof(struct mlx4_cqe)); + new_cqe->owner_sr_opcode = (cqe->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | + (((i + 1) & (cq->resize_buf->cqe + 1)) ? MLX4_CQE_OWNER_MASK : 0); cqe = get_cqe(cq, ++i & cq->ibcq.cqe); } ++cq->mcq.cons_index; From jackm at dev.mellanox.co.il Sun Dec 14 08:14:20 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Sun, 14 Dec 2008 18:14:20 +0200 Subject: [ofa-general] [PATCH] libmlx4: Adjust ownership bit properly in resize_cq when copying over CQEs Message-ID: <200812141814.21076.jackm@dev.mellanox.co.il> During resize_cq, when copying over unpolled CQEs from the old CQE buffer to the new buffer, the ownership bit must be set appropriately for the new buffer, or the ownership bit in the new buffer gets corrupted. 
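As a rough sketch of the rule both patches apply (not the driver code itself), the owner bit of a copied CQE can be derived from the destination index and the size of the new buffer. The helper name cqe_owner_for_index and the constant CQE_OWNER_MASK (assumed here to be bit 7, standing in for MLX4_CQE_OWNER_MASK) are illustrative, and the buffer size is assumed to be a power of two:

```c
#include <stdint.h>

/* Illustrative stand-in for MLX4_CQE_OWNER_MASK; the value is an assumption. */
#define CQE_OWNER_MASK 0x80u

/*
 * Return the owner_sr_opcode byte to store at position `index` of a ring
 * holding `nentries` CQEs (nentries must be a power of two). The owner bit
 * flips on every pass over the ring, so it follows the bit of `index`
 * selected by `nentries`; the remaining bits of the byte are preserved.
 */
static uint8_t cqe_owner_for_index(uint8_t owner_sr_opcode,
                                   uint32_t index, uint32_t nentries)
{
    uint8_t owner = (index & nentries) ? CQE_OWNER_MASK : 0;

    return (uint8_t)((owner_sr_opcode & ~CQE_OWNER_MASK) | owner);
}
```

In the patches, `index` corresponds to `i + 1` and `nentries` to the new buffer's `cqe + 1`. With an 8-entry ring, indices 0 to 7 get a clear owner bit and indices 8 to 15 a set one, whatever value the bit had in the old buffer.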
Signed-off-by: Jack Morgenstein Index: libmlx4/src/cq.c =================================================================== --- libmlx4.orig/src/cq.c 2008-11-20 11:46:58.000000000 +0200 +++ libmlx4/src/cq.c 2008-12-14 18:10:41.000000000 +0200 @@ -455,6 +455,8 @@ void mlx4_cq_resize_copy_cqes(struct mlx cqe = get_cqe(cq, (i & old_cqe)); while ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) != MLX4_CQE_OPCODE_RESIZE) { + cqe->owner_sr_opcode = (cqe->owner_sr_opcode & ~MLX4_CQE_OWNER_MASK) | + (((i + 1) & (cq->ibv_cq.cqe + 1)) ? MLX4_CQE_OWNER_MASK : 0); memcpy(buf + ((i + 1) & cq->ibv_cq.cqe) * MLX4_CQ_ENTRY_SIZE, cqe, MLX4_CQ_ENTRY_SIZE); ++i; From monis at Voltaire.COM Mon Dec 15 01:37:38 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Mon, 15 Dec 2008 11:37:38 +0200 Subject: [ofa-general] Resending patches to 2.6.29 Message-ID: <49462562.9050201@Voltaire.COM> Hi Roland, I have some patches which you haven't commented on yet. Since 2.6.29 is coming, I think it's a good time to resend them. I will send them separately in reply to this message. Thanks From monis at Voltaire.COM Mon Dec 15 01:43:57 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Mon, 15 Dec 2008 11:43:57 +0200 Subject: [ofa-general] [PATCH] mlx4_ib: Fix dispatch of IB_EVENT_LID_CHANGE In-Reply-To: <49462562.9050201@Voltaire.COM> References: <49462562.9050201@Voltaire.COM> Message-ID: <494626DD.90804@Voltaire.COM> When snooping a PortInfo MAD, its client_reregister bit is checked. If the bit is on, a CLIENT_REREGISTER event is dispatched; otherwise a LID_CHANGE event is dispatched. This logic mishandles two cases: a MAD that changes the LID and also asks for reregistration (so a necessary LID_CHANGE event is not dispatched), and a MAD that does neither (so an unnecessary LID_CHANGE event is dispatched). This patch dispatches a CLIENT_REREGISTER event if the client_reregister bit is set. In addition, the patch compares the LID in the MAD to the current LID.
If and only if they are not identical then a LID_CHANGE event is dispatched. From: Moni Shoua Signed-off-by: Moni Shoua Signed-off-by: Jack Morgenstein Signed-off-by: Yossi Etigin -- Index: ofed_kernel/drivers/infiniband/hw/mlx4/mad.c =================================================================== --- ofed_kernel.orig/drivers/infiniband/hw/mlx4/mad.c 2008-11-30 08:34:47.470355000 +0200 +++ ofed_kernel/drivers/infiniband/hw/mlx4/mad.c 2008-11-30 08:55:50.691654000 +0200 @@ -147,7 +147,8 @@ static void update_sm_ah(struct mlx4_ib_ * Snoop SM MADs for port info and P_Key table sets, so we can * synthesize LID change and P_Key change events. */ -static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad) +static void smp_snoop(struct ib_device *ibdev, u8 port_num, struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -157,6 +158,7 @@ static void smp_snoop(struct ib_device * if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); update_sm_ah(to_mdev(ibdev), port_num, be16_to_cpu(pinfo->sm_lid), @@ -165,12 +167,15 @@ static void smp_snoop(struct ib_device * event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -228,8 +233,9 @@ int mlx4_ib_process_mad(struct ib_device struct ib_wc *in_wc, struct ib_grh *in_grh, struct ib_mad *in_mad, struct ib_mad *out_mad) { - u16 slid; + u16 slid, prev_lid = 0; int err; + struct ib_port_attr pattr; slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); @@ -263,6 +269,13 @@ int mlx4_ib_process_mad(struct ib_device } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; + err = mlx4_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, mad_flags & IB_MAD_IGNORE_BKEY, @@ -271,7 +284,7 @@ int mlx4_ib_process_mad(struct ib_device return IB_MAD_RESULT_FAILURE; if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } From monis at Voltaire.COM Mon Dec 15 01:46:15 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Mon, 15 Dec 2008 11:46:15 +0200 Subject: [ofa-general] [PATCH] ib_mthca: Fix dispatch of IB_EVENT_LID_CHANGE In-Reply-To: <49462562.9050201@Voltaire.COM> References: <49462562.9050201@Voltaire.COM> Message-ID: <49462767.1020809@Voltaire.COM> When snooping a PortInfo MAD, its client_reregister bit is checked. If the bit is on, a CLIENT_REREGISTER event is dispatched; otherwise a LID_CHANGE event is dispatched. This logic mishandles two cases: a MAD that changes the LID and also asks for reregistration (so a necessary LID_CHANGE event is not dispatched), and a MAD that does neither (so an unnecessary LID_CHANGE event is dispatched). This patch dispatches a CLIENT_REREGISTER event if the client_reregister bit is set. In addition, the patch compares the LID in the MAD to the current LID. A LID_CHANGE event is dispatched if and only if the two differ.
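The difference between the old and new dispatch rules can be sketched as two small decision functions. This is only an illustration of the logic described above: the EV_* flags, CLIENT_REREG_BIT and both function names are made up for the sketch and are not kernel identifiers (the 0x80 bit is the one the patch tests in clientrereg_resv_subnetto).

```c
#include <stdint.h>

/* Illustrative event flags; not the kernel's enum ib_event_type values. */
enum {
    EV_CLIENT_REREGISTER = 1,
    EV_LID_CHANGE        = 2
};

/* Bit tested in the PortInfo clientrereg_resv_subnetto byte. */
#define CLIENT_REREG_BIT 0x80u

/* Old behaviour: exactly one event per PortInfo Set, chosen by the bit. */
static unsigned events_before_patch(uint8_t clientrereg_resv_subnetto)
{
    return (clientrereg_resv_subnetto & CLIENT_REREG_BIT) ?
           EV_CLIENT_REREGISTER : EV_LID_CHANGE;
}

/* New behaviour: each event is decided independently of the other. */
static unsigned events_after_patch(uint8_t clientrereg_resv_subnetto,
                                   uint16_t prev_lid, uint16_t new_lid)
{
    unsigned events = 0;

    if (clientrereg_resv_subnetto & CLIENT_REREG_BIT)
        events |= EV_CLIENT_REREGISTER;
    if (prev_lid != new_lid)
        events |= EV_LID_CHANGE;
    return events;
}
```

Under the old rule a MAD with the bit clear and an unchanged LID still produced a LID_CHANGE event; under the new rule it produces none, and a MAD that sets the bit and changes the LID produces both events.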
From: Moni Shoua Signed-off-by: Moni Shoua Signed-off-by: Jack Morgenstein Signed-off-by: Yossi Etigin -- Index: ofed_kernel/drivers/infiniband/hw/mthca/mthca_mad.c =================================================================== --- ofed_kernel.orig/drivers/infiniband/hw/mthca/mthca_mad.c 2008-11-30 08:54:54.853708000 +0200 +++ ofed_kernel/drivers/infiniband/hw/mthca/mthca_mad.c 2008-11-30 08:59:54.702643000 +0200 @@ -104,7 +104,8 @@ static void update_sm_ah(struct mthca_de */ static void smp_snoop(struct ib_device *ibdev, u8 port_num, - struct ib_mad *mad) + struct ib_mad *mad, + u16 prev_lid) { struct ib_event event; @@ -114,6 +115,7 @@ static void smp_snoop(struct ib_device * if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO) { struct ib_port_info *pinfo = (struct ib_port_info *) ((struct ib_smp *) mad)->data; + u16 lid = be16_to_cpu(pinfo->lid); mthca_update_rate(to_mdev(ibdev), port_num); update_sm_ah(to_mdev(ibdev), port_num, @@ -123,12 +125,15 @@ static void smp_snoop(struct ib_device * event.device = ibdev; event.element.port_num = port_num; - if (pinfo->clientrereg_resv_subnetto & 0x80) + if (pinfo->clientrereg_resv_subnetto & 0x80) { event.event = IB_EVENT_CLIENT_REREGISTER; - else + ib_dispatch_event(&event); + } + if (prev_lid != lid) { event.event = IB_EVENT_LID_CHANGE; + ib_dispatch_event(&event); + } - ib_dispatch_event(&event); } if (mad->mad_hdr.attr_id == IB_SMP_ATTR_PKEY_TABLE) { @@ -196,6 +201,8 @@ int mthca_process_mad(struct ib_device * int err; u8 status; u16 slid = in_wc ? 
in_wc->slid : be16_to_cpu(IB_LID_PERMISSIVE); + u16 prev_lid = 0; + struct ib_port_attr pattr; /* Forward locally generated traps to the SM */ if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP && @@ -233,6 +240,12 @@ int mthca_process_mad(struct ib_device * return IB_MAD_RESULT_SUCCESS; } else return IB_MAD_RESULT_SUCCESS; + if ((in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + in_mad->mad_hdr.method == IB_MGMT_METHOD_SET && + in_mad->mad_hdr.attr_id == IB_SMP_ATTR_PORT_INFO && + !ib_query_port(ibdev, port_num, &pattr)) + prev_lid = pattr.lid; err = mthca_MAD_IFC(to_mdev(ibdev), mad_flags & IB_MAD_IGNORE_MKEY, @@ -252,7 +265,7 @@ int mthca_process_mad(struct ib_device * } if (!out_mad->mad_hdr.status) { - smp_snoop(ibdev, port_num, in_mad); + smp_snoop(ibdev, port_num, in_mad, prev_lid); node_desc_override(ibdev, out_mad); } From monis at Voltaire.COM Mon Dec 15 01:48:47 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Mon, 15 Dec 2008 11:48:47 +0200 Subject: [ofa-general] [PATCH] IPoIB: refresh paths that migh be invalid In-Reply-To: <49462562.9050201@Voltaire.COM> References: <49462562.9050201@Voltaire.COM> Message-ID: <494627FF.1090703@Voltaire.COM> If a standby SM takes over and only some of the nodes change their LID as a result, the other nodes get an IPOIB_FLUSH_LIGHT event, which doesn't flush the paths but only marks them as probably invalid. Path refresh will then happen only after an ARP probe, which may take some time (tens of seconds). This patch adds a task that restarts the lookup of possibly invalid paths on two occasions: when handling an IPOIB_FLUSH_LIGHT event and when a path record completion returns with a bad status.
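A much simplified model of that mechanism, without locking, work queues or real path record queries, might look like the following. The struct and function names are hypothetical and exist only for this sketch; the one detail taken from the patch is that a path is flagged stale on a light flush or a failed lookup, and the flag is cleared when a new lookup is started.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a cached path entry. */
struct path_entry {
    uint16_t dlid;   /* destination LID of the cached path */
    uint8_t  stale;  /* 1 when the path record should be looked up again */
};

/* Light flush: keep the paths but flag all of them as possibly invalid. */
static void mark_paths_stale(struct path_entry *paths, size_t n)
{
    for (size_t i = 0; i < n; i++)
        paths[i].stale = 1;
}

/* Failed lookup: flag only the affected path so the next refresh retries it. */
static void path_lookup_failed(struct path_entry *path)
{
    path->stale = 1;
}

/*
 * Deferred refresh task: restart the lookup of every stale path.
 * Starting a lookup clears the flag. Returns the number of restarted lookups.
 */
static size_t refresh_stale_paths(struct path_entry *paths, size_t n)
{
    size_t restarted = 0;

    for (size_t i = 0; i < n; i++) {
        if (paths[i].stale) {
            paths[i].stale = 0;  /* stands in for a successful path_rec_start() */
            restarted++;
        }
    }
    return restarted;
}
```

In the real patch the refresh runs as delayed work roughly one second after the paths are marked, under the device's spinlock.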
Signed-off-by: Moni Shoua --- drivers/infiniband/ulp/ipoib/ipoib.h | 6 +++- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 2 - drivers/infiniband/ulp/ipoib/ipoib_main.c | 37 +++++++++++++++++++++++++----- 3 files changed, 37 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index e0c7dfa..98564c3 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -298,6 +298,7 @@ struct ipoib_dev_priv { struct work_struct flush_heavy; struct work_struct restart_task; struct delayed_work ah_reap_task; + struct delayed_work path_refresh_task; struct ib_device *ca; u8 port; @@ -378,7 +379,7 @@ struct ipoib_path { struct rb_node rb_node; struct list_head list; - int valid; + u8 stale; }; struct ipoib_neigh { @@ -442,8 +443,9 @@ int ipoib_add_umcast_attr(struct net_device *dev); void ipoib_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_ah *address, u32 qpn); void ipoib_reap_ah(struct work_struct *work); +void ipoib_refresh_paths(struct work_struct *work); -void ipoib_mark_paths_invalid(struct net_device *dev); +void ipoib_mark_paths_stale(struct net_device *dev); void ipoib_flush_paths(struct net_device *dev); struct ipoib_dev_priv *ipoib_intf_alloc(const char *format); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 28eb6f0..ff52314 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -962,7 +962,7 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, } if (level == IPOIB_FLUSH_LIGHT) { - ipoib_mark_paths_invalid(dev); + ipoib_mark_paths_stale(dev); ipoib_mcast_dev_flush(dev); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 85257f6..c9b5890 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -352,7 +352,7 @@ void ipoib_path_iter_read(struct 
ipoib_path_iter *iter, #endif /* CONFIG_INFINIBAND_IPOIB_DEBUG */ -void ipoib_mark_paths_invalid(struct net_device *dev) +void ipoib_mark_paths_stale(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path, *tp; @@ -360,12 +360,15 @@ void ipoib_mark_paths_invalid(struct net_device *dev) spin_lock_irq(&priv->lock); list_for_each_entry_safe(path, tp, &priv->path_list, list) { - ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " invalid\n", + ipoib_dbg(priv, "mark path LID 0x%04x GID " IPOIB_GID_FMT " stale\n", be16_to_cpu(path->pathrec.dlid), IPOIB_GID_ARG(path->pathrec.dgid)); - path->valid = 0; + path->stale = 1; } + if (!list_empty(&priv->path_list)) + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, + round_jiffies_relative(HZ)); spin_unlock_irq(&priv->lock); } @@ -427,6 +430,10 @@ static void path_rec_completion(int status, if (!ib_init_ah_from_path(priv->ca, priv->port, pathrec, &av)) ah = ipoib_create_ah(dev, priv->pd, &av); + } else { + path->stale = 1; + queue_delayed_work(ipoib_workqueue, &priv->path_refresh_task, + round_jiffies_relative(HZ)); } spin_lock_irqsave(&priv->lock, flags); @@ -477,7 +484,6 @@ static void path_rec_completion(int status, while ((skb = __skb_dequeue(&neigh->queue))) __skb_queue_tail(&skqueue, skb); } - path->valid = 1; } path->query = NULL; @@ -551,9 +557,29 @@ static int path_rec_start(struct net_device *dev, return path->query_id; } + path->stale = 0; return 0; } +void ipoib_refresh_paths(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, path_refresh_task.work); + struct net_device *dev = priv->dev; + struct ipoib_path *path, *tp; + + spin_lock_irq(&priv->lock); + list_for_each_entry_safe(path, tp, &priv->path_list, list) { + ipoib_dbg(priv, "restart path LID 0x%04x GID " IPOIB_GID_FMT "\n", + be16_to_cpu(path->pathrec.dlid), + IPOIB_GID_ARG(path->pathrec.dgid)); + if (path->stale) + path_rec_start(dev, path); + 
} + + spin_unlock_irq(&priv->lock); +} + static void neigh_add_path(struct sk_buff *skb, struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -656,7 +682,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev, spin_lock_irqsave(&priv->lock, flags); path = __path_find(dev, phdr->hwaddr + 4); - if (!path || !path->valid) { + if (!path) { if (!path) path = path_rec_create(dev, phdr->hwaddr + 4); if (path) { @@ -1071,6 +1097,7 @@ static void ipoib_setup(struct net_device *dev) INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); + INIT_DELAYED_WORK(&priv->path_refresh_task, ipoib_refresh_paths); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) From sashak at voltaire.com Mon Dec 15 01:49:27 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 15 Dec 2008 11:49:27 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com> <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> Message-ID: <20081215094927.GA22030@sashak.voltaire.com> On 13:34 Sun 14 Dec , Eli Dorfman wrote: > >> > Did you mean to have it osm_report_notice()? Actually it is where OpenSM > >> > sends notices, not where OpenSM gets traps. Trap receiver processor is > >> > located in osm_trap_rcv.c. > >> > >> Yes that's what i meant. > >> When OpenSM receives traps it calls osm_report_notice().
> >> It is also call for OpenSM initiated traps (e.g. GID IN/OUT and MC CREATE/DELETE). > > > > Ok. I see your point. Then why it should be limited by generic notice > > types? > > No special reason. Just at the moment we handle only generic traps. > we may want to report vendor specific event with other event id The OpenSM event plugin is a generic API, so I think we should report any trap. > > Also wouldn't it be better to call plugin report callback after > > notice was actually processed (eg. at end of this function)? > > there is no correlation between reporting an event and whether it was > already forwarded. This may be true from the plugin's perspective, but reporting an event could (at least potentially) slow down core functionality, namely trap sending. All in all, I think it would be better to report OSM_EVENT_ID_TRAP unconditionally, and to do so at the end of the osm_report_notice() function. Agreed? Sasha From jackm at dev.mellanox.co.il Mon Dec 15 03:12:48 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 15 Dec 2008 13:12:48 +0200 Subject: [ofa-general] [PATCH 1 of 2 for 2.6.28] core: Fix Raw Ethertype QP support Message-ID: <200812151312.48632.jackm@dev.mellanox.co.il> From: Igor Yarovinsky core: Fix Raw Ethertype QP support. Fix Raw Ethertype QP support: a. A new struct is needed in the wr union in struct ib_send_wr. b. New helper pack and unpack functions are needed in ud_header.c. Signed-off-by: Igor Yarovinsky Signed-off-by: Jack Morgenstein --- This is a repost of http://lists.openfabrics.org/pipermail/general/2008-August/053642.html Raw Ethertype packets on the wire contain only LRH, Raw Header (Ethernet Type), payload, and VCRC.
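For orientation, the header and length arithmetic implied by this layout can be sketched as below. It follows the calculation used by build_raw_ety_header() in the companion mlx4 patch of this series (LRH plus a 32-bit raw header, total rounded up to 4-byte words). The constant and function names are illustrative only, and the 8-byte LRH size is an assumption standing in for IB_LRH_BYTES.

```c
#include <stddef.h>
#include <stdint.h>

#define LRH_BYTES        8u  /* Local Route Header size (assumed, cf. IB_LRH_BYTES) */
#define RAW_HEADER_BYTES 4u  /* 32-bit raw header carrying the EtherType */

/* Bytes of header that precede the payload of a Raw Ethertype packet. */
static size_t raw_ety_header_bytes(void)
{
    return LRH_BYTES + RAW_HEADER_BYTES;
}

/* Packet length in 4-byte words: headers plus payload, rounded up. */
static uint16_t raw_ety_packet_words(size_t payload_bytes)
{
    return (uint16_t)((raw_ety_header_bytes() + payload_bytes + 3) / 4);
}
```

The 12 header bytes match the MLX4_IB_MAX_RAW_ETY_HDR_SIZE value introduced in the mlx4 patch; there is no GRH, BTH or DETH in front of the payload, unlike a UD packet.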
Index: infiniband/include/rdma/ib_verbs.h =================================================================== --- infiniband.orig/include/rdma/ib_verbs.h 2008-08-11 10:37:01.000000000 +0300 +++ infiniband/include/rdma/ib_verbs.h 2008-08-12 16:29:49.000000000 +0300 @@ -752,6 +752,11 @@ struct ib_send_wr { int access_flags; u32 rkey; } fast_reg; + struct { + struct ib_unpacked_lrh *lrh; + u32 eth_type; + u8 static_rate; + } raw_ety; } wr; }; Index: infiniband/drivers/infiniband/core/ud_header.c =================================================================== --- infiniband.orig/drivers/infiniband/core/ud_header.c 2008-07-28 18:20:11.000000000 +0300 +++ infiniband/drivers/infiniband/core/ud_header.c 2008-08-12 10:55:10.000000000 +0300 @@ -241,6 +241,36 @@ void ib_ud_header_init(int pay EXPORT_SYMBOL(ib_ud_header_init); /** + * ib_lrh_header_pack - Pack LRH header struct into wire format + * @lrh:unpacked LRH header struct + * @buf:Buffer to pack into + * + * ib_lrh_header_pack() packs the LRH header structure @lrh into + * wire format in the buffer @buf. + */ +int ib_lrh_header_pack(struct ib_unpacked_lrh *lrh, void *buf) +{ + ib_pack(lrh_table, ARRAY_SIZE(lrh_table), lrh, buf); + return 0; +} +EXPORT_SYMBOL(ib_lrh_header_pack); + +/** + * ib_lrh_header_unpack - Unpack LRH structure from wire format + * @lrh:unpacked LRH header struct + * @buf:Buffer to pack into + * + * ib_lrh_header_unpack() unpacks the LRH header structure from + * wire format (in buf) into @lrh. 
+ */ +int ib_lrh_header_unpack(void *buf, struct ib_unpacked_lrh *lrh) +{ + ib_unpack(lrh_table, ARRAY_SIZE(lrh_table), buf, lrh); + return 0; +} +EXPORT_SYMBOL(ib_lrh_header_unpack); + +/** * ib_ud_header_pack - Pack UD header struct into wire format * @header:UD header struct * @buf:Buffer to pack into Index: infiniband/include/rdma/ib_pack.h =================================================================== --- infiniband.orig/include/rdma/ib_pack.h 2008-07-28 18:20:15.000000000 +0300 +++ infiniband/include/rdma/ib_pack.h 2008-08-12 10:55:10.000000000 +0300 @@ -240,4 +240,7 @@ int ib_ud_header_pack(struct ib_ud_heade int ib_ud_header_unpack(void *buf, struct ib_ud_header *header); +int ib_lrh_header_pack(struct ib_unpacked_lrh *lrh, void *buf); +int ib_lrh_header_unpack(void *buf, struct ib_unpacked_lrh *lrh); + #endif /* IB_PACK_H */ From jackm at dev.mellanox.co.il Mon Dec 15 03:12:45 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 15 Dec 2008 13:12:45 +0200 Subject: [ofa-general] [PATCH 0 of 2 for 2.6.28] Raw Ethertype QP support Message-ID: <200812151312.45290.jackm@dev.mellanox.co.il> This is a re-post of http://lists.openfabrics.org/pipermail/general/2008-August/053641.html This patch set fixes the Raw Ethertype QP implementation in the infiniband core, and implements Raw Ethertype QP support in the mlx4 ib driver. The Raw Ethertype packet is described in the IB Spec Rev 1.2.1, Section 5.3. Raw Datagrams are described in section 9.8.4. Note that the Raw QP types are included under the special QP types (described in section 10.2.4.5). The fields required for sending a raw datagram are given in section 11.4.1.1 (POST SEND REQUEST), and in Section 10.2.10 (INFINIBAND HEADER DATA AND SOURCES), Table 64. 
- Jack From jackm at dev.mellanox.co.il Mon Dec 15 03:12:53 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 15 Dec 2008 13:12:53 +0200 Subject: [ofa-general] [PATCH 2 of 2 for 2.6.28] mlx4: Add Raw Ethertype QP support Message-ID: <200812151312.53603.jackm@dev.mellanox.co.il> From: Igor Yarovinsky mlx4: Add Raw Ethertype QP support. This implementation supports one Raw Ethertype QP per port. Signed-off-by: Igor Yarovinsky Signed-off-by: Jack Morgenstein --- This is a repost of http://lists.openfabrics.org/pipermail/general/2008-August/053643.html Raw Ethertype is implemented similarly to MADs. When posting sends, the LRH and RWH headers are added as a single 16-byte inline segment. Index: infiniband/drivers/infiniband/hw/mlx4/qp.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/qp.c 2008-08-12 16:28:56.000000000 +0300 +++ infiniband/drivers/infiniband/hw/mlx4/qp.c 2008-08-12 16:30:22.000000000 +0300 @@ -54,7 +54,8 @@ enum { /* * Largest possible UD header: send with GRH and immediate data. 
*/ - MLX4_IB_UD_HEADER_SIZE = 72 + MLX4_IB_UD_HEADER_SIZE = 72, + MLX4_IB_MAX_RAW_ETY_HDR_SIZE = 12 }; struct mlx4_ib_sqp { @@ -280,6 +281,12 @@ static int send_wqe_overhead(enum ib_qp_ ALIGN(4 + sizeof (struct mlx4_wqe_inline_seg), sizeof (struct mlx4_wqe_data_seg)); + case IB_QPT_RAW_ETY: + return sizeof(struct mlx4_wqe_ctrl_seg) + + ALIGN(MLX4_IB_MAX_RAW_ETY_HDR_SIZE + + sizeof(struct mlx4_wqe_inline_seg), + sizeof(struct mlx4_wqe_data_seg)); + default: return sizeof (struct mlx4_wqe_ctrl_seg); } @@ -335,6 +342,10 @@ static int set_kernel_sq_size(struct mlx cap->max_send_sge + 2 > dev->dev->caps.max_sq_sg) return -EINVAL; + if (type == IB_QPT_RAW_ETY && + cap->max_send_sge + 1 > dev->dev->caps.max_sq_sg) + return -EINVAL; + s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + send_wqe_overhead(type, qp->flags); @@ -375,7 +386,7 @@ static int set_kernel_sq_size(struct mlx */ if (dev->dev->caps.fw_ver >= MLX4_FW_VER_WQE_CTRL_NEC && qp->sq_signal_bits && BITS_PER_LONG == 64 && - type != IB_QPT_SMI && type != IB_QPT_GSI) + type != IB_QPT_SMI && type != IB_QPT_GSI && type != IB_QPT_RAW_ETY) qp->sq.wqe_shift = ilog2(64); else qp->sq.wqe_shift = ilog2(roundup_pow_of_two(s)); @@ -711,6 +722,9 @@ struct ib_qp *mlx4_ib_create_qp(struct i break; } + case IB_QPT_RAW_ETY: + if (!(dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_RAW_ETY)) + return ERR_PTR(-ENOSYS); case IB_QPT_SMI: case IB_QPT_GSI: { @@ -726,7 +740,8 @@ struct ib_qp *mlx4_ib_create_qp(struct i err = create_qp_common(dev, pd, init_attr, udata, dev->dev->caps.sqp_start + - (init_attr->qp_type == IB_QPT_SMI ? 0 : 2) + + (init_attr->qp_type == IB_QPT_RAW_ETY ? 4 : + (init_attr->qp_type == IB_QPT_SMI ? 
0 : 2)) + init_attr->port_num - 1, qp); if (err) { @@ -740,7 +755,6 @@ struct ib_qp *mlx4_ib_create_qp(struct i break; } default: - /* Don't support raw QPs */ return ERR_PTR(-EINVAL); } @@ -771,6 +785,7 @@ static int to_mlx4_st(enum ib_qp_type ty case IB_QPT_RC: return MLX4_QP_ST_RC; case IB_QPT_UC: return MLX4_QP_ST_UC; case IB_QPT_UD: return MLX4_QP_ST_UD; + case IB_QPT_RAW_ETY: case IB_QPT_SMI: case IB_QPT_GSI: return MLX4_QP_ST_MLX; default: return -1; @@ -895,7 +910,8 @@ static int __mlx4_ib_modify_qp(struct ib } } - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || + ibqp->qp_type == IB_QPT_RAW_ETY) context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; else if (ibqp->qp_type == IB_QPT_UD) { if (qp->flags & MLX4_IB_QP_LSO) @@ -1044,7 +1060,7 @@ static int __mlx4_ib_modify_qp(struct ib if (cur_state == IB_QPS_INIT && new_state == IB_QPS_RTR && (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || - ibqp->qp_type == IB_QPT_UD)) { + ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY)) { context->pri_path.sched_queue = (qp->port - 1) << 6; if (is_qp0(dev, qp)) context->pri_path.sched_queue |= MLX4_IB_DEFAULT_QP0_SCHED_QUEUE; @@ -1186,6 +1202,49 @@ out: return err; } +static int build_raw_ety_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, + void *wqe, unsigned *mlx_seg_len) +{ + int payload = 0; + int header_size, packet_length; + struct mlx4_wqe_mlx_seg *mlx = wqe; + struct mlx4_wqe_inline_seg *inl = wqe + sizeof *mlx; + u32 *lrh = wqe + sizeof *mlx + sizeof *inl; + int i; + + /* Only IB_WR_SEND is supported */ + if (wr->opcode != IB_WR_SEND) + return -EINVAL; + + for (i = 0; i < wr->num_sge; ++i) + payload += wr->sg_list[i].length; + + header_size = IB_LRH_BYTES + 4; /* LRH + RAW_HEADER (32 bits) */ + + /* headers + payload and round up */ + packet_length = (header_size + payload + 3) / 4; + + mlx->flags &= cpu_to_be32(MLX4_WQE_CTRL_CQ_UPDATE); + + mlx->flags 
|= cpu_to_be32(MLX4_WQE_MLX_ICRC | + (wr->wr.raw_ety.lrh->service_level << 8)); + + mlx->rlid = wr->wr.raw_ety.lrh->destination_lid; + + wr->wr.raw_ety.lrh->packet_length = cpu_to_be16(packet_length); + + ib_lrh_header_pack(wr->wr.raw_ety.lrh, lrh); + lrh += IB_LRH_BYTES / 4; /* LRH size is a dword multiple */ + *lrh = cpu_to_be32(wr->wr.raw_ety.eth_type); + + inl->byte_count = cpu_to_be32(1 << 31 | header_size); + + *mlx_seg_len = + ALIGN(sizeof(struct mlx4_wqe_inline_seg) + header_size, 16); + + return 0; +} + static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, void *wqe, unsigned *mlx_seg_len) { @@ -1601,6 +1660,17 @@ int mlx4_ib_post_send(struct ib_qp *ibqp size += seglen / 16; break; + case IB_QPT_RAW_ETY: + err = build_raw_ety_header(to_msqp(qp), wr, ctrl, + &seglen); + if (unlikely(err)) { + *bad_wr = wr; + goto out; + } + wqe += seglen; + size += seglen / 16; + break; + default: break; } Index: infiniband/drivers/net/mlx4/qp.c =================================================================== --- infiniband.orig/drivers/net/mlx4/qp.c 2008-08-12 16:28:56.000000000 +0300 +++ infiniband/drivers/net/mlx4/qp.c 2008-08-12 16:30:22.000000000 +0300 @@ -247,8 +247,9 @@ EXPORT_SYMBOL_GPL(mlx4_qp_free); static int mlx4_CONF_SPECIAL_QP(struct mlx4_dev *dev, u32 base_qpn) { - return mlx4_cmd(dev, 0, base_qpn, 0, MLX4_CMD_CONF_SPECIAL_QP, - MLX4_CMD_TIME_CLASS_B); + return mlx4_cmd(dev, 0, base_qpn, + (dev->caps.flags & MLX4_DEV_CAP_FLAG_RAW_ETY) ? 
4 : 0, + MLX4_CMD_CONF_SPECIAL_QP, MLX4_CMD_TIME_CLASS_B); } int mlx4_init_qp_table(struct mlx4_dev *dev) Index: infiniband/include/linux/mlx4/device.h =================================================================== --- infiniband.orig/include/linux/mlx4/device.h 2008-08-12 16:28:56.000000000 +0300 +++ infiniband/include/linux/mlx4/device.h 2008-08-12 16:30:22.000000000 +0300 @@ -60,6 +60,7 @@ enum { MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + MLX4_DEV_CAP_FLAG_RAW_ETY = 1 << 13, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, Index: infiniband/include/linux/mlx4/qp.h =================================================================== --- infiniband.orig/include/linux/mlx4/qp.h 2008-08-12 16:28:56.000000000 +0300 +++ infiniband/include/linux/mlx4/qp.h 2008-08-12 16:30:22.000000000 +0300 @@ -191,7 +191,8 @@ struct mlx4_wqe_ctrl_seg { enum { MLX4_WQE_MLX_VL15 = 1 << 17, - MLX4_WQE_MLX_SLR = 1 << 16 + MLX4_WQE_MLX_SLR = 1 << 16, + MLX4_WQE_MLX_ICRC = 1 << 4 }; struct mlx4_wqe_mlx_seg { Index: infiniband/drivers/infiniband/hw/mlx4/main.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/main.c 2008-08-12 16:28:56.000000000 +0300 +++ infiniband/drivers/infiniband/hw/mlx4/main.c 2008-08-12 16:30:22.000000000 +0300 @@ -111,6 +111,8 @@ static int mlx4_ib_query_device(struct i (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_REMOTE_INV) && (dev->dev->caps.bmme_flags & MLX4_BMME_FLAG_FAST_REG_WR)) props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS; + if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_RAW_ETY) + props->max_raw_ethy_qp = dev->ib_dev.phys_port_cnt; props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; From vlad at lists.openfabrics.org Mon Dec 15 03:20:42 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Mon, 
15 Dec 2008 03:20:42 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081215-0200 daily build status Message-ID: <20081215112042.BBAC6E60F0D@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From sashak at voltaire.com Mon Dec 15 07:17:53 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 15 Dec 2008 17:17:53 +0200
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References:
Message-ID: <20081215151753.GA22506@sashak.voltaire.com>

Hi Mike,

On 12:31 Wed 10 Dec , Mike Heinz wrote:
> While experimenting with the APIs in opensm/libvendor, I was unable to
> get the path record queries to work. Reviewing the error logs from the
> SM, I discovered that the APIs were not setting the required num_path
> field.

Actually, this part of the spec is not 100% clear to me. The only thing I can see is that in table 207 (p.915 - PathRecord) the SGID and NumbPath parameters are marked as "required for GetTable request". This leaves me with some questions:

- Could SLID be used in a GetTable request instead of SGID (as implemented now in opensm/libvendor)? Maybe not, but then I would expect some explicit mention of this. If yes, what is the meaning of NumbPath then?

- What is the reason for such a limitation?

Could you or anybody on the list clarify this?

> Here's the fix:

About the patch: the whitespace is mangled.

>
> --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500
> +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500
> @@ -615,7 +615,7 @@
> sa_mad_data.attr_offset =
> ib_get_attr_offset(sizeof(ib_path_rec_t));
> sa_mad_data.comp_mask =
> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID);
> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID |
> IB_PR_COMPMASK_NUMBPATH);
> sa_mad_data.p_attr = &path_rec;
> ib_gid_set_default(&path_rec.dgid,
> ((osmv_guid_pair_t *) (p_query_req->
> @@ -625,6 +625,7 @@
> ((osmv_guid_pair_t *) (p_query_req->
>
> p_query_input))->
> src_guid);
> + path_rec.num_path = 1;

Why should this be '1'?
Sasha > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -634,7 +635,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> > @@ -642,6 +643,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -652,13 +654,14 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > path_rec.dlid = > ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > @@ -743,7 +743,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -753,6 +753,7 @@ > ((osmv_guid_pair_t *) (p_query_req-> > > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -763,7 +764,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID 
| IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> > @@ -771,6 +772,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -789,6 +791,7 @@ > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From celine.bourde at ext.bull.net Mon Dec 15 07:22:33 2008 From: celine.bourde at ext.bull.net (Celine Bourde) Date: Mon, 15 Dec 2008 16:22:33 +0100 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <4945074F.1050500@dev.mellanox.co.il> References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> <494114B5.7000703@ext.bull.net> <4945074F.1050500@dev.mellanox.co.il> Message-ID: <49467639.3090405@ext.bull.net> Vladimir Sokolovsky wrote: > Celine Bourde wrote: >> I am trying krping on 2.6.27 kernel and OFED1.4 (10 December 2008). >> Module compilation is ok, but modprobe failed. 
>>
>> [root at twin krping]# modprobe rdma_krping
>> FATAL: Error inserting rdma_krping
>> (/lib/modules/2.6.27/extra/rdma_krping.ko): Unknown symbol in module,
>> or unknown parameter (see dmesg)
>>
>> dmesg output :
>>
>> rdma_krping: disagrees about version of symbol ib_create_cq
>> rdma_krping: Unknown symbol ib_create_cq
>> rdma_krping: disagrees about version of symbol
>> ib_alloc_fast_reg_page_list
>> rdma_krping: Unknown symbol ib_alloc_fast_reg_page_list
>> rdma_krping: disagrees about version of symbol rdma_resolve_addr
>> ...
>>
>> Any Advice ?
>>
>> Céline Bourde
>>
>>
>>
> Hi Celine,
> Copy /usr/src/ofa_kernel/Module.symvers to the krping directory, then
> run make and insmod.
>
> Regards,
> Vladimir
>
It works.
Thanks,

Céline Bourde.
>
>

From michael.heinz at qlogic.com Mon Dec 15 07:29:29 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon, 15 Dec 2008 09:29:29 -0600
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To: <20081215151753.GA22506@sashak.voltaire.com>
References: <20081215151753.GA22506@sashak.voltaire.com>
Message-ID:

Sasha,

That's a good question - and I'm going to ask around and double check. My first reaction was that you have to specify how many paths you want from the query - but you're right, the spec doesn't say that.

I'm going to do some research on my end. Are you saying that IB_MAD_ATTR_PATH_RECORD should only ever return a single path?

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
Sent: Monday, December 15, 2008 10:18 AM
To: Mike Heinz
Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock
Subject: Re: [ofa-general] Bugs in opensm/libvendor

Hi Mike,

On 12:31 Wed 10 Dec , Mike Heinz wrote:
> While experimenting with the APIs in opensm/libvendor, I was unable to
> get the path record queries to work.
Reviewing the error logs from the > SM, I discovered that the APIs were not setting the required num_path > field. Actually this part of spec is not 100% clear for me - the only thing I can see is that in table 207 (p.915 - PathRecord) is that SGID and NumbPath parameters are marked as "required for GetTable request". This leave me with some questions: - Could SLID be used in GetTable request instead of SGID (as implemented now in opensm/libvendor)? Maybe not, but then I would expect some explicit mention about this. If yes, what is the meaning of NumbPath then? - What is the reason for such limitation. Do you or anybody on the list could clarify this? > Here's the fix: About the patch. White spaces are mangled. > > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 > @@ -615,7 +615,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) > (p_query_req-> @@ -625,6 +625,7 @@ > ((osmv_guid_pair_t *) > (p_query_req-> > > p_query_input))-> > src_guid); > + path_rec.num_path = 1; Why should this be '1'? 
Sasha > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -634,7 +635,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -652,13 +654,14 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > path_rec.dlid = > ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > @@ -743,7 +743,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) > (p_query_req-> @@ -753,6 +753,7 @@ > ((osmv_guid_pair_t *) > (p_query_req-> > > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -763,7 +764,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID 
| IB_PR_COMPMASK_SGID |
> IB_PR_COMPMASK_NUMBPATH);
> sa_mad_data.p_attr = &path_rec;
> memcpy(&path_rec.dgid,
> &((osmv_gid_pair_t *)
> (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@
> memcpy(&path_rec.sgid,
> &((osmv_gid_pair_t *)
> (p_query_req->p_query_input))->
> src_gid, sizeof(ib_gid_t));
> + path_rec.num_path = 1;
> break;
>
> case OSMV_QUERY_PATH_REC_BY_LIDS:
> @@ -789,6 +791,7 @@
> dest_lid;
> path_rec.slid =
> ((osmv_lid_pair_t *)
> (p_query_req->p_query_input))->src_lid;
> + path_rec.num_path = 1;
> break;
>
> case OSMV_QUERY_UD_MULTICAST_SET:
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From hal.rosenstock at gmail.com Mon Dec 15 07:30:46 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Mon, 15 Dec 2008 10:30:46 -0500
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References:
Message-ID:

On Wed, Dec 10, 2008 at 1:31 PM, Mike Heinz wrote:
> While experimenting with the APIs in opensm/libvendor, I was unable to get
> the path record queries to work. Reviewing the error logs from the SM,

Which SM(s)?

> I discovered that the APIs were not setting the required num_path field.
> Here's the fix:

The approach used breaks backward compatibility, which IMO should be preserved. I think a better approach is as follows:
1. Set num_paths in the application(s) as desired.
2. In the library, check whether num_paths is 0. If it is not, set the NumbPath compmask bit.
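
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct sketch_path_rec`, the `SKETCH_COMPMASK_*` bit values, and `sketch_gid_query_comp_mask()` are hypothetical stand-ins, not the libvendor or `ib_types.h` definitions, and how the application would hand its `num_path` value to the library is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit values for this sketch only; real code would use the
 * IB_PR_COMPMASK_* definitions that appear in the patch. */
#define SKETCH_COMPMASK_DGID     (UINT64_C(1) << 0)
#define SKETCH_COMPMASK_SGID     (UINT64_C(1) << 1)
#define SKETCH_COMPMASK_NUMBPATH (UINT64_C(1) << 2)

/* Hypothetical reduced path record: only the field the sketch needs. */
struct sketch_path_rec {
	uint8_t num_path;	/* 0 means the caller did not ask for a count */
};

/* Step 2: build the component mask for a GID-pair query, adding the
 * NumbPath bit only when the application set a nonzero num_path. */
uint64_t sketch_gid_query_comp_mask(const struct sketch_path_rec *rec)
{
	uint64_t mask = SKETCH_COMPMASK_DGID | SKETCH_COMPMASK_SGID;

	assert(rec != NULL);
	if (rec->num_path != 0)
		mask |= SKETCH_COMPMASK_NUMBPATH;
	return mask;
}
```

With this shape, a caller that leaves `num_path` at zero gets the same mask as before the change, which is the backward-compatibility property being asked for; only callers that opt in send the NumbPath component.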
-- Hal > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 > @@ -615,7 +615,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -625,6 +625,7 @@ > ((osmv_guid_pair_t *) (p_query_req-> > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -634,7 +635,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > @@ -642,6 +643,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -652,13 +654,14 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > path_rec.dlid = > ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > @@ -743,7 +743,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > 
sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -753,6 +753,7 @@ > ((osmv_guid_pair_t *) (p_query_req-> > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -763,7 +764,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > @@ -771,6 +772,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -789,6 +791,7 @@ > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From swise at opengridcomputing.com Mon Dec 15 07:35:39 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 15 Dec 2008 09:35:39 -0600 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <49467639.3090405@ext.bull.net> References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> <494114B5.7000703@ext.bull.net> <4945074F.1050500@dev.mellanox.co.il> <49467639.3090405@ext.bull.net> 
Message-ID: <4946794B.4080006@opengridcomputing.com> Celine Bourde wrote: > Vladimir Sokolovsky wrote: >> Celine Bourde wrote: >>> I am trying krping on 2.6.27 kernel and OFED1.4 (10 December 2008). >>> Module compilation is ok, but modprobe failed. >>> >>> [root at twin krping]# modprobe rdma_krping >>> FATAL: Error inserting rdma_krping >>> (/lib/modules/2.6.27/extra/rdma_krping.ko): Unknown symbol in >>> module, or unknown parameter (see dmesg) >>> >>> dmesg output : >>> >>> rdma_krping: disagrees about version of symbol ib_create_cq >>> rdma_krping: Unknown symbol ib_create_cq >>> rdma_krping: disagrees about version of symbol >>> ib_alloc_fast_reg_page_list >>> rdma_krping: Unknown symbol ib_alloc_fast_reg_page_list >>> rdma_krping: disagrees about version of symbol rdma_resolve_addr >>> ... >>> >>> Any Advice ? >>> >>> Céline Bourde >>> >>> >>> >> Hi Celine, >> Copy /usr/src/ofa_kernel/Module.symvers to the krping directory, then >> run make and insmod. >> >> Regards, >> Vladimir >> > It works. > Thanks, > > Céline Bourde. >> >> > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Mon Dec 15 07:36:27 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 15 Dec 2008 09:36:27 -0600 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <49467639.3090405@ext.bull.net> References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> <494114B5.7000703@ext.bull.net> <4945074F.1050500@dev.mellanox.co.il> <49467639.3090405@ext.bull.net> Message-ID: <4946797B.2050501@opengridcomputing.com> >>> >>> >>> >> Hi Celine, >> Copy /usr/src/ofa_kernel/Module.symvers to the krping directory, then >> run make and insmod. 
>>
>> Regards,
>> Vladimir
>>
Hey Vlad,

What do you think I should add to the krping makefile to avoid this issue?

From sashak at voltaire.com Mon Dec 15 07:38:38 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 15 Dec 2008 17:38:38 +0200
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References: <20081215151753.GA22506@sashak.voltaire.com>
Message-ID: <20081215153838.GB22506@sashak.voltaire.com>

On 09:29 Mon 15 Dec , Mike Heinz wrote:
>
> That's a good question - and I'm going to ask around and double check.
> My first reaction was that you have to specify how many paths you want
> from the query - but you're right, the spec doesn't say that.

Yes, it looks that way (but I cannot understand "why" :( ). An even stranger limitation (IMHO) is the mandatory SGID: it would make GetTable queries such as all-to-all, SLID-to-all, etc. illegal. I thought those were permitted.

> I'm going to do some research on my end. Are you saying that
> IB_MAD_ATTR_PATH_RECORD should only ever return a single path?

With GetTable? I think it shouldn't (for some queries it will, such as SLID + DLID).

Sasha

>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Monday, December 15, 2008 10:18 AM
> To: Mike Heinz
> Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock
> Subject: Re: [ofa-general] Bugs in opensm/libvendor
>
> Hi Mike,
>
> On 12:31 Wed 10 Dec , Mike Heinz wrote:
> > While experimenting with the APIs in opensm/libvendor, I was unable to
>
> > get the path record queries to work. Reviewing the error logs from the
>
> > SM, I discovered that the APIs were not setting the required num_path
> > field.
> > Actually this part of spec is not 100% clear for me - the only thing I > can see is that in table 207 (p.915 - PathRecord) is that SGID and > NumbPath parameters are marked as "required for GetTable request". This > leave me with some questions: > > - Could SLID be used in GetTable request instead of SGID (as implemented > now in opensm/libvendor)? Maybe not, but then I would expect some > explicit mention about this. If yes, what is the meaning of NumbPath > then? > > - What is the reason for such limitation. > > Do you or anybody on the list could clarify this? > > > Here's the fix: > > About the patch. > > White spaces are mangled. > > > > > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 > > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 > > @@ -615,7 +615,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > ib_gid_set_default(&path_rec.dgid, > > ((osmv_guid_pair_t *) > > (p_query_req-> @@ -625,6 +625,7 @@ > > ((osmv_guid_pair_t *) > > (p_query_req-> > > > > p_query_input))-> > > src_guid); > > + path_rec.num_path = 1; > > Why should this be '1'? 
> > Sasha > > > break; > > > > case OSMV_QUERY_PATH_REC_BY_GIDS: > > @@ -634,7 +635,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > memcpy(&path_rec.dgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ > > memcpy(&path_rec.sgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> > > src_gid, sizeof(ib_gid_t)); > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_PATH_REC_BY_LIDS: > > @@ -652,13 +654,14 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > path_rec.dlid = > > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))-> > > dest_lid; > > path_rec.slid = > > ((osmv_lid_pair_t *) > > (p_query_req->p_query_input))->src_lid; > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_UD_MULTICAST_SET: > > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > > @@ -743,7 +743,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > ib_gid_set_default(&path_rec.dgid, > > ((osmv_guid_pair_t *) > > (p_query_req-> @@ -753,6 +753,7 @@ > > ((osmv_guid_pair_t *) > > (p_query_req-> > > > > p_query_input))-> > > src_guid); > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_PATH_REC_BY_GIDS: > > @@ -763,7 +764,7 @@ > > sa_mad_data.attr_offset = > > 
ib_get_attr_offset(sizeof(ib_path_rec_t));
> > sa_mad_data.comp_mask =
> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID);
> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID |
> > IB_PR_COMPMASK_NUMBPATH);
> > sa_mad_data.p_attr = &path_rec;
> > memcpy(&path_rec.dgid,
> > &((osmv_gid_pair_t *)
> > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@
> > memcpy(&path_rec.sgid,
> > &((osmv_gid_pair_t *)
> > (p_query_req->p_query_input))->
> > src_gid, sizeof(ib_gid_t));
> > + path_rec.num_path = 1;
> > break;
> >
> > case OSMV_QUERY_PATH_REC_BY_LIDS:
> > @@ -789,6 +791,7 @@
> > dest_lid;
> > path_rec.slid =
> > ((osmv_lid_pair_t *)
> > (p_query_req->p_query_input))->src_lid;
> > + path_rec.num_path = 1;
> > break;
> >
> > case OSMV_QUERY_UD_MULTICAST_SET:
> >
> >
> > --
> > Michael Heinz
> > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
> >
>
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general

From michael.heinz at qlogic.com Mon Dec 15 07:41:12 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon, 15 Dec 2008 09:41:12 -0600
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References:
Message-ID:

Hal,

I could be wrong, but as I understand it, this function does not permit the user to request more than one path. The API takes a query request that contains a pair of LIDs or GUIDs, but does not have a field for specifying the number of paths.
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] Sent: Monday, December 15, 2008 10:31 AM To: Mike Heinz Cc: general at lists.openfabrics.org; John Russo Subject: Re: [ofa-general] Bugs in opensm/libvendor On Wed, Dec 10, 2008 at 1:31 PM, Mike Heinz wrote: > While experimenting with the APIs in opensm/libvendor, I was unable to > get the path record queries to work. Reviewing the error logs from the > SM, Which SM(s) ? > I discovered that the APIs were not setting the required num_path field. > Here's the fix: The approach used breaks backward compatibility which IMO should be preserved. I think a better approach is as follows: 1. Set num_paths in the application(s) as desired 2. In the library, check for num_paths 0 or not. If not, then set the numbpath compmask bit. -- Hal > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 > @@ -615,7 +615,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) > (p_query_req-> @@ -625,6 +625,7 @@ > ((osmv_guid_pair_t *) (p_query_req-> > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -634,7 +635,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) 
(p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -652,13 +654,14 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > path_rec.dlid = > ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > @@ -743,7 +743,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) > (p_query_req-> @@ -753,6 +753,7 @@ > ((osmv_guid_pair_t *) (p_query_req-> > p_query_input))-> > src_guid); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_GIDS: > @@ -763,7 +764,7 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > IB_PR_COMPMASK_NUMBPATH); > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ > memcpy(&path_rec.sgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > src_gid, sizeof(ib_gid_t)); > + path_rec.num_path = 1; > break; > > case OSMV_QUERY_PATH_REC_BY_LIDS: > @@ -789,6 +791,7 @@ > dest_lid; > path_rec.slid = > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))->src_lid; > 
+ path_rec.num_path = 1; > break; > > case OSMV_QUERY_UD_MULTICAST_SET: > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Mon Dec 15 07:53:48 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 15 Dec 2008 10:53:48 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: Message-ID: Mike, On Mon, Dec 15, 2008 at 10:41 AM, Mike Heinz wrote: > Hal, > > I could be wrong, but as I understand it, this function does not permit > the user to request more than one path. The API takes a query request > which contains pair of lids or guids, but does not have a field for > specifying the number of paths. You're right; the path requests do not currently support num_paths unlike multipath requests. But there is reliance on this "feature". So the current API needs enhancement to support this IMO to preserve backward compatibility. > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > -----Original Message----- > From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] > Sent: Monday, December 15, 2008 10:31 AM > To: Mike Heinz > Cc: general at lists.openfabrics.org; John Russo > Subject: Re: [ofa-general] Bugs in opensm/libvendor > > On Wed, Dec 10, 2008 at 1:31 PM, Mike Heinz > wrote: >> While experimenting with the APIs in opensm/libvendor, In tree or out of tree application(s) ? > I was unable to > >> get the path record queries to work. Reviewing the error logs from the > >> SM, > > Which SM(s) ? Which SM(s) ? -- Hal >> I discovered that the APIs were not setting the required num_path > field. 
>> Here's the fix: > > The approach used breaks backward compatibility which IMO should be > preserved. > I think a better approach is as follows: > 1. Set num_paths in the application(s) as desired 2. In the library, > check for num_paths 0 or not. If not, then set the numbpath compmask > bit. > > -- Hal > >> --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >> +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >> @@ -615,7 +615,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> ib_gid_set_default(&path_rec.dgid, >> ((osmv_guid_pair_t *) >> (p_query_req-> @@ -625,6 +625,7 @@ >> ((osmv_guid_pair_t *) > (p_query_req-> >> > p_query_input))-> >> src_guid); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_GIDS: >> @@ -634,7 +635,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> memcpy(&path_rec.dgid, >> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >> memcpy(&path_rec.sgid, >> &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> >> src_gid, sizeof(ib_gid_t)); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_LIDS: >> @@ -652,13 +654,14 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >> + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> path_rec.dlid = >> ((osmv_lid_pair_t *) > (p_query_req->p_query_input))-> >> dest_lid; >> path_rec.slid = >> ((osmv_lid_pair_t 
*) >> (p_query_req->p_query_input))->src_lid; >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_UD_MULTICAST_SET: >> --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >> +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >> @@ -743,7 +743,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> ib_gid_set_default(&path_rec.dgid, >> ((osmv_guid_pair_t *) >> (p_query_req-> @@ -753,6 +753,7 @@ >> ((osmv_guid_pair_t *) > (p_query_req-> >> > p_query_input))-> >> src_guid); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_GIDS: >> @@ -763,7 +764,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> memcpy(&path_rec.dgid, >> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >> memcpy(&path_rec.sgid, >> &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> >> src_gid, sizeof(ib_gid_t)); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_LIDS: >> @@ -789,6 +791,7 @@ >> dest_lid; >> path_rec.slid = >> ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))->src_lid; >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_UD_MULTICAST_SET: >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > From vlad at dev.mellanox.co.il Mon Dec 
15 08:02:30 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 15 Dec 2008 18:02:30 +0200 Subject: [ofa-general] Test modules for ib_core In-Reply-To: <4946797B.2050501@opengridcomputing.com> References: <4940D767.8090200@ext.bull.net> <4940EBB5.6030702@mellanox.co.il> <4940FB22.9090809@ext.bull.net> <494114B5.7000703@ext.bull.net> <4945074F.1050500@dev.mellanox.co.il> <49467639.3090405@ext.bull.net> <4946797B.2050501@opengridcomputing.com> Message-ID: <49467F96.1020200@dev.mellanox.co.il> Steve Wise wrote: > >>>> >>>> >>>> >>> Hi Celine, >>> Copy /usr/src/ofa_kernel/Module.symvers to the krping directory, >>> then run make and insmod. >>> >>> Regards, >>> Vladimir >>> > Hey Vlad, > > What do you think I should add to the krping makefile to avoid this > issue? > Hi Steve, There are 2 relevant solutions described in kernel's documentation: Use an extra Module.symvers file When an external module is built, a Module.symvers file is generated containing all exported symbols which are not defined in the kernel. To get access to symbols from module 'bar', one can copy the Module.symvers file from the compilation of the 'bar' module to the directory where the 'foo' module is built. During the module build, kbuild will read the Module.symvers file in the directory of the external module and when the build is finished, a new Module.symvers file is created containing the sum of all symbols defined and not part of the kernel. Use make variable KBUILD_EXTRA_SYMBOLS in the Makefile If it is impractical to copy Module.symvers from another module, you can assign a space separated list of files to KBUILD_EXTRA_SYMBOLS in your Makefile. These files will be loaded by modpost during the initialization of its symbol tables. 
Note: Module.symvers is a part of the kernel-ib-devel RPM in OFED and is installed, by default, under /usr/src/ofa_kernel See Documentation/kbuild/modules.txt, Section 7.3: Symbols from another external module Regards, Vladimir From hal.rosenstock at gmail.com Mon Dec 15 08:03:16 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 15 Dec 2008 11:03:16 -0500 Subject: ***SPAM*** Re: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081215153838.GB22506@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: On Mon, Dec 15, 2008 at 10:38 AM, Sasha Khapyorsky wrote: > On 09:29 Mon 15 Dec , Mike Heinz wrote: >> >> That's a good question - and I'm going to ask around and double check. >> My first reaction was that you have to specify how many paths you want >> from the query - but you're right, the spec doesn't say that. > > Yes, it looks like this (but I cannot understand "why" :( ). The spec says this (for GetTable) and Gets are requests for 1 path. The reason is to limit the number of returned path records (and the field limits to 255 records in the response). >But even more > strange (IMHO) limitation is mandatory SGID - actually it should make > illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I > thought that it is permitted. It was decided to force SGID. Neither all-to-all nor SLID-to-all by itself is spec'd (you could add SGID along with SLID-to-all, though). Support for those is a proprietary OpenSM extension which is used for testing at least (and also by the saquery command). >> I'm going to do some research on my end. Are you saying that >> IB_MAD_ATTR_PATH_RECORD should only ever return a single path? No; some queries only return 1 path and others multiple. The query itself should say how many paths it would like back. > With GetTable? I think it shouldn't (for some queries it will - such as > SLID + DLID).
Some queries can get multiple path records in the response. -- Hal > Sasha > >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation >> King of Prussia, Pennsylvania >> >> -----Original Message----- >> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] >> Sent: Monday, December 15, 2008 10:18 AM >> To: Mike Heinz >> Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock >> Subject: Re: [ofa-general] Bugs in opensm/libvendor >> >> Hi Mike, >> >> On 12:31 Wed 10 Dec , Mike Heinz wrote: >> > While experimenting with the APIs in opensm/libvendor, I was unable to >> >> > get the path record queries to work. Reviewing the error logs from the >> >> > SM, I discovered that the APIs were not setting the required num_path >> > field. >> >> Actually this part of spec is not 100% clear for me - the only thing I >> can see is that in table 207 (p.915 - PathRecord) is that SGID and >> NumbPath parameters are marked as "required for GetTable request". This >> leave me with some questions: >> >> - Could SLID be used in GetTable request instead of SGID (as implemented >> now in opensm/libvendor)? Maybe not, but then I would expect some >> explicit mention about this. If yes, what is the meaning of NumbPath >> then? >> >> - What is the reason for such limitation. >> >> Do you or anybody on the list could clarify this? >> >> > Here's the fix: >> >> About the patch. >> >> White spaces are mangled. 
>> >> > >> > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >> > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >> > @@ -615,7 +615,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > ib_gid_set_default(&path_rec.dgid, >> > ((osmv_guid_pair_t *) >> > (p_query_req-> @@ -625,6 +625,7 @@ >> > ((osmv_guid_pair_t *) >> > (p_query_req-> >> > >> > p_query_input))-> >> > src_guid); >> > + path_rec.num_path = 1; >> >> Why should this be '1'? >> >> Sasha >> >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_GIDS: >> > @@ -634,7 +635,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > memcpy(&path_rec.dgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >> > memcpy(&path_rec.sgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> >> > src_gid, sizeof(ib_gid_t)); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_LIDS: >> > @@ -652,13 +654,14 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >> > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > path_rec.dlid = >> > ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))-> >> > dest_lid; >> > path_rec.slid = >> > ((osmv_lid_pair_t *) >> > (p_query_req->p_query_input))->src_lid; >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_UD_MULTICAST_SET: >> > --- 
osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >> > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >> > @@ -743,7 +743,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > ib_gid_set_default(&path_rec.dgid, >> > ((osmv_guid_pair_t *) >> > (p_query_req-> @@ -753,6 +753,7 @@ >> > ((osmv_guid_pair_t *) >> > (p_query_req-> >> > >> > p_query_input))-> >> > src_guid); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_GIDS: >> > @@ -763,7 +764,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > memcpy(&path_rec.dgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >> > memcpy(&path_rec.sgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> >> > src_gid, sizeof(ib_gid_t)); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_LIDS: >> > @@ -789,6 +791,7 @@ >> > dest_lid; >> > path_rec.slid = >> > ((osmv_lid_pair_t *) >> > (p_query_req->p_query_input))->src_lid; >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_UD_MULTICAST_SET: >> > >> > >> > -- >> > Michael Heinz >> > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >> > >> >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit >> > http://openib.org/mailman/listinfo/openib-general > From michael.heinz at qlogic.com Mon Dec 
15 08:07:52 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Mon, 15 Dec 2008 10:07:52 -0600 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081215153838.GB22506@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: Sasha, Hal, Reviewing the spec again, on page 915, there's an "X" in the required column for numb_path and GETTABLE requests. Also, looking at the code, this query is being sent as an IB_MAD_METHOD_GETTABLE, not IB_MAD_METHOD_GET, so the number of paths requested needs to be specified. Since this function doesn't support returning multiple paths to the caller, perhaps the correct fix is to change the method to IB_MAD_METHOD_GET? The SM we're testing against is Qlogic's. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Sasha Khapyorsky [mailto:sashak at voltaire.com] Sent: Monday, December 15, 2008 10:39 AM To: Mike Heinz Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock Subject: Re: [ofa-general] Bugs in opensm/libvendor On 09:29 Mon 15 Dec , Mike Heinz wrote: > > That's a good question - and I'm going to ask around and double check. > My first reaction was that you have to specify how many paths you want > from the query - but you're right, the spec doesn't say that. Yes, it looks like this (but I cannot understand "why" :( ). But even more strange (IMHO) limitation is mandatory SGID - actually it should make illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I thought that it is permitted. > I'm going to do some research on my end. Are you saying that > IB_MAD_ATTR_PATH_RECORD should only ever return a single path? With GetTable? I think it shouldn't (for some queries it will - such as SLID + DLID).
Sasha > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Monday, December 15, 2008 10:18 AM > To: Mike Heinz > Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock > Subject: Re: [ofa-general] Bugs in opensm/libvendor > > Hi Mike, > > On 12:31 Wed 10 Dec , Mike Heinz wrote: > > While experimenting with the APIs in opensm/libvendor, I was unable > > to > > > get the path record queries to work. Reviewing the error logs from > > the > > > SM, I discovered that the APIs were not setting the required > > num_path field. > > Actually this part of spec is not 100% clear for me - the only thing I > can see is that in table 207 (p.915 - PathRecord) is that SGID and > NumbPath parameters are marked as "required for GetTable request". > This leave me with some questions: > > - Could SLID be used in GetTable request instead of SGID (as implemented > now in opensm/libvendor)? Maybe not, but then I would expect some > explicit mention about this. If yes, what is the meaning of NumbPath > then? > > - What is the reason for such limitation. > > Do you or anybody on the list could clarify this? > > > Here's the fix: > > About the patch. > > White spaces are mangled. 
> > > > > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 > > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 > > @@ -615,7 +615,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > ib_gid_set_default(&path_rec.dgid, > > ((osmv_guid_pair_t *) > > (p_query_req-> @@ -625,6 +625,7 @@ > > ((osmv_guid_pair_t *) > > (p_query_req-> > > > > p_query_input))-> > > src_guid); > > + path_rec.num_path = 1; > > Why should this be '1'? > > Sasha > > > break; > > > > case OSMV_QUERY_PATH_REC_BY_GIDS: > > @@ -634,7 +635,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > memcpy(&path_rec.dgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ > > memcpy(&path_rec.sgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> > > src_gid, sizeof(ib_gid_t)); > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_PATH_REC_BY_LIDS: > > @@ -652,13 +654,14 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > path_rec.dlid = > > ((osmv_lid_pair_t *) > (p_query_req->p_query_input))-> > > dest_lid; > > path_rec.slid = > > ((osmv_lid_pair_t *) > > (p_query_req->p_query_input))->src_lid; > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_UD_MULTICAST_SET: > > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 > > +++ 
osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 > > @@ -743,7 +743,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > ib_gid_set_default(&path_rec.dgid, > > ((osmv_guid_pair_t *) > > (p_query_req-> @@ -753,6 +753,7 @@ > > ((osmv_guid_pair_t *) > > (p_query_req-> > > > > p_query_input))-> > > src_guid); > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_PATH_REC_BY_GIDS: > > @@ -763,7 +764,7 @@ > > sa_mad_data.attr_offset = > > ib_get_attr_offset(sizeof(ib_path_rec_t)); > > sa_mad_data.comp_mask = > > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | > > IB_PR_COMPMASK_NUMBPATH); > > sa_mad_data.p_attr = &path_rec; > > memcpy(&path_rec.dgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ > > memcpy(&path_rec.sgid, > > &((osmv_gid_pair_t *) > > (p_query_req->p_query_input))-> > > src_gid, sizeof(ib_gid_t)); > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_PATH_REC_BY_LIDS: > > @@ -789,6 +791,7 @@ > > dest_lid; > > path_rec.slid = > > ((osmv_lid_pair_t *) > > (p_query_req->p_query_input))->src_lid; > > + path_rec.num_path = 1; > > break; > > > > case OSMV_QUERY_UD_MULTICAST_SET: > > > > > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general From hal.rosenstock at gmail.com Mon Dec 15 08:11:00 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 15 Dec 2008 11:11:00 -0500 Subject: ***SPAM*** Re: 
[ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: Mike, On Mon, Dec 15, 2008 at 11:07 AM, Mike Heinz wrote: > Sasha, Hal, > > Reviewing the spec again, on page 915, there's an "X" in the required > column for numb_path and GETTABLE requests. Also, looking at the code, > this query is being sent as a IB_MAD_METHOD_GETTABLE, not > IB_MAD_METHOD_GET, so the number of paths requested needs to be > specified. Yes, that is the spec compliant view. OpenSM and a variety of tools rely on the extension to the spec where numb paths 0 means unlimited and supports a wider variety of queries here. > Since this function doesn't support returning multiple paths to the > caller, perhaps the correct fix is to change the method to > IB_MAD_METHOD_GET? Not IMO due to the above. -- Hal > The SM we're testing against it Qlogic's. > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > -----Original Message----- > From: Sasha Khapyorsky [mailto:sashak at voltaire.com] > Sent: Monday, December 15, 2008 10:39 AM > To: Mike Heinz > Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock > Subject: Re: [ofa-general] Bugs in opensm/libvendor > > On 09:29 Mon 15 Dec , Mike Heinz wrote: >> >> That's a good question - and I'm going to ask around and double check. >> My first reaction was that you have to specify how many paths you want > >> from the query - but you're right, the spec doesn't say that. > > Yes, it looks like this (but I cannot understand "why" :( ). But even > more strange (IMHO) limitation is mandatory SGID - actually it should > make illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I > thought that it is permitted. > >> I'm going to do some research on my end. Are you saying that >> IB_MAD_ATTR_PATH_RECORD should only ever return a single path? > > With GetTable? 
I think it shouldn't (for some queries it will - such as > SLID + DLID). > > Sasha > >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >> >> -----Original Message----- >> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] >> Sent: Monday, December 15, 2008 10:18 AM >> To: Mike Heinz >> Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock >> Subject: Re: [ofa-general] Bugs in opensm/libvendor >> >> Hi Mike, >> >> On 12:31 Wed 10 Dec , Mike Heinz wrote: >> > While experimenting with the APIs in opensm/libvendor, I was unable >> > to >> >> > get the path record queries to work. Reviewing the error logs from >> > the >> >> > SM, I discovered that the APIs were not setting the required >> > num_path field. >> >> Actually this part of spec is not 100% clear for me - the only thing I > >> can see is that in table 207 (p.915 - PathRecord) is that SGID and >> NumbPath parameters are marked as "required for GetTable request". >> This leave me with some questions: >> >> - Could SLID be used in GetTable request instead of SGID (as > implemented >> now in opensm/libvendor)? Maybe not, but then I would expect some >> explicit mention about this. If yes, what is the meaning of NumbPath >> then? >> >> - What is the reason for such limitation. >> >> Do you or anybody on the list could clarify this? >> >> > Here's the fix: >> >> About the patch. >> >> White spaces are mangled. 
>> >> > >> > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >> > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >> > @@ -615,7 +615,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > ib_gid_set_default(&path_rec.dgid, >> > ((osmv_guid_pair_t *) >> > (p_query_req-> @@ -625,6 +625,7 @@ >> > ((osmv_guid_pair_t *) >> > (p_query_req-> >> > >> > p_query_input))-> >> > src_guid); >> > + path_rec.num_path = 1; >> >> Why should this be '1'? >> >> Sasha >> >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_GIDS: >> > @@ -634,7 +635,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > memcpy(&path_rec.dgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >> > memcpy(&path_rec.sgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> >> > src_gid, sizeof(ib_gid_t)); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_LIDS: >> > @@ -652,13 +654,14 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >> > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > path_rec.dlid = >> > ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))-> >> > dest_lid; >> > path_rec.slid = >> > ((osmv_lid_pair_t *) >> > (p_query_req->p_query_input))->src_lid; >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_UD_MULTICAST_SET: >> > --- 
osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >> > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >> > @@ -743,7 +743,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > ib_gid_set_default(&path_rec.dgid, >> > ((osmv_guid_pair_t *) >> > (p_query_req-> @@ -753,6 +753,7 @@ >> > ((osmv_guid_pair_t *) >> > (p_query_req-> >> > >> > p_query_input))-> >> > src_guid); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_GIDS: >> > @@ -763,7 +764,7 @@ >> > sa_mad_data.attr_offset = >> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >> > sa_mad_data.comp_mask = >> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> > IB_PR_COMPMASK_NUMBPATH); >> > sa_mad_data.p_attr = &path_rec; >> > memcpy(&path_rec.dgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >> > memcpy(&path_rec.sgid, >> > &((osmv_gid_pair_t *) >> > (p_query_req->p_query_input))-> >> > src_gid, sizeof(ib_gid_t)); >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_PATH_REC_BY_LIDS: >> > @@ -789,6 +791,7 @@ >> > dest_lid; >> > path_rec.slid = >> > ((osmv_lid_pair_t *) >> > (p_query_req->p_query_input))->src_lid; >> > + path_rec.num_path = 1; >> > break; >> > >> > case OSMV_QUERY_UD_MULTICAST_SET: >> > >> > >> > -- >> > Michael Heinz >> > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >> > >> >> > _______________________________________________ >> > general mailing list >> > general at lists.openfabrics.org >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> > >> > To unsubscribe, please visit >> > http://openib.org/mailman/listinfo/openib-general > From hal.rosenstock at gmail.com Mon Dec 
15 08:13:25 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 15 Dec 2008 11:13:25 -0500 Subject: ***SPAM*** Re: ***SPAM*** Re: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: On Mon, Dec 15, 2008 at 11:11 AM, Hal Rosenstock wrote: > Mike, > > On Mon, Dec 15, 2008 at 11:07 AM, Mike Heinz wrote: >> Sasha, Hal, >> >> Reviewing the spec again, on page 915, there's an "X" in the required >> column for numb_path and GETTABLE requests. Also, looking at the code, >> this query is being sent as a IB_MAD_METHOD_GETTABLE, not >> IB_MAD_METHOD_GET, so the number of paths requested needs to be >> specified. > > Yes, that is the spec compliant view. OpenSM and a variety of tools > rely on the extension to the spec where numb paths 0 means unlimited > and supports a wider variety of queries here. > >> Since this function doesn't support returning multiple paths to the >> caller, perhaps the correct fix is to change the method to >> IB_MAD_METHOD_GET? > > Not IMO due to the above. IMO the best solution is to extend the API for these to include num_paths and support explicit setting of this as well as 0 so backward compatibility is not lost. -- Hal > -- Hal > >> The SM we're testing against it Qlogic's. >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation >> King of Prussia, Pennsylvania >> >> -----Original Message----- >> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] >> Sent: Monday, December 15, 2008 10:39 AM >> To: Mike Heinz >> Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock >> Subject: Re: [ofa-general] Bugs in opensm/libvendor >> >> On 09:29 Mon 15 Dec , Mike Heinz wrote: >>> >>> That's a good question - and I'm going to ask around and double check. >>> My first reaction was that you have to specify how many paths you want >> >>> from the query - but you're right, the spec doesn't say that.
>>
>> Yes, it looks like this (but I cannot understand "why" :( ). But even
>> more strange (IMHO) limitation is mandatory SGID - actually it should
>> make illegal such GetTable queries as all-to-all, SLID-to-all, etc. I
>> thought that it is permitted.
>>
>>> I'm going to do some research on my end. Are you saying that
>>> IB_MAD_ATTR_PATH_RECORD should only ever return a single path?
>>
>> With GetTable? I think it shouldn't (for some queries it will - such as
>> SLID + DLID).
>>
>> Sasha
>>
>>>
>>> --
>>> Michael Heinz
>>> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
>>>
>>> -----Original Message-----
>>> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
>>> Sent: Monday, December 15, 2008 10:18 AM
>>> To: Mike Heinz
>>> Cc: general at lists.openfabrics.org; John Russo; Hal Rosenstock
>>> Subject: Re: [ofa-general] Bugs in opensm/libvendor
>>>
>>> Hi Mike,
>>>
>>> On 12:31 Wed 10 Dec , Mike Heinz wrote:
>>> > While experimenting with the APIs in opensm/libvendor, I was unable
>>> > to get the path record queries to work. Reviewing the error logs from
>>> > the SM, I discovered that the APIs were not setting the required
>>> > num_path field.
>>>
>>> Actually this part of spec is not 100% clear for me - the only thing I
>>> can see in table 207 (p.915 - PathRecord) is that SGID and
>>> NumbPath parameters are marked as "required for GetTable request".
>>> This leaves me with some questions:
>>>
>>> - Could SLID be used in GetTable request instead of SGID (as implemented
>>>   now in opensm/libvendor)? Maybe not, but then I would expect some
>>>   explicit mention about this. If yes, what is the meaning of NumbPath
>>>   then?
>>>
>>> - What is the reason for such limitation?
>>>
>>> Can you or anybody on the list clarify this?
>>>
>>> > Here's the fix:
>>>
>>> About the patch.
>>>
>>> White spaces are mangled.
>>> >>> > >>> > --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >>> > +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >>> > @@ -615,7 +615,7 @@ >>> > sa_mad_data.attr_offset = >>> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> > sa_mad_data.comp_mask = >>> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> > IB_PR_COMPMASK_NUMBPATH); >>> > sa_mad_data.p_attr = &path_rec; >>> > ib_gid_set_default(&path_rec.dgid, >>> > ((osmv_guid_pair_t *) >>> > (p_query_req-> @@ -625,6 +625,7 @@ >>> > ((osmv_guid_pair_t *) >>> > (p_query_req-> >>> > >>> > p_query_input))-> >>> > src_guid); >>> > + path_rec.num_path = 1; >>> >>> Why should this be '1'? >>> >>> Sasha >>> >>> > break; >>> > >>> > case OSMV_QUERY_PATH_REC_BY_GIDS: >>> > @@ -634,7 +635,7 @@ >>> > sa_mad_data.attr_offset = >>> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> > sa_mad_data.comp_mask = >>> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> > IB_PR_COMPMASK_NUMBPATH); >>> > sa_mad_data.p_attr = &path_rec; >>> > memcpy(&path_rec.dgid, >>> > &((osmv_gid_pair_t *) >>> > (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >>> > memcpy(&path_rec.sgid, >>> > &((osmv_gid_pair_t *) >>> > (p_query_req->p_query_input))-> >>> > src_gid, sizeof(ib_gid_t)); >>> > + path_rec.num_path = 1; >>> > break; >>> > >>> > case OSMV_QUERY_PATH_REC_BY_LIDS: >>> > @@ -652,13 +654,14 @@ >>> > sa_mad_data.attr_offset = >>> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> > sa_mad_data.comp_mask = >>> > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >>> > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >>> > IB_PR_COMPMASK_NUMBPATH); >>> > sa_mad_data.p_attr = &path_rec; >>> > path_rec.dlid = >>> > ((osmv_lid_pair_t *) >>> (p_query_req->p_query_input))-> >>> > dest_lid; >>> > path_rec.slid = >>> > ((osmv_lid_pair_t *) >>> > (p_query_req->p_query_input))->src_lid; >>> > + path_rec.num_path = 1; 
>>> > break; >>> > >>> > case OSMV_QUERY_UD_MULTICAST_SET: >>> > --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >>> > +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >>> > @@ -743,7 +743,7 @@ >>> > sa_mad_data.attr_offset = >>> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> > sa_mad_data.comp_mask = >>> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> > IB_PR_COMPMASK_NUMBPATH); >>> > sa_mad_data.p_attr = &path_rec; >>> > ib_gid_set_default(&path_rec.dgid, >>> > ((osmv_guid_pair_t *) >>> > (p_query_req-> @@ -753,6 +753,7 @@ >>> > ((osmv_guid_pair_t *) >>> > (p_query_req-> >>> > >>> > p_query_input))-> >>> > src_guid); >>> > + path_rec.num_path = 1; >>> > break; >>> > >>> > case OSMV_QUERY_PATH_REC_BY_GIDS: >>> > @@ -763,7 +764,7 @@ >>> > sa_mad_data.attr_offset = >>> > ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> > sa_mad_data.comp_mask = >>> > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> > IB_PR_COMPMASK_NUMBPATH); >>> > sa_mad_data.p_attr = &path_rec; >>> > memcpy(&path_rec.dgid, >>> > &((osmv_gid_pair_t *) >>> > (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >>> > memcpy(&path_rec.sgid, >>> > &((osmv_gid_pair_t *) >>> > (p_query_req->p_query_input))-> >>> > src_gid, sizeof(ib_gid_t)); >>> > + path_rec.num_path = 1; >>> > break; >>> > >>> > case OSMV_QUERY_PATH_REC_BY_LIDS: >>> > @@ -789,6 +791,7 @@ >>> > dest_lid; >>> > path_rec.slid = >>> > ((osmv_lid_pair_t *) >>> > (p_query_req->p_query_input))->src_lid; >>> > + path_rec.num_path = 1; >>> > break; >>> > >>> > case OSMV_QUERY_UD_MULTICAST_SET: >>> > >>> > >>> > -- >>> > Michael Heinz >>> > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >>> > >>> >>> > _______________________________________________ >>> > general mailing list >>> > general at lists.openfabrics.org >>> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> > >>> 
> To unsubscribe, please visit
>>> > http://openib.org/mailman/listinfo/openib-general
>>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From hal.rosenstock at gmail.com Mon Dec 15 08:17:32 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Mon, 15 Dec 2008 11:17:32 -0500
Subject: Re: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com>
Message-ID:

On Mon, Dec 15, 2008 at 11:03 AM, Hal Rosenstock wrote:
> The spec says this (for GetTable) and Gets are requests for 1 path.
> The reason is to limit the amount of returned path records (and the
> field limits to 255 records in the response).
                  ^^^^
Meant 127 here...

From michael.heinz at qlogic.com Mon Dec 15 08:25:46 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon, 15 Dec 2008 10:25:46 -0600
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References:
Message-ID:

As long as any API change permits interoperation with spec-compliant
SMs, that's fine. At the moment, the infiniband-diags tool, saquery, will
fail when looking for path records on SMs other than OpenSM; that was
the problem that actually sent me down this path.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
Sent: Monday, December 15, 2008 10:54 AM
To: Mike Heinz
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] Bugs in opensm/libvendor

Mike,

On Mon, Dec 15, 2008 at 10:41 AM, Mike Heinz wrote:
> Hal,
>
> I could be wrong, but as I understand it, this function does not
> permit the user to request more than one path.
> The API takes a query request which contains a pair of lids or guids,
> but does not have a field for specifying the number of paths.

You're right; the path requests do not currently support num_paths,
unlike multipath requests. But there is reliance on this "feature". So
the current API needs enhancement to support this IMO to preserve
backward compatibility.

> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
> Sent: Monday, December 15, 2008 10:31 AM
> To: Mike Heinz
> Cc: general at lists.openfabrics.org; John Russo
> Subject: Re: [ofa-general] Bugs in opensm/libvendor
>
> On Wed, Dec 10, 2008 at 1:31 PM, Mike Heinz wrote:
>> While experimenting with the APIs in opensm/libvendor,

In tree or out of tree application(s) ?

>> I was unable to get the path record queries to work. Reviewing the
>> error logs from the SM,
>
> Which SM(s) ?

Which SM(s) ?

-- Hal

>> I discovered that the APIs were not setting the required num_path field.
>> Here's the fix:
>
> The approach used breaks backward compatibility which IMO should be
> preserved.
> I think a better approach is as follows:
> 1. Set num_paths in the application(s) as desired
> 2. In the library, check for num_paths 0 or not. If not, then set the
>    numbpath compmask bit.
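
A minimal sketch of the two-step approach outlined above, for illustration only. The struct, function and macro names used here (sketch_lid_query, sketch_path_rec, sketch_build_path_query, SKETCH_PR_COMPMASK_*) are hypothetical stand-ins and not the libvendor API, and the bit values are arbitrary placeholders rather than the real component mask bits. It assumes the caller supplies num_path, with 0 meaning "not specified" so that existing callers keep the current behaviour:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bits for this sketch only; NOT the real IB_PR_COMPMASK_* values. */
#define SKETCH_PR_COMPMASK_DLID     (1ULL << 0)
#define SKETCH_PR_COMPMASK_SLID     (1ULL << 1)
#define SKETCH_PR_COMPMASK_NUMBPATH (1ULL << 2)

/* Hypothetical, reduced stand-in for the path record sent to the SA. */
struct sketch_path_rec {
	uint16_t dlid;
	uint16_t slid;
	uint8_t num_path;
};

/* Hypothetical query input: the existing LID pair plus a num_path field. */
struct sketch_lid_query {
	uint16_t src_lid;
	uint16_t dest_lid;
	uint8_t num_path;	/* 0 = not specified (old behaviour) */
};

/*
 * Fill in the record and return the component mask.
 * The NumbPath bit is set only when the application asked for a
 * specific number of paths, so a caller passing 0 sends the same
 * query as before.
 */
uint64_t sketch_build_path_query(const struct sketch_lid_query *q,
				 struct sketch_path_rec *rec)
{
	uint64_t comp_mask = SKETCH_PR_COMPMASK_DLID | SKETCH_PR_COMPMASK_SLID;

	assert(q != NULL && rec != NULL);

	memset(rec, 0, sizeof(*rec));
	rec->dlid = q->dest_lid;
	rec->slid = q->src_lid;

	if (q->num_path != 0) {
		rec->num_path = q->num_path;
		comp_mask |= SKETCH_PR_COMPMASK_NUMBPATH;
	}
	return comp_mask;
}
```

Under these assumptions, a caller that needs to interoperate with a strictly spec-compliant SM would pass a non-zero num_path, while callers that leave it at 0 send the query unchanged.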
> > -- Hal > >> --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >> +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >> @@ -615,7 +615,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> ib_gid_set_default(&path_rec.dgid, >> ((osmv_guid_pair_t *) >> (p_query_req-> @@ -625,6 +625,7 @@ >> ((osmv_guid_pair_t *) > (p_query_req-> >> > p_query_input))-> >> src_guid); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_GIDS: >> @@ -634,7 +635,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> memcpy(&path_rec.dgid, >> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >> memcpy(&path_rec.sgid, >> &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> >> src_gid, sizeof(ib_gid_t)); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_LIDS: >> @@ -652,13 +654,14 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >> + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> path_rec.dlid = >> ((osmv_lid_pair_t *) > (p_query_req->p_query_input))-> >> dest_lid; >> path_rec.slid = >> ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))->src_lid; >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_UD_MULTICAST_SET: >> --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >> +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >> @@ -743,7 +743,7 @@ >> 
sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> ib_gid_set_default(&path_rec.dgid, >> ((osmv_guid_pair_t *) >> (p_query_req-> @@ -753,6 +753,7 @@ >> ((osmv_guid_pair_t *) > (p_query_req-> >> > p_query_input))-> >> src_guid); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_GIDS: >> @@ -763,7 +764,7 @@ >> sa_mad_data.attr_offset = >> ib_get_attr_offset(sizeof(ib_path_rec_t)); >> sa_mad_data.comp_mask = >> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >> IB_PR_COMPMASK_NUMBPATH); >> sa_mad_data.p_attr = &path_rec; >> memcpy(&path_rec.dgid, >> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >> memcpy(&path_rec.sgid, >> &((osmv_gid_pair_t *) > (p_query_req->p_query_input))-> >> src_gid, sizeof(ib_gid_t)); >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_PATH_REC_BY_LIDS: >> @@ -789,6 +791,7 @@ >> dest_lid; >> path_rec.slid = >> ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))->src_lid; >> + path_rec.num_path = 1; >> break; >> >> case OSMV_QUERY_UD_MULTICAST_SET: >> >> -- >> Michael Heinz >> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > From hal.rosenstock at gmail.com Mon Dec 15 08:31:26 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Mon, 15 Dec 2008 11:31:26 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: Message-ID: On Mon, Dec 15, 2008 at 11:25 AM, Mike Heinz wrote: > As long as any API change permits 
> interoperation with spec-compliant SMs, that's fine.

That's exactly the idea: allow for both (spec compliant and enhanced)
modes of operation.

-- Hal

> At the moment, the infiniband-diags tool, saquery, will
> fail when looking for path records on SMs other than OpenSM; that was
> the problem that actually sent me down this path.
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
> Sent: Monday, December 15, 2008 10:54 AM
> To: Mike Heinz
> Cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] Bugs in opensm/libvendor
>
> Mike,
>
> On Mon, Dec 15, 2008 at 10:41 AM, Mike Heinz wrote:
>> Hal,
>>
>> I could be wrong, but as I understand it, this function does not
>> permit the user to request more than one path. The API takes a query
>> request which contains a pair of lids or guids, but does not have a
>> field for specifying the number of paths.
>
> You're right; the path requests do not currently support num_paths
> unlike multipath requests. But there is reliance on this "feature". So
> the current API needs enhancement to support this IMO to preserve
> backward compatibility.
>
>> --
>> Michael Heinz
>> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania
>>
>> -----Original Message-----
>> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
>> Sent: Monday, December 15, 2008 10:31 AM
>> To: Mike Heinz
>> Cc: general at lists.openfabrics.org; John Russo
>> Subject: Re: [ofa-general] Bugs in opensm/libvendor
>>
>> On Wed, Dec 10, 2008 at 1:31 PM, Mike Heinz wrote:
>>> While experimenting with the APIs in opensm/libvendor,
>
> In tree or out of tree application(s) ?
>
>>> I was unable to get the path record queries to work. Reviewing the
>>> error logs from the SM,
>>
>> Which SM(s) ?
>
> Which SM(s) ?
> > -- Hal > >>> I discovered that the APIs were not setting the required num_path >> field. >>> Here's the fix: >> >> The approach used breaks backward compatibility which IMO should be >> preserved. >> I think a better approach is as follows: >> 1. Set num_paths in the application(s) as desired 2. In the library, >> check for num_paths 0 or not. If not, then set the numbpath compmask >> bit. >> >> -- Hal >> >>> --- osm_vendor_ibumad_sa.bak 2008-12-10 13:21:22.000000000 -0500 >>> +++ osm_vendor_ibumad_sa.c 2008-12-10 13:24:42.000000000 -0500 >>> @@ -615,7 +615,7 @@ >>> sa_mad_data.attr_offset = >>> ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> sa_mad_data.comp_mask = >>> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> IB_PR_COMPMASK_NUMBPATH); >>> sa_mad_data.p_attr = &path_rec; >>> ib_gid_set_default(&path_rec.dgid, >>> ((osmv_guid_pair_t *) >>> (p_query_req-> @@ -625,6 +625,7 @@ >>> ((osmv_guid_pair_t *) >> (p_query_req-> >>> >> p_query_input))-> >>> src_guid); >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_PATH_REC_BY_GIDS: >>> @@ -634,7 +635,7 @@ >>> sa_mad_data.attr_offset = >>> ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> sa_mad_data.comp_mask = >>> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> IB_PR_COMPMASK_NUMBPATH); >>> sa_mad_data.p_attr = &path_rec; >>> memcpy(&path_rec.dgid, >>> &((osmv_gid_pair_t *) >>> (p_query_req->p_query_input))-> @@ -642,6 +643,7 @@ >>> memcpy(&path_rec.sgid, >>> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> >>> src_gid, sizeof(ib_gid_t)); >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_PATH_REC_BY_LIDS: >>> @@ -652,13 +654,14 @@ >>> sa_mad_data.attr_offset = >>> ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> sa_mad_data.comp_mask = >>> - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); >>> + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | >>> IB_PR_COMPMASK_NUMBPATH); >>> 
sa_mad_data.p_attr = &path_rec; >>> path_rec.dlid = >>> ((osmv_lid_pair_t *) >> (p_query_req->p_query_input))-> >>> dest_lid; >>> path_rec.slid = >>> ((osmv_lid_pair_t *) >>> (p_query_req->p_query_input))->src_lid; >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_UD_MULTICAST_SET: >>> --- osm_vendor_mlx_sa.bak 2008-12-10 13:21:10.000000000 -0500 >>> +++ osm_vendor_mlx_sa.c 2008-12-10 13:24:07.000000000 -0500 >>> @@ -743,7 +743,7 @@ >>> sa_mad_data.attr_offset = >>> ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> sa_mad_data.comp_mask = >>> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> IB_PR_COMPMASK_NUMBPATH); >>> sa_mad_data.p_attr = &path_rec; >>> ib_gid_set_default(&path_rec.dgid, >>> ((osmv_guid_pair_t *) >>> (p_query_req-> @@ -753,6 +753,7 @@ >>> ((osmv_guid_pair_t *) >> (p_query_req-> >>> >> p_query_input))-> >>> src_guid); >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_PATH_REC_BY_GIDS: >>> @@ -763,7 +764,7 @@ >>> sa_mad_data.attr_offset = >>> ib_get_attr_offset(sizeof(ib_path_rec_t)); >>> sa_mad_data.comp_mask = >>> - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); >>> + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | >>> IB_PR_COMPMASK_NUMBPATH); >>> sa_mad_data.p_attr = &path_rec; >>> memcpy(&path_rec.dgid, >>> &((osmv_gid_pair_t *) >>> (p_query_req->p_query_input))-> @@ -771,6 +772,7 @@ >>> memcpy(&path_rec.sgid, >>> &((osmv_gid_pair_t *) >> (p_query_req->p_query_input))-> >>> src_gid, sizeof(ib_gid_t)); >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_PATH_REC_BY_LIDS: >>> @@ -789,6 +791,7 @@ >>> dest_lid; >>> path_rec.slid = >>> ((osmv_lid_pair_t *) >>> (p_query_req->p_query_input))->src_lid; >>> + path_rec.num_path = 1; >>> break; >>> >>> case OSMV_QUERY_UD_MULTICAST_SET: >>> >>> -- >>> Michael Heinz >>> Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania >>> >>> _______________________________________________ >>> general mailing list >>> 
general at lists.openfabrics.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>>
>>> To unsubscribe, please visit
>>> http://openib.org/mailman/listinfo/openib-general
>>>
>>
>

From yosefe at Voltaire.COM Mon Dec 15 10:04:14 2008
From: yosefe at Voltaire.COM (Yossi Etigin)
Date: Mon, 15 Dec 2008 20:04:14 +0200
Subject: [ofa-general] [PATCH 0/4] ipoib: patchset for 2.6.29
Message-ID: <49469C1E.8010307@Voltaire.COM>

Hi Roland,

I have some patches for review. They fix different issues in ipoib.

--
--Yossi

From yossi.openib at gmail.com Mon Dec 15 10:11:23 2008
From: yossi.openib at gmail.com (Yossi Etigin)
Date: Mon, 15 Dec 2008 20:11:23 +0200
Subject: [ofa-general] [PATCH 1/4] ipoib: do not join broadcast group if interface is brought down
In-Reply-To: <49469C1E.8010307@Voltaire.COM>
References: <49469C1E.8010307@Voltaire.COM>
Message-ID: <49469DCB.9090802@gmail.com>

Because ipoib_workqueue is not flushed when the ipoib interface is brought
down, ipoib_mcast_join() may trigger a join to the broadcast group after
priv->broadcast was set to NULL (during cleanup). This will cause ipoib
to be joined to the broadcast group when the interface is down.
As a side effect, this breaks the optimization of setting qkey only when
joining the broadcast group.

Signed-off-by: Yossi Etigin

--

 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-11-19 21:33:54.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-11-26 18:08:48.000000000 +0200
@@ -497,7 +497,7 @@ static void ipoib_mcast_join(struct net_
 		IB_SA_MCMEMBER_REC_PKEY		|
 		IB_SA_MCMEMBER_REC_JOIN_STATE;
 
-	if (create) {
+	if (create && priv->broadcast) {
 		comp_mask |=
 			IB_SA_MCMEMBER_REC_QKEY			|
 			IB_SA_MCMEMBER_REC_MTU_SELECTOR		|
@@ -565,7 +565,8 @@ void ipoib_mcast_join_task(struct work_s
 			ipoib_warn(priv, "ib_query_port failed\n");
 	}
 
-	if (!priv->broadcast) {
+	rtnl_lock();
+	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && !priv->broadcast) {
 		struct ipoib_mcast *broadcast;
 
 		broadcast = ipoib_mcast_alloc(dev, 1);
@@ -576,6 +577,7 @@ void ipoib_mcast_join_task(struct work_s
 				queue_delayed_work(ipoib_workqueue,
 						   &priv->mcast_join_task, HZ);
 			mutex_unlock(&mcast_mutex);
+			rtnl_unlock();
 			return;
 		}
 
@@ -587,8 +589,10 @@ void ipoib_mcast_join_task(struct work_s
 		__ipoib_mcast_add(dev, priv->broadcast);
 		spin_unlock_irq(&priv->lock);
 	}
+	rtnl_unlock();
 
-	if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
+	if (priv->broadcast &&
+	    !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) {
 		if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags))
 			ipoib_mcast_join(dev, priv->broadcast, 0);
 		return;
@@ -617,7 +621,8 @@ void ipoib_mcast_join_task(struct work_s
 		return;
 	}
 
-	priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
+	if (priv->broadcast)
+		priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
 
 	if (!ipoib_cm_admin_enabled(dev)) {
 		rtnl_lock();

From yosefe at Voltaire.COM Mon Dec 15 10:14:42 2008
From: yosefe at Voltaire.COM (Yossi Etigin)
Date: Mon, 15 Dec 2008 20:14:42 +0200
Subject: [ofa-general] [PATCH 2/4] ipoib: fix loss of connectivity after bonding failover on both sides
In-Reply-To: <49469C1E.8010307@Voltaire.COM>
References: <49469C1E.8010307@Voltaire.COM>
Message-ID: <49469E92.3060001@Voltaire.COM>

Fix bonding failover in the case both peers have failover and gratuitous
arp is lost. In that case, ipoib sender side will create ipoib_neigh and
issue a path request with the old gid first. When skb->dst->neighbour->ha
changes due to arp refresh, ipoib_neigh will not be added to the path->list
of the path of the new mgid, because ipoib_neigh already exists. It will not
have an ah either, because of sender-side failover. Therefore, it will not
get an ah when the path is resolved. The solution here is to compare gids
even if neigh->ah is invalid, and also to initialize neigh->dgid.raw so that
there is a value to compare with.

Signed-off-by: Moni Shoua
Signed-off-by: Yossi Etigin

---

 drivers/infiniband/ulp/ipoib/ipoib_main.c |   38 +++++++++++++++---------------
 1 file changed, 19 insertions(+), 19 deletions(-)

Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-12-15 19:53:16.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-12-15 19:53:37.000000000 +0200
@@ -687,26 +687,26 @@ static int ipoib_start_xmit(struct sk_bu
 
 		neigh = *to_ipoib_neigh(skb->dst->neighbour);
 
-		if (neigh->ah)
-			if (unlikely((memcmp(&neigh->dgid.raw,
-					    skb->dst->neighbour->ha + 4,
-					    sizeof(union ib_gid))) ||
-					 (neigh->dev != dev))) {
-				spin_lock_irqsave(&priv->lock, flags);
-				/*
-				 * It's safe to call ipoib_put_ah() inside
-				 * priv->lock here, because we know that
-				 * path->ah will always hold one more reference,
-				 * so ipoib_put_ah() will never do more than
-				 * decrement the ref count.
-				 */
+		if (unlikely((memcmp(&neigh->dgid.raw,
+				    skb->dst->neighbour->ha + 4,
+				    sizeof(union ib_gid))) ||
+			     (neigh->dev != dev))) {
+			spin_lock_irqsave(&priv->lock, flags);
+			/*
+			 * It's safe to call ipoib_put_ah() inside
+			 * priv->lock here, because we know that
+			 * path->ah will always hold one more reference,
+			 * so ipoib_put_ah() will never do more than
+			 * decrement the ref count.
+			 */
+			if (neigh->ah)
 				ipoib_put_ah(neigh->ah);
-				list_del(&neigh->list);
-				ipoib_neigh_free(dev, neigh);
-				spin_unlock_irqrestore(&priv->lock, flags);
-				ipoib_path_lookup(skb, dev);
-				return NETDEV_TX_OK;
-			}
+			list_del(&neigh->list);
+			ipoib_neigh_free(dev, neigh);
+			spin_unlock_irqrestore(&priv->lock, flags);
+			ipoib_path_lookup(skb, dev);
+			return NETDEV_TX_OK;
+		}
 
 		if (ipoib_cm_get(neigh)) {
 			if (ipoib_cm_up(neigh)) {

From hrosenstock at obsidianresearch.com Mon Dec 15 10:22:11 2008
From: hrosenstock at obsidianresearch.com (Hal Rosenstock)
Date: Mon, 15 Dec 2008 11:22:11 -0700
Subject: [ofa-general] [PATCH] [TRIVIAL] opensm/libvendor/osm_vendor_sa_api.h: Fix commentary typo
Message-ID: <1229365331.29873.385.camel@bertha1.edm.orcorp.ca>

Sasha,

Attached patch fixes a commentary typo in osm_vendor_sa_api.h

-- Hal
-------------- next part --------------
opensm/libvendor/osm_vendor_sa_api.h: Fix commentary typo

Signed-off-by: Hal Rosenstock

diff --git a/opensm/include/vendor/osm_vendor_sa_api.h b/opensm/include/vendor/osm_vendor_sa_api.h
index 4a4eeaf..dd37c3a 100644
--- a/opensm/include/vendor/osm_vendor_sa_api.h
+++ b/opensm/include/vendor/osm_vendor_sa_api.h
@@ -753,7 +753,7 @@ typedef struct _osmv_query_req {
 *		and is determined by the specified query_type.
 *
 *	sm_key
-*		The M_Key to be provided with the SA MAD for authentication.
+*		The SM_Key to be provided with the SA MAD for authentication.
 *		Normally 0 is used.
 *
 *	timeout_ms

From yosefe at Voltaire.COM Mon Dec 15 10:31:40 2008
From: yosefe at Voltaire.COM (Yossi Etigin)
Date: Mon, 15 Dec 2008 20:31:40 +0200
Subject: [ofa-general] [PATCH 3/4] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete
In-Reply-To: <49469C1E.8010307@Voltaire.COM>
References: <49469C1E.8010307@Voltaire.COM>
Message-ID: <4946A28C.8030409@Voltaire.COM>

Fix a deadlock between child interface creation/deletion and ipoib
start/stop. The former first takes vlan_mutex, and might then take
rtnl_lock via register_netdev or unregister_netdev. The latter is
executed with rtnl_lock held, and tries to take vlan_mutex.
We take the vlan_mutex and bring the child interfaces up/down in a
scheduled task instead of during stop/start, since ipoib_workqueue will
not be flushed with rtnl_lock held.

Signed-off-by: Yossi Etigin

---

Fix bug #1198.
An alternative approach might be to fine-grain the locking (for example
use one mutex to sync child creation/deletion, and another one to sync
accesses to child_intfs list).
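
The lock-ordering problem and the deferred-work fix described above can be modelled in a few lines of plain C. This is only a single-threaded sketch under stated assumptions: the two locks are modelled as boolean flags so that the order of acquisition can be checked with assertions, the work queue is a single pending flag, and every name prefixed with sketch_ is hypothetical rather than taken from the driver. It mirrors the idea of the patch: the open/stop path, which runs with rtnl held, only records the wanted state and queues work; the worker later takes the vlan lock first and rtnl second, which is the same order used by child creation/deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IFF_UP       0x1u
#define SKETCH_MAX_CHILDREN 8

/* Hypothetical model of the parent interface's private data. */
struct sketch_parent {
	bool rtnl_held;		/* models rtnl_lock */
	bool vlan_held;		/* models priv->vlan_mutex */
	int want_up;		/* models priv->vlan_task_flag */
	bool task_pending;	/* models a queued work item */
	unsigned child_flags[SKETCH_MAX_CHILDREN];
	size_t nchildren;
};

/*
 * Models open/stop: called with rtnl held, so it must not take the
 * vlan lock. It only records the request and queues the work.
 */
void sketch_open_or_stop(struct sketch_parent *p, bool up)
{
	assert(p->rtnl_held);
	assert(!p->vlan_held);
	p->want_up = up;
	p->task_pending = true;	/* stands in for queue_work() */
}

/*
 * Models the deferred task: runs later without rtnl held, takes the
 * vlan lock first and rtnl only while changing a child's flags.
 */
void sketch_vlan_task(struct sketch_parent *p)
{
	unsigned up = p->want_up ? SKETCH_IFF_UP : 0;

	assert(!p->rtnl_held);
	p->task_pending = false;

	p->vlan_held = true;			/* lock vlan mutex */
	for (size_t i = 0; i < p->nchildren; i++) {
		unsigned flags = p->child_flags[i];
		unsigned new_flags = (flags & ~SKETCH_IFF_UP) | up;

		if (flags != new_flags) {
			assert(p->vlan_held);	/* order: vlan, then rtnl */
			p->rtnl_held = true;	/* lock rtnl */
			p->child_flags[i] = new_flags;
			p->rtnl_held = false;	/* unlock rtnl */
		}
	}
	p->vlan_held = false;			/* unlock vlan mutex */
}
```

The point of the model is only the ordering: no code path takes the vlan lock while already holding rtnl, so the inversion described in the commit message cannot occur in it.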

 drivers/infiniband/ulp/ipoib/ipoib.h      |    3 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   33 ++++--------------------------
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c |   22 ++++++++++++++++++++
 3 files changed, 30 insertions(+), 28 deletions(-)

Index: b/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib.h	2008-12-15 20:25:41.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h	2008-12-15 20:26:30.000000000 +0200
@@ -298,6 +298,8 @@ struct ipoib_dev_priv {
 	struct work_struct flush_heavy;
 	struct work_struct restart_task;
 	struct delayed_work ah_reap_task;
+	struct work_struct vlan_task;
+	atomic_t vlan_task_flag;
 
 	struct ib_device *ca;
 	u8 port;
@@ -501,6 +503,7 @@ void ipoib_event(struct ib_event_handler
 
 int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
 int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
+void ipoib_vlan_task(struct work_struct *work);
 
 void ipoib_pkey_poll(struct work_struct *work);
 int ipoib_pkey_dev_delay_open(struct net_device *dev);
Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-12-15 20:25:41.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c	2008-12-15 20:26:30.000000000 +0200
@@ -125,20 +125,8 @@ int ipoib_open(struct net_device *dev)
 	}
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
-		struct ipoib_dev_priv *cpriv;
-
-		/* Bring up any child interfaces too */
-		mutex_lock(&priv->vlan_mutex);
-		list_for_each_entry(cpriv, &priv->child_intfs, list) {
-			int flags;
-
-			flags = cpriv->dev->flags;
-			if (flags & IFF_UP)
-				continue;
-
-			dev_change_flags(cpriv->dev, flags | IFF_UP);
-		}
-		mutex_unlock(&priv->vlan_mutex);
+		atomic_set(&priv->vlan_task_flag, 1);
+		queue_work(ipoib_workqueue, &priv->vlan_task);
 	}
 
 	netif_start_queue(dev);
@@ -161,20 +149,8 @@ static int ipoib_stop(struct net_device
 	ipoib_ib_dev_stop(dev, 0);
 
 	if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
-		struct ipoib_dev_priv *cpriv;
-
-		/* Bring down any child interfaces too */
-		mutex_lock(&priv->vlan_mutex);
-		list_for_each_entry(cpriv, &priv->child_intfs, list) {
-			int flags;
-
-			flags = cpriv->dev->flags;
-			if (!(flags & IFF_UP))
-				continue;
-
-			dev_change_flags(cpriv->dev, flags & ~IFF_UP);
-		}
-		mutex_unlock(&priv->vlan_mutex);
+		atomic_set(&priv->vlan_task_flag, 0);
+		queue_work(ipoib_workqueue, &priv->vlan_task);
 	}
 
 	return 0;
@@ -1071,6 +1047,7 @@ static void ipoib_setup(struct net_devic
 	INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy);
 	INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task);
 	INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah);
+	INIT_WORK(&priv->vlan_task, ipoib_vlan_task);
 }
 
 struct ipoib_dev_priv *ipoib_intf_alloc(const char *name)
Index: b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c	2008-12-15 20:25:41.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c	2008-12-15 20:26:30.000000000 +0200
@@ -178,3 +178,25 @@ int ipoib_vlan_delete(struct net_device
 
 	return ret;
 }
+
+void ipoib_vlan_task(struct work_struct *work)
+{
+	struct ipoib_dev_priv *priv =
+		container_of(work, struct ipoib_dev_priv, vlan_task);
+	struct ipoib_dev_priv *cpriv;
+	int flags, new_flags, iffup_value;
+
+	iffup_value = atomic_read(&priv->vlan_task_flag) ? IFF_UP : 0;
+
+	mutex_lock(&priv->vlan_mutex);
+	list_for_each_entry(cpriv, &priv->child_intfs, list) {
+		flags = cpriv->dev->flags;
+		new_flags = (flags & ~IFF_UP) | iffup_value;
+		if (flags != new_flags) {
+			rtnl_lock();
+			dev_change_flags(cpriv->dev, new_flags);
+			rtnl_unlock();
+		}
+	}
+	mutex_unlock(&priv->vlan_mutex);
+}

From yosefe at Voltaire.COM Mon Dec 15 10:34:10 2008
From: yosefe at Voltaire.COM (Yossi Etigin)
Date: Mon, 15 Dec 2008 20:34:10 +0200
Subject: [ofa-general] [PATCH 4/4] ipoib: do not print error messages for multicast join retries
In-Reply-To: <49469C1E.8010307@Voltaire.COM>
References: <49469C1E.8010307@Voltaire.COM>
Message-ID: <4946A322.9020507@Voltaire.COM>

When trying to join multicast from ipoib and SM address handle is NULL,
the join returns with -EAGAIN status. In that case, do not print an error.

Signed-off-by: Yossi Etigin

--

Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-10-22 20:28:06.000000000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-10-27 20:13:59.000000000 +0200
@@ -443,7 +443,7 @@ static int ipoib_mcast_join_complete(int
 	}
 
 	if (mcast->logcount++ < 20) {
-		if (status == -ETIMEDOUT) {
+		if (status == -ETIMEDOUT || status == -EAGAIN) {
 			ipoib_dbg_mcast(priv, "multicast join failed for "
 					IPOIB_GID_FMT ", status %d\n",
 					IPOIB_GID_ARG(mcast->mcmember.mgid),

From sean.hefty at intel.com Mon Dec 15 11:08:44 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 15 Dec 2008 11:08:44 -0800
Subject: [ofa-general] RE: porting IB management code to Windows
In-Reply-To: <20081213203014.GU15622@sashak.voltaire.com>
References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com>
Message-ID: <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com>

>> We've started porting the IB management code (IB-diags at this point) to
>> Windows.
My strong preference is to avoid branching the code and instead keep a >> single source code tree. Is there any objection to accepting changes against >> the management tree to allow the code to run on both Linux and Windows? > >Basically I have no objections against porting changes. And I also would >prefer to keep a single code base. > >However, I would prefer to minimize the amount of needed changes and would >really prefer to not get a lot of limitations in using modern C. I will >comment inline in the patch example below. As long as we have agreement to work towards a single code base, we can be flexible on the changes. I built the code by dropping it into the Windows build tree and using the WDK build environment. There's not a strong reason to require building using the WDK, so I'll look at alternatives to building the diag code on Windows. - Sean From rpearson at systemfabricworks.com Mon Dec 15 12:07:40 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 15 Dec 2008 14:07:40 -0600 Subject: [ofa-general] RE: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Message-ID: <01f901c95ef0$c8be3380$5a3a9a80$@com> Here is the missing p5. From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Wednesday, December 10, 2008 12:28 PM To: 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Sasha, Here is a revised mesh patch #5 that incorporates changes based on your comments. This patch implements - routine to compute the characteristic polynomial of a matrix - routine to compute the local 'metric' around each switch - routine to classify switches into a histogram of local geometry classes Regards, Bob Pearson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: p5 Type: application/octet-stream Size: 5342 bytes Desc: not available URL: From rpearson at systemfabricworks.com Mon Dec 15 12:08:31 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 15 Dec 2008 14:08:31 -0600 Subject: [ofa-general] RE: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core Message-ID: <020301c95ef0$e8555c00$b9001400$@com> The p3 fix impacted this one as well. The rest are unchanged. -----Original Message----- From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Wednesday, December 10, 2008 12:56 PM To: 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core Sasha, Here is a revised mesh patch #10 that incorporates changes based on your comments. This patch - hooks mesh code into lash - replaces sw->phys_connections by the equivalent switch->node->links - replaces sw->num_connections by the equivalent switch->node->num_links - replaces sw->virtual_physical_port_table by switch->node->links[]->ports When the do_mesh_analysis flag is not set there is no change to the function except to replace the variables with variables in node that have the same size. In this case the port table in link_t will always have just one port. When the do_mesh_analysis flag is set, multiple physical links will collapse to a single logical link with a port list with more than one element. - rewrote connect switches to use variables in node - in the log message "Lane requirements (%d) exceed available lanes (%d)" the arguments were reversed; fixed - compute physical egress port in routine get_next_port, which will use round robin if there is more than one physical link between switches Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed...
Name: p10 Type: application/octet-stream Size: 11387 bytes Desc: not available URL: From rpearson at systemfabricworks.com Mon Dec 15 13:29:12 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 15 Dec 2008 15:29:12 -0600 Subject: [ofa-general] FW: [PATCH] [3 of 10] [REVISED] mesh analysis - node and link structures Message-ID: <021c01c95efc$2d304f00$8790ed00$@com> From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Monday, December 15, 2008 2:07 PM To: 'Robert Pearson' Subject: RE: [PATCH] [3 of 10] [REVISED] mesh analysis - node and link structures I still need training wheels for git. Sorry. Here is the correct p3. From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Wednesday, December 10, 2008 12:12 PM To: 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: [PATCH] [3 of 10] [REVISED] mesh analysis - node and link structures Sasha, Here is a revised mesh patch #3 that incorporates changes based on your comments. This patch - creates a link structure, link_t, for each logical switch-to-switch link - creates a per-mesh-node (e.g. switch) data structure mesh_node_t - adds a pointer to mesh_node_t in the switch_t structure in *lash.h - implements create and delete methods for mesh_node_t - calls these in switch_create and switch_delete in *lash.c Regards, Bob Pearson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: p3 Type: application/octet-stream Size: 6304 bytes Desc: not available URL: From rpearson at systemfabricworks.com Mon Dec 15 13:55:30 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 15 Dec 2008 15:55:30 -0600 Subject: [ofa-general] FW: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Message-ID: <022801c95eff$d91456b0$8b3d0410$@com> Resending.
Last one seems to have gone missing. From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Monday, December 15, 2008 2:08 PM To: 'Robert Pearson'; 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: RE: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Here is the missing p5. From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Wednesday, December 10, 2008 12:28 PM To: 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: [PATCH] [5 of 10] [REVISED] mesh analysis - local geometry Sasha, Here is a revised mesh patch #5 that incorporates changes based on your comments. This patch implements - routine to compute the characteristic polynomial of a matrix - routine to compute the local 'metric' around each switch - routine to classify switches into a histogram of local geometry classes Regards, Bob Pearson -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: p5 Type: application/octet-stream Size: 5342 bytes Desc: not available URL: From rpearson at systemfabricworks.com Mon Dec 15 13:55:50 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 15 Dec 2008 15:55:50 -0600 Subject: [ofa-general] FW: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core Message-ID: <022e01c95eff$e52fdb40$af8f91c0$@com> Resending. -----Original Message----- From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Monday, December 15, 2008 2:09 PM To: 'Robert Pearson'; 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: RE: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core The p3 fix impacted this one as well. The rest are unchanged.
-----Original Message----- From: Robert Pearson [mailto:rpearson at systemfabricworks.com] Sent: Wednesday, December 10, 2008 12:56 PM To: 'Sasha Khapyorsky'; 'general at lists.openfabrics.org' Subject: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core Sasha, Here is a revised mesh patch #10 that incorporates changes based on your comments. This patch - hooks mesh code into lash - replaces sw->phys_connections by the equivalent switch->node->links - replaces sw->num_connections by the equivalent switch->node->num_links - replaces sw->virtual_physical_port_table by switch->node->links[]->ports When the do_mesh_analysis flag is not set there is no change to the function except to replace the variables with variables in node that have the same size. In this case the port table in link_t will always have just one port. When the do_mesh_analysis flag is set, multiple physical links will collapse to a single logical link with a port list with more than one element. - rewrote connect switches to use variables in node - in the log message "Lane requirements (%d) exceed available lanes (%d)" the arguments were reversed; fixed - compute physical egress port in routine get_next_port, which will use round robin if there is more than one physical link between switches Regards, Bob Pearson -------------- next part -------------- A non-text attachment was scrubbed...
Name: p10 Type: application/octet-stream Size: 11387 bytes Desc: not available URL: From sashak at voltaire.com Mon Dec 15 23:27:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 09:27:44 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: <20081216072744.GA6780@sashak.voltaire.com> Hi Mike, On 10:07 Mon 15 Dec , Mike Heinz wrote: > > Since this function doesn't support returning multiple paths to the > caller, perhaps the correct fix is to change the method to > IB_MAD_METHOD_GET? This function supports returning multiple paths to the caller (saquery is example of such use). Sasha From sashak at voltaire.com Mon Dec 15 23:38:31 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 09:38:31 +0200 Subject: ***SPAM*** Re: ***SPAM*** Re: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: <20081216073831.GB6780@sashak.voltaire.com> Hi Hal, On 11:13 Mon 15 Dec , Hal Rosenstock wrote: > > IMO the best solution is to extend the API for these to include > num_paths and support explicit settting of this as well as 0 so > backward compatibility is not lost. To preserve backward compatibility we can use num_path = 0x7f. API can be extended OTOH it has already OSMV_QUERY_USER_DEFINED - flexible way to specify any form of SA query. 
Sasha From sashak at voltaire.com Mon Dec 15 23:43:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 09:43:44 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: <20081216074334.GC6780@sashak.voltaire.com> Hi again, Hal, On 11:03 Mon 15 Dec , Hal Rosenstock wrote: > On Mon, Dec 15, 2008 at 10:38 AM, Sasha Khapyorsky wrote: > > On 09:29 Mon 15 Dec , Mike Heinz wrote: > >> > >> That's a good question - and I'm going to ask around and double check. > >> My first reaction was that you have to specify how many paths you want > >> from the query - but you're right, the spec doesn't say that. > > > > Yes, it looks like this (but I cannot understand "why" :( ). > > The spec says this (for GetTable) and Gets are requests for 1 path. > The reason is to limit the amount of returned path records (and the > field limits to 255 records in the response). Do you know what is a reason for this "127 records" limitation? > >But even more > > strange (IMHO) limitation is mandatory SGID - actually it should make > > illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I > > thought that it is permitted. > > It was decided to force SGID. Neither All to all nor SLID to all by > itself are spec'd (you could could add SGID along with SLID to all > though). Support for those is a proprietary OpenSM extension which is > used for testing at least (and also by saquery command). Ok. Not a bad extension IMHO :) Sasha From jenos at ncsa.uiuc.edu Tue Dec 16 02:39:33 2008 From: jenos at ncsa.uiuc.edu (Jeremy Enos) Date: Tue, 16 Dec 2008 04:39:33 -0600 Subject: [ofa-general] ipoib device not loading? 
In-Reply-To: <4942E7AE.3090104@morey-chaisemartin.com> References: <49418262.3070801@ncsa.uiuc.edu> <4942700F.7080401@ext.bull.net> <4942BB8D.4030800@ncsa.uiuc.edu> <4942E7AE.3090104@morey-chaisemartin.com> Message-ID: <49478565.9010201@ncsa.uiuc.edu> I expanded the ofa_kernel srpm and commented out that section of ipoib_cm.c, rebuilt the srpm, re-ran the install and it worked. It would be nice to have a switch to supply to the installer to disable ipv6 in the future. thx- Jeremy Nicolas Morey-Chaisemartin wrote: > Yes there is, I had the same problem last month. > I'm just not sure which option you need. I'm not at work so I > can't check what I did. > Anyway there is a CONFIG_IPV6 and CONFIG_IPV6_MODULE used in a #ifdef in > drivers/infiniband/ulp/ipoib_cm.h > I guess you could set some flag to force them undefined but there should be, > and probably is, a cleaner way. > > Nicolas > > Jeremy Enos wrote: > >> Ah.. thanks. (when I installed, I just told it to install all) >> >> [root at host jenos]# modprobe ib_ipoib >> FATAL: Error inserting ib_ipoib >> (/lib/modules/2.6.27.5-41.fc9.x86_64/updates/kernel/drivers/infiniband/ulp/ipoib/ib_ipoib.ko): >> Unknown symbol in module, or unknown parameter (see dmesg) >> >> Then dmesg shows this line: >> ib_ipoib: Unknown symbol icmpv6_send >> >> I have IPv6 disabled on this host for other reasons. Perhaps that's >> causing the problem? If so, is there a way to build w/o IPv6 >> requirements? >> thx- >> >> Jeremy >> >> Nicolas Morey Chaisemartin wrote: >> >>> Jeremy Enos wrote: >>> >>>> [root at host OFED-1.4-20081209-0926]# service openibd status >>>> >>>> HCA driver loaded >>>> >>>> >>>> The following OFED modules are loaded: >>>> >>>> ib_ipath >>>> mlx4_core >>>> mlx4_ib >>>> ib_mthca >>>> ib_uverbs >>>> ib_umad >>>> ib_sa >>>> ib_cm >>>> ib_mad >>>> ib_core >>>> iw_cxgb3 >>>> >>>> [root at host OFED-1.4-20081209-0926]# ifup ib0 >>>> ib_ipoib device ib0 does not seem to be present, delaying >>>> initialization.
>>>> >>>> >>>> >>> You need the ib_ipoib module to be able to use ipoib. >>> Did you put the right option (--with_ipoib_mod or something like >>> this) when running ./configure for ofa_kernel? >>> Try to run modprobe ib_ipoib. >>> If it works then, add ipoib to your /etc/infiniband/openib.conf so >>> it'll be loaded by openibd. >>> If it doesn't, so check syslog, dmesg and other logs to see it it was >>> an error or simply the file is missing >>> If it's missing, recompile your ofa_kernel with the ipoib module. >>> >>> Nicolas >>> >>> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> >> > > From vlad at lists.openfabrics.org Tue Dec 16 03:12:25 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 16 Dec 2008 03:12:25 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081216-0200 daily build status Message-ID: <20081216111226.4DFDAE60B89@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 
Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Passed on ppc64 with linux-2.6.18 Failed: From dorfman.eli at gmail.com Tue Dec 16 03:56:18 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Tue, 16 Dec 2008 13:56:18 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <20081215094927.GA22030@sashak.voltaire.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com> <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> <20081215094927.GA22030@sashak.voltaire.com> Message-ID: <49479762.6090504@gmail.com> Sasha Khapyorsky wrote: > On 13:34 Sun 14 Dec , Eli Dorfman wrote: >> >> > Did you mean to have it osm_report_notice()? Actually it is where OpenSM >>>>> sends notices, not where OpenSM gets traps. Trap receiver processor is >>>>> located in osm_trap_rcv.c. >>>> Yes that's what i meant. 
>>>> When OpenSM receives traps it calls osm_report_notice(). >>>> It is also called for OpenSM initiated traps (e.g. GID IN/OUT and MC CREATE/DELETE). >>> Ok. I see your point. Then why should it be limited to generic notice >>> types? >> No special reason. Just at the moment we handle only generic traps. >> we may want to report vendor specific event with other event id > > OpenSM event plugin is generic API, I think we should report any trap. > >>> Also wouldn't it be better to call plugin report callback after >>> notice was actually processed (e.g. at end of this function)? >> there is no correlation between reporting an event and whether it was >> already forwarded. > > This is maybe true from plugin perspective, but reporting an event could > (at least potentially) slow down core functionality - trap sending. > > After all I think it would be better to report OSM_EVENT_ID_TRAP > unconditionally and to make this at end of osm_report_notice() function. > Agreed? Ok, I'll fix the patch. From hal.rosenstock at gmail.com Tue Dec 16 04:07:30 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 16 Dec 2008 07:07:30 -0500 Subject: ***SPAM*** Re: ***SPAM*** Re: ***SPAM*** Re: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081216073831.GB6780@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216073831.GB6780@sashak.voltaire.com> Message-ID: Sasha, On Tue, Dec 16, 2008 at 2:38 AM, Sasha Khapyorsky wrote: > Hi Hal, > > On 11:13 Mon 15 Dec , Hal Rosenstock wrote: >> >> IMO the best solution is to extend the API for these to include >> num_paths and support explicit setting of this as well as 0 so >> backward compatibility is not lost. > > To preserve backward compatibility we can use num_path = 0x7f. Sure; that's another alternative to using num_path 0 and is simpler to implement than adding num_path as a parameter to the affected PR queries.
This approach also has the added advantage that it should work with any SM. > API can > be extended OTOH it has already OSMV_QUERY_USER_DEFINED - flexible way to > specify any form of SA query. That's for different extension IMO (e.g. queries not supported by standard SA API). -- Hal > Sasha From dorfman.eli at gmail.com Tue Dec 16 04:10:59 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Tue, 16 Dec 2008 14:10:59 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH v2] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <20081215094927.GA22030@sashak.voltaire.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com> <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> <20081215094927.GA22030@sashak.voltaire.com> Message-ID: <49479AD3.3060505@gmail.com> report IB traps to plugin Signed-off-by: Eli Dorfman --- opensm/opensm/osm_inform.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index f3c8ed7..2193b30 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -609,6 +609,9 @@ osm_report_notice(IN osm_log_t * const p_log, } cl_list_destroy(&infr_to_remove_list); + /* report IB traps to plugin */ + osm_opensm_report_event(p_subn->p_osm, OSM_EVENT_ID_TRAP, p_ntc); + OSM_LOG_EXIT(p_log); return (IB_SUCCESS); -- 1.5.5 From hal.rosenstock at gmail.com Tue Dec 16 04:08:54 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 16 Dec 2008 07:08:54 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081216074334.GC6780@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> Message-ID: Sasha, On Tue, Dec 16, 2008 at 2:43 AM, Sasha Khapyorsky wrote: > Hi again, Hal, > > On 11:03 Mon 15 Dec , Hal Rosenstock wrote: 
>> On Mon, Dec 15, 2008 at 10:38 AM, Sasha Khapyorsky wrote: >> > On 09:29 Mon 15 Dec , Mike Heinz wrote: >> >> >> >> That's a good question - and I'm going to ask around and double check. >> >> My first reaction was that you have to specify how many paths you want >> >> from the query - but you're right, the spec doesn't say that. >> > >> > Yes, it looks like this (but I cannot understand "why" :( ). >> >> The spec says this (for GetTable) and Gets are requests for 1 path. >> The reason is to limit the amount of returned path records (and the >> field limits to 255 records in the response). > > Do you know what is a reason for this "127 records" limitation? Once you get past the scalability discussion (including limiting it to SGID), is there a need for more than 127 ? I think that allowing more paths is more important with various other types of wildcarded PR queries that are "beyond the spec". -- Hal >> >But even more >> > strange (IMHO) limitation is mandatory SGID - actually it should make >> > illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I >> > thought that it is permitted. >> >> It was decided to force SGID. Neither All to all nor SLID to all by >> itself are spec'd (you could could add SGID along with SLID to all >> though). Support for those is a proprietary OpenSM extension which is >> used for testing at least (and also by saquery command). > > Ok. 
Not a bad extension IMHO :) > > Sasha > From todd.rimmer at qlogic.com Tue Dec 16 04:39:04 2008 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Tue, 16 Dec 2008 06:39:04 -0600 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081215153838.GB22506@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> Message-ID: <5AEC2602AE03EB46BFC16C6B9B200DA813477DABA0@MNEXMB2.qlogic.org> > From: general-bounces at lists.openfabrics.org [mailto:general- > bounces at lists.openfabrics.org] On Behalf Of Sasha Khapyorsky > Sent: Monday, December 15, 2008 10:39 AM > > > > That's a good question - and I'm going to ask around and double check. > > My first reaction was that you have to specify how many paths you want > > from the query - but you're right, the spec doesn't say that. > > Yes, it looks like this (but I cannot understand "why" :( ). But even more > strange (IMHO) limitation is mandatory SGID - actually it should make > illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I > thought that it is permitted. [Todd Rimmer] It's about scalability. An all to all query in a fabric would return at least N^2 path records, add in an LMC, some varied SLs and PKeys and quickly it becomes ludicrously large. For example at only 25 nodes, with 8 SLs and 4 PKeys you could have 20,000 path records returned. With 1000 nodes and 1 SL and 1 PKey you have 1,000,000 path records. The practical use of GetTable(PathRecord) is for a node to find a path to another node, hence SGID limitation makes sense. > > > I'm going to do some research on my end. Are you saying that > > IB_MAD_ATTR_PATH_RECORD should only ever return a single path? > > With GetTable? I think it shouldn't (for some queries it will - such as > SLID + DLID). [Todd Rimmer] With GetTable the NumbPath parameter is required and sets a cap on the number of returned paths per SGID-DGID combination. 
This makes sense since most applications are only prepared to use a handful of paths to each destination. Many would only use 2 paths for a failover scheme. Those which only need 1 path can use Get(PathRecord) in which case NumbPath is implied to be 1. For scalability, it's best to avoid queries which don't specify a destination (e.g. always provide DGID, DLID, etc.). Queries without a destination can still be huge and consume a lot of SA and memory resources to process (e.g. at 1000 nodes, LMC=2 and 8 SLs you get 32000 path records back). Besides, it's very rare that an application really wants to know the path to EVERY other node. Todd Rimmer From julia at diku.dk Tue Dec 16 07:12:16 2008 From: julia at diku.dk (Julia Lawall) Date: Tue, 16 Dec 2008 16:12:16 +0100 (CET) Subject: [ofa-general] [PATCH 2/11] drivers/infiniband: Move a dereference below a NULL test Message-ID: From: Julia Lawall In each case, if the NULL test is necessary, then the dereference should be moved below the NULL test. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @@ type T; expression E; identifier i,fld; statement S; @@ - T i = E->fld; + T i; ...
when != E when != i if (E == NULL) S + i = E->fld; // Signed-off-by: Julia Lawall --- drivers/infiniband/hw/nes/nes_cm.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index 2caf9da..2194ee2 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -376,13 +376,14 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb, int close_when_complete) { unsigned long flags; - struct nes_cm_core *cm_core = cm_node->cm_core; + struct nes_cm_core *cm_core; struct nes_timer_entry *new_send; int ret = 0; u32 was_timer_set; if (!cm_node) return -EINVAL; + cm_core = cm_node->cm_core; new_send = kzalloc(sizeof(*new_send), GFP_ATOMIC); if (!new_send) return -1; From slavas at voltaire.com Tue Dec 16 07:37:15 2008 From: slavas at voltaire.com (Slava Strebkov) Date: Tue, 16 Dec 2008 17:37:15 +0200 Subject: [ofa-general] [PATCH ] Message-ID: <39C75744D164D948A170E9792AF8E7CAACE046@exil.voltaire.com> Hi, Attached patch fixes memory leak. It frees memory allocated dynamically for QOS options while doing rescan of qos configuration file. diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index c41962d..3dab92f 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -374,8 +374,17 @@ static void subn_init_qos_options(IN osm_qos_options_t * opt) { opt->max_vls = 0; opt->high_limit = -1; + if (opt->vlarb_high != OSM_DEFAULT_QOS_VLARB_HIGH) { + free(opt->vlarb_high); + } opt->vlarb_high = NULL; + if (opt->vlarb_low != OSM_DEFAULT_QOS_VLARB_LOW) { + free(opt->vlarb_low); + } opt->vlarb_low = NULL; + if (opt->sl2vl != OSM_DEFAULT_QOS_SL2VL) { + free(opt->sl2vl); + } opt->sl2vl = NULL; } Slava Strebkov -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: qos_options_free.patch Type: application/octet-stream Size: 653 bytes Desc: qos_options_free.patch URL: From jean-vincent.ficet at bull.net Tue Dec 16 07:51:34 2008 From: jean-vincent.ficet at bull.net (Vincent Ficet) Date: Tue, 16 Dec 2008 16:51:34 +0100 Subject: [ofa-general] vendstat issue with shark switch Message-ID: <4947CE86.90301@bull.net> Hi, I burnt the latest firmwares on some Mellanox shark switches (QDR / 36 ports) using MFT 2.5.0. Everything worked fine. However, the vendstat command (head of the master git branch fetched today) fails in the classportinfo query. Is this a restriction/bug in vendstat or in the switch firmware (SMA maybe ?) ? root at inti0 mft-2.5.0]# ibswitches Switch : 0x0002c90200404798 ports 36 "Infiniscale-IV Mellanox Technologies" base port 0 lid 2 lmc 0 Switch : 0x0002c902004047c0 ports 36 "Infiniscale-IV Mellanox Technologies" base port 0 lid 4 lmc 0 Switch : 0x0002c902004044e0 ports 36 "Infiniscale-IV Mellanox Technologies" base port 0 lid 5 lmc 0 [root at inti0 mft-2.5.0]# flint -d "lid-2" q Image type: FS2 FW Version: 7.1.0 Device ID: 48438 Chip Revision: A0 Description: Node Sys image GUIDs: 0002c90200404798 0002c9020040479b Board ID: (MT_0C20110003) VSD: PSID: MT_0C20110003 [root at inti0 mft-2.5.0]# vendstat -N 2 vendstat: iberror: failed: classportinfo query Any idea ? Thanks for your help, Vincent From sashak at voltaire.com Tue Dec 16 08:02:13 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 18:02:13 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> Message-ID: <20081216160213.GC7311@sashak.voltaire.com> On 07:08 Tue 16 Dec , Hal Rosenstock wrote: > > > > Do you know what is a reason for this "127 records" limitation? 
> > Once you get past the scalability discussion (including limiting it to > SGID), is there a need for more than 127 ? Let's say we want to get all paths between SGID A and DGID B, LMC=15 and so there could be 256 possible paths (assuming that differences are only in LIDs). Our first query will return 128 paths, then we will need to query SGID/SLID -> DGID for each SLID above base LID A + 7. Obviously all this will require more queries, will generate more traffic and will hurt SA scalability more than if NumbPath were unlimited (or limited to 256). > I think that allowing more > paths is more important with various other types of wildcarded PR > queries that are "beyond the spec". This could also be a good point. Any example? Sasha From ronli.voltaire at gmail.com Tue Dec 16 08:02:44 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Tue, 16 Dec 2008 18:02:44 +0200 Subject: [ofa-general] [PATCH 0/4] Adding new verbs: create_qp_flags In-Reply-To: References: Message-ID: <3b5e77ad0812160802o120079eak177eb65b99220830@mail.gmail.com> Hi Roland, any comments on these patches? Thanks, Ron On Thu, Dec 11, 2008 at 10:59 AM, Ron Livne wrote: > This series of patches will add a new user space verb: > ibv_create_qp_flags > > This new verb works similarly to ibv_create_qp, > except for the additional parameter uint32_t create_flags > that it takes in order to create the QP with the specified > creation flags. > > I've already sent a similar series of patches in July, > but it wasn't merged. > > The older patches were based on Jack Morgenstein's XRC patches. > > These patches are not based on them. > > The uverbs patch was written based on the for-next branch. > > These patches don't break the ABI and are compatible with > older kernel/libibverbs versions. > > The reason I added another verb in the kernel is that > I don't think 8 bits (the reserved bits in struct ib_uverbs_create_qp) > will be enough in the future.
> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From todd.rimmer at qlogic.com Tue Dec 16 08:04:54 2008 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Tue, 16 Dec 2008 10:04:54 -0600 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081216160213.GC7311@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <20081216160213.GC7311@sashak.voltaire.com> Message-ID: <5AEC2602AE03EB46BFC16C6B9B200DA813477DABE8@MNEXMB2.qlogic.org> > From: Sasha Khapyorsky > > On 07:08 Tue 16 Dec , Hal Rosenstock wrote: > > > > > > Do you know what is a reason for this "127 records" limitation? > > > > Once you get past the scalability discussion (including limiting it to > > SGID), is there a need for more than 127 ? > > Let's say we want to get all paths between SGID A and DGID B, LMC=15 and > so there could be 256 possible paths (assuming that differences are only > in LIDs). Our first query will return 128 paths, then we will need to > query SGID/SLID -> DGID for each SLID above base LID A + 7, obviously > all this will require more queries, will pend more traffic and will hurt > SA scalability more than if there would unlimited (or limited as 256) > NumbPath. [Todd Rimmer] The maximum value for LMC=7. Hence for a given SGID-DGID pair on a specific SL, there is a limit of 127 paths. So the spec is self consistent in that regard. 
Todd Rimmer From hal.rosenstock at gmail.com Tue Dec 16 08:07:18 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Tue, 16 Dec 2008 11:07:18 -0500 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: <4947CE86.90301@bull.net> References: <4947CE86.90301@bull.net> Message-ID: Hi, On Tue, Dec 16, 2008 at 10:51 AM, Vincent Ficet wrote: > Hi, > > I burnt the latest firmwares on some Mellanox shark switches (QDR / 36 > ports) using MFT 2.5.0. Everything worked fine. > However, the vendstat command (head of the master git branch fetched today) > fails in the classportinfo query. > Is this a restriction/bug in vendstat or in the switch firmware (SMA maybe > ?) ? > > root at inti0 mft-2.5.0]# ibswitches > Switch : 0x0002c90200404798 ports 36 "Infiniscale-IV Mellanox > Technologies" base port 0 lid 2 lmc 0 > Switch : 0x0002c902004047c0 ports 36 "Infiniscale-IV Mellanox > Technologies" base port 0 lid 4 lmc 0 > Switch : 0x0002c902004044e0 ports 36 "Infiniscale-IV Mellanox > Technologies" base port 0 lid 5 lmc 0 > > [root at inti0 mft-2.5.0]# flint -d "lid-2" q > Image type: FS2 > FW Version: 7.1.0 > Device ID: 48438 > Chip Revision: A0 > Description: Node Sys image > GUIDs: 0002c90200404798 0002c9020040479b > Board ID: (MT_0C20110003) > VSD: PSID: MT_0C20110003 > > [root at inti0 mft-2.5.0]# vendstat -N 2 > vendstat: iberror: failed: classportinfo query > > Any idea ? ClassPortInfo is a required attribute for any supported class but I'm not 100% sure what is going on. Those vendstat options are for IS-3 and this is IS-4 although there is some device validation supported in the vendstat command. Can you show the output with -d option ? 
-- Hal > Thanks for your help, > > Vincent > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From sashak at voltaire.com Tue Dec 16 08:14:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 18:14:24 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <5AEC2602AE03EB46BFC16C6B9B200DA813477DABA0@MNEXMB2.qlogic.org> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <5AEC2602AE03EB46BFC16C6B9B200DA813477DABA0@MNEXMB2.qlogic.org> Message-ID: <20081216161424.GD7311@sashak.voltaire.com> Hi Todd, On 06:39 Tue 16 Dec , Todd Rimmer wrote: > > From: general-bounces at lists.openfabrics.org [mailto:general- > > bounces at lists.openfabrics.org] On Behalf Of Sasha Khapyorsky > > Sent: Monday, December 15, 2008 10:39 AM > > > > > > That's a good question - and I'm going to ask around and double check. > > > My first reaction was that you have to specify how many paths you want > > > from the query - but you're right, the spec doesn't say that. > > > > Yes, it looks like this (but I cannot understand "why" :( ). But even more > > strange (IMHO) limitation is mandatory SGID - actually it should make > > illegal such GetTable queries as all-to-all, SLID-to-all, etc.. I > > thought that it is permitted. > [Todd Rimmer] It's about scalability. An all to all query in a fabric would return at least N^2 path records, add in an LMC, some varied SLs and PKeys and quickly it becomes ludicrously large. For example at only 25 nodes, with 8 SLs and 4 PKeys you could have 20,000 path records returned. With 1000 nodes and 1 SL and 1 PKey you have 1,000,000 path records. > > The practical use of GetTable(PathRecord) is for a node to find a path to another node, hence SGID limitation makes sense. 
Of course scalability is important, but I'm not sure that the limitations discussed are a real solution - if some port wants to get all-to-all path information anyway, it will just send many more queries and potentially will hurt SA scalability and fabric traffic even more. > > > I'm going to do some research on my end. Are you saying that > > > IB_MAD_ATTR_PATH_RECORD should only ever return a single path? > > > > With GetTable? I think it shouldn't (for some queries it will - such as > > SLID + DLID). > [Todd Rimmer] With GetTable the NumbPath parameter is required and sets a cap on the number of returned paths per SGID-DGID combination. This makes sense since most applications are only prepared to use a handful of paths to each destination. Many would only use 2 paths for a failover scheme. Those which only need 1 path can use Get(PathRecord) in which case NumbPath is implied to be 1. Sure. But the whole discussion is about GetTable, not just Get (this case is clear to me). > For scalability, it's best to avoid queries which don't specify a destination (eg. always provide DGID, DLID, etc). Queries without a destination can still be huge and consume a lot of SA and memory resources to process (eg. at 1000 nodes LMC=2, 8 SLs you get 32000 path records back). Besides it's very rare that an application really wants to know the path to EVERY other node. As I described in another email example, let's look at the practical SGID/DGID case when LMC=15 - this will require at least 256 records, and there is no (known to me) easy way to get this in an IBA compliant way when NumbPath is limited by 7 bits.
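The record counts quoted in this thread can be reproduced with a few lines of arithmetic. What follows is only a sketch added to illustrate the numbers, not code from OpenSM or from any poster: the function names are hypothetical, and the formulas are one reading that happens to match Todd's figures (one record per source, destination, SL and PKey for the all-to-all case; one record per destination LID and SL when no destination is given). The 7-bit NumbPath width is the one stated above.

```python
# Illustrative arithmetic only. None of these names come from OpenSM or
# the IBA spec; they are hypothetical helpers for the figures in the thread.

NUMBPATH_BITS = 7  # width of the PathRecord NumbPath field, as stated above


def lids_per_port(lmc: int) -> int:
    """Number of LIDs a port answers to for a given LMC (2**LMC)."""
    return 1 << lmc


def all_to_all_records(nodes: int, sls: int = 1, pkeys: int = 1) -> int:
    """Assumed model: one record per (source, destination, SL, PKey)."""
    return nodes * nodes * sls * pkeys


def one_to_all_records(nodes: int, lmc: int = 0, sls: int = 1) -> int:
    """Assumed model for a query with no destination: one record per
    destination LID and SL, seen from a single source."""
    return nodes * lids_per_port(lmc) * sls


def max_numbpath() -> int:
    """Largest value that fits in a 7-bit NumbPath field."""
    return (1 << NUMBPATH_BITS) - 1


if __name__ == "__main__":
    print(all_to_all_records(25, sls=8, pkeys=4))   # 20000
    print(all_to_all_records(1000))                 # 1000000
    print(one_to_all_records(1000, lmc=2, sls=8))   # 32000
    print(max_numbpath())                           # 127
```

Under these assumptions the 25-node, 1000-node and LMC=2 examples come out to 20,000, 1,000,000 and 32,000 records, and a 7-bit NumbPath tops out at 127.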
Sasha From stan.smith at intel.com Tue Dec 16 08:19:08 2008 From: stan.smith at intel.com (Smith, Stan) Date: Tue, 16 Dec 2008 08:19:08 -0800 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> Message-ID: <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> Sean Hefty wrote: >>> We've started porting the IB management code (IB-diags at this >>> point) to Windows. My strong preference is to avoid branching the >>> code and instead keep a single source code tree. Is there any >>> objection to accepting changes against the management tree to allow >>> the code to run on both Linux and Windows? >> >> Basically I have no objections against porting changes. And I also >> would prefer to keep a single code base. >> >> However, I would prefer to minimize amount of needed changes and >> would really prefer to not get a lot of limitations in using modern >> C. I will comment inline in the patch example below. > > As long as we have agreement to work towards a single code base, we > can be flexible on the changes. I built the code by dropping it into > the Windows build tree and using the WDK build environment. There's > not a strong reason to require building using the WDK, so I'll look > at alternatives to building the diag code on Windows. Hold the presses - there is an absolute requirement that any part of a WinOF release builds using the WDK build environment! Stan. 
> > - Sean > > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw From sashak at voltaire.com Tue Dec 16 09:26:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 19:26:09 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <5AEC2602AE03EB46BFC16C6B9B200DA813477DABE8@MNEXMB2.qlogic.org> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <20081216160213.GC7311@sashak.voltaire.com> <5AEC2602AE03EB46BFC16C6B9B200DA813477DABE8@MNEXMB2.qlogic.org> Message-ID: <20081216172558.GE7311@sashak.voltaire.com> On 10:04 Tue 16 Dec , Todd Rimmer wrote: > > [Todd Rimmer] The maximum value for LMC=7. Oops, bits overflow :( . > Hence for a given SGID-DGID pair on a specific SL, there is a limit of 127 paths. So the spec is self consistent in that regard. This makes sense then. Sasha From sashak at voltaire.com Tue Dec 16 09:28:10 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 19:28:10 +0200 Subject: [ofa-general] Re: [PATCH] [TRIVIAL] opensm/libvendor/osm_vendor_sa_api.h: Fix commentary typo In-Reply-To: <1229365331.29873.385.camel@bertha1.edm.orcorp.ca> References: <1229365331.29873.385.camel@bertha1.edm.orcorp.ca> Message-ID: <20081216172810.GF7311@sashak.voltaire.com> On 11:22 Mon 15 Dec , Hal Rosenstock wrote: > Sasha, > > Attached patch fixed commentary typo in osm_vendor_sa_api.h > > -- Hal > opensm/libvendor/osm_vendor_sa_api.h: Fix commentary typo > > Signed-off-by: Hal Rosenstock Applied. Thanks.
Sasha From sashak at voltaire.com Tue Dec 16 09:28:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 19:28:50 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <49479AD3.3060505@gmail.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com> <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> <20081215094927.GA22030@sashak.voltaire.com> <49479AD3.3060505@gmail.com> Message-ID: <20081216172850.GG7311@sashak.voltaire.com> On 14:10 Tue 16 Dec , Eli Dorfman wrote: > report IB traps to plugin > > Signed-off-by: Eli Dorfman Applied. Thanks. Sasha From sashak at voltaire.com Tue Dec 16 09:33:29 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 19:33:29 +0200 Subject: [ofa-general] Re: [PATCH v2] opensm/osm_inform.c report IB traps to plugin In-Reply-To: <20081216172850.GG7311@sashak.voltaire.com> References: <493CEBBE.2020407@gmail.com> <20081208200217.GB13924@sashak.voltaire.com> <493E3AE5.5000604@gmail.com> <20081213135051.GP15622@sashak.voltaire.com> <694d48600812140334i788338celccdc4db39eddba3@mail.gmail.com> <20081215094927.GA22030@sashak.voltaire.com> <49479AD3.3060505@gmail.com> <20081216172850.GG7311@sashak.voltaire.com> Message-ID: <20081216173329.GH7311@sashak.voltaire.com> On 19:28 Tue 16 Dec , Sasha Khapyorsky wrote: > On 14:10 Tue 16 Dec , Eli Dorfman wrote: > > report IB traps to plugin > > > > Signed-off-by: Eli Dorfman > > Applied. Thanks. Also adding #include for OSM_EVENT_ID_TRAP definition. 
Sasha From hrosenstock at obsidianresearch.com Tue Dec 16 09:44:20 2008 From: hrosenstock at obsidianresearch.com (Hal Rosenstock) Date: Tue, 16 Dec 2008 10:44:20 -0700 Subject: [ofa-general] [PATCH][TRIVIAL] opensm/osm_inform.c: Eliminate compile warning Message-ID: <1229449460.29873.435.camel@bertha1.edm.orcorp.ca> Sasha, Attached patch eliminates newly introduced compile warning in osm_inform.c. -- Hal -------------- next part -------------- opensm/osm_inform.c: Eliminate compile warning Signed-off-by: Hal Rosenstock diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c index b203435..186f620 100644 --- a/opensm/opensm/osm_inform.c +++ b/opensm/opensm/osm_inform.c @@ -52,6 +52,7 @@ #include #include #include +#include typedef struct osm_infr_match_ctxt { cl_list_t *p_remove_infr_list; From sean.hefty at intel.com Tue Dec 16 10:23:03 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 16 Dec 2008 10:23:03 -0800 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> Message-ID: <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> >Hold the presses - there is an absolute requirement that any part of a WinOF >release builds using the WDK build environment! Why is this an absolute requirement for userspace applications? ND is provided as a binary component that is built using some unknown build environment. The source code is not available. It has a different license. But it still ships with WinOF. Building in the WDK is a political decision, not a technical one. At the extremes, the IB management maintainers can refuse to support Windows because they don't like the changes to build using the MS WDK compiler. 
Or, WinOF can refuse to ship IB management utilities because it builds differently. Obviously no one benefits from taking these extreme stances. There are real benefits to maintaining a single source code base. I would guess between 10-25 patches are added to the management tree each week, many, if not most, by developers that only care about Linux. The Windows stack gains a lot by being able to take advantage of those changes. (Note that most patches are against opensm. If we limit the current discussion to only the diags and libibmad, we're looking at roughly 12,000 lines of code and probably about 5 patches per week.) I posted a patch to show the changes necessary to build in the WDK environment. Jason asked, "Is there any way it would be acceptable to use gcc (or even the Intel compiler) as the mandatory Windows C compiler?" And Sasha's commented, "I would prefer to minimize amount of needed changes and would really prefer to not get a lot of limitations in using modern C." My reply was "I personally have no issue with using a different compiler, but the WWG would need to decide on that sort of change." I looked into the issue. The compiler that ships with the WDK does not support c99 to the extent needed. Because using the Intel compiler was convenient for me, I tested with that and was eventually able to reduce the necessary code changes. I built in the WDK build environment. At this point, I see the following options: 1. Fork the code: The benefits or requirement of building within the existing Windows build environment is greater or more important than maintaining a single source code base. 2. Accept the changes as posted: If the benefits or requirement of building within the existing Windows build environment is greater or more important than allowing c99 constructs, but not at the cost of forking the code. 3. 
Build the IB management with a different compiler: The benefits of support c99 is greater than the benefits of building within the existing Windows build environment. If there's another solution here, I'm open to ideas. Of these three, I prefer option 2 or 3. Option 3 means slightly more work for me because it requires making binaries available, but there are real benefits to supporting c99. - Sean From stan.smith at intel.com Tue Dec 16 11:01:09 2008 From: stan.smith at intel.com (Smith, Stan) Date: Tue, 16 Dec 2008 11:01:09 -0800 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> Message-ID: <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> Hefty, Sean wrote: >> Hold the presses - there is an absolute requirement that any part of >> a WinOF release builds using the WDK build environment! > > Why is this an absolute requirement for userspace applications? > > ND is provided as a binary component that is built using some unknown > build environment. The source code is not available. It has a > different license. But it still ships with WinOF. Building in the > WDK is a political decision, not a technical one. ND is built in the WDK environment by those who have MS src license. Political decision? Maybe so, but nonetheless WDK it is until such a time that the Windows community agrees it is not. The bottom-line fact of the OpenIB/WinOF svn tree has been that upon svn checkout/update the src tree is buildable without further additions other than the WDK build environment. The Windows svn tree is built and tested by at least 4 different companies, some daily. 
There needs to be a consensus before major changes to the Windows build process are installed. BTW, I am in favor of common IB management src. So if 'everyone' is in favor of having a common src base, what's so hard about having a few #ifdef __WIN32? Yes it's not the optimal approach, although changing compilers, building elsewhere, adding binaries are all certainly more offensive IMHO. BTW, since the 'common src' will be maintained in the Linux world - who is to say patches will consider Windows ramifications and be tested in the Windows environment anyway? History has not demonstrated this type of development diligence, hence the code fork which brings us to today's situation. Make the process easy on everyone and one day into the future, maybe MS will wake up and the WDK will be c99 compliant? Stan. > > At the extremes, the IB management maintainers can refuse to support > Windows because they don't like the changes to build using the MS WDK > compiler. Or, WinOF can refuse to ship IB management utilities > because it builds differently. Obviously no one benefits from taking > these extreme stances. > > There are real benefits to maintaining a single source code base. I > would guess between 10-25 patches are added to the management tree > each week, many, if not most, by developers that only care about > Linux. The Windows stack gains a lot by being able to take advantage > of those changes. (Note that most patches are against opensm. If we > limit the current discussion to only the diags and libibmad, we're > looking at roughly 12,000 lines of code and probably about 5 patches > per week.) > > I posted a patch to show the changes necessary to build in the WDK > environment. Jason asked, "Is there any way it would be acceptable to > use gcc (or even the Intel compiler) as the mandatory Windows C > compiler?" And Sasha's commented, "I would prefer to minimize amount > of needed changes and would really prefer to not get a lot of > limitations in using modern C."
My reply was "I personally have no > issue with using a different compiler, but the WWG would need to > decide on that sort of change." > > I looked into the issue. The compiler that ships with the WDK does > not support c99 to the extent needed. Because using the Intel > compiler was convenient for me, I tested with that and was eventually > able to reduce the necessary code changes. I built in the WDK build > environment. > > At this point, I see the following options: > > 1. Fork the code: > The benefits or requirement of building within the existing Windows > build environment is greater or more important than maintaining a > single source code base. > 2. Accept the changes as posted: > If the benefits or requirement of building within the existing > Windows build environment is greater or more important than allowing > c99 constructs, but not at the cost of forking the code. > 3. Build the IB management with a different compiler: > The benefits of support c99 is greater than the benefits of building > within the existing Windows build environment. > > If there's another solution here, I'm open to ideas. Of these three, > I prefer option 2 or 3. Option 3 means slightly more work for me > because it requires making binaries available, but there are real > benefits to supporting c99. > > - Sean From sashak at voltaire.com Tue Dec 16 11:10:48 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 16 Dec 2008 21:10:48 +0200 Subject: [ofa-general] Re: [PATCH][TRIVIAL] opensm/osm_inform.c: Eliminate compile warning In-Reply-To: <1229449460.29873.435.camel@bertha1.edm.orcorp.ca> References: <1229449460.29873.435.camel@bertha1.edm.orcorp.ca> Message-ID: <20081216191048.GI7311@sashak.voltaire.com> On 10:44 Tue 16 Dec , Hal Rosenstock wrote: > Sasha, > > Attached patch eliminates newly introduced compile warning in > osm_inform.c. > > -- Hal > opensm/osm_inform.c: Eliminate compile warning > > Signed-off-by: Hal Rosenstock And then #include is not needed. 
Applied. Thanks. > > diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c > index b203435..186f620 100644 > --- a/opensm/opensm/osm_inform.c > +++ b/opensm/opensm/osm_inform.c > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > > typedef struct osm_infr_match_ctxt { > cl_list_t *p_remove_infr_list; From jgunthorpe at obsidianresearch.com Tue Dec 16 12:04:49 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 16 Dec 2008 13:04:49 -0700 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> Message-ID: <20081216200449.GE31213@obsidianresearch.com> On Tue, Dec 16, 2008 at 11:01:09AM -0800, Smith, Stan wrote: > So if 'everyone' is in favor of having a common src base, what's so > hard about having a few #ifdef __WIN32? Yes it's not the optimal > approach, although changing compilers, building elsewhere, adding > binaries are all certainly more offensive IMHO. I think you have to appreciate that the code written for Linux is written in C99 and for the POSIX API. WDK does not provide that kind of environment and trying to bake over that with #ifdef is a huge ongoing job. Just look at what Sean has done already, he had to provide an independent getopt implementation just for a tiny utility. > BTW, since the 'common src' will be maintained in the Linux world - > who is to say patches will consider Windows ramifications and be > tested in the Windows environment anyway?
History has not > demonstrated this type of development diligence, hence the code fork > which brings us to today's situation. In my experience the #1 reason this happens is because people try to 'port' the code to a completely different environment rather than accepting that POSIX code needs to run on POSIX libraries (SFU/cygwin/etc) and Windows code needs to run on WIN32 libraries (wine/mainwin/etc). Trying to patch over the huge differences with ifdefs will always cause problems in the future as changes are made. > Make the process easy on everyone and one day into the future, > maybe MS will wake up and the WDK will be c99 compliant? MS already has a C99 POSIX environment they support and distribute, it is called SFU and the bundled Interix library set. Jason From sean.hefty at intel.com Tue Dec 16 13:17:26 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 16 Dec 2008 13:17:26 -0800 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <20081216200449.GE31213@obsidianresearch.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> <20081216200449.GE31213@obsidianresearch.com> Message-ID: <000101c95fc3$b2e9d4a0$2c248686@amr.corp.intel.com> >I think you have to appreciate that the code written for Linux is >written in C99 and for the POSIX API. WDK does not provide that kind >of environment and trying to bake over that with #ifdef is a huge >ongoing job. Just look at what Sean has done already, he had to >provide an independent getopt implementation just for a tiny utility. Note that the management code has an OS abstraction layer in the form of complib.
Probably the biggest hurdle to porting the code is the use of the sys/class filesystem. We have some ideas on how we might work around this, but we haven't studied the actual use of sys/class. >MS already has a C99 POSIX environment they support and distribute, it >is called SFU and the bundled Interix library set. I like to distinguish between build requirements, versus requirements placed on end-users. Requiring users to download and install SFU separate from WinOF is something that I would like to avoid, if reasonable to do so. So far, nothing that we've seen blocks the ability to share the majority of the libibmad or diag code. (Arlin can provide more details on changes.) However, if the management package is to support multiple OS's, then the code will require some modifications to support that. I'd like to think that the companies involved here would like to support How flexible are people willing to be? Are there objections on the Windows side to taking IB management binaries and including them as part of WinOF? Are there objections on the Linux side to accepting patches that allow the code to be built using the WDK? - Sean From arlin.r.davis at intel.com Tue Dec 16 13:08:56 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 16 Dec 2008 13:08:56 -0800 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: <20081216200449.GE31213@obsidianresearch.com> References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> <20081216200449.GE31213@obsidianresearch.com> Message-ID: >MS already has a C99 POSIX environment they support and distribute, it >is called SFU and the bundled Interix library set.
Wouldn't this require all WinOF customers to have the SFU toolkit installed even for the binary distribution? From jgunthorpe at obsidianresearch.com Tue Dec 16 14:12:25 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 16 Dec 2008 15:12:25 -0700 Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows In-Reply-To: References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> <20081216200449.GE31213@obsidianresearch.com> Message-ID: <20081216221225.GG31213@obsidianresearch.com> On Tue, Dec 16, 2008 at 01:08:56PM -0800, Davis, Arlin R wrote: > >MS already has a C99 POSIX environment they support and distribute, it > >is called SFU and the bundled Interix library set. > Wouldn't this require all WinOF customers to have the SFU toolkit > installed even for the binary distribution? That's well past my experience, but I have been led to believe it boils down to a small number of support dlls in most cases. Maybe the licensing is such that you can distribute it with the program like you can with mfc*.dll/etc That is the case for cygwin too - there is a core dll that provides the posix API emulation - though there are some provisos..
Jason From rdreier at cisco.com Tue Dec 16 23:02:52 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 16 Dec 2008 23:02:52 -0800 Subject: [ofa-general] Re: [PATCH 2/11] drivers/infiniband: Move a dereference below a NULL test In-Reply-To: (Julia Lawall's message of "Tue, 16 Dec 2008 16:12:16 +0100 (CET)") References: Message-ID: > --- a/drivers/infiniband/hw/nes/nes_cm.c > +++ b/drivers/infiniband/hw/nes/nes_cm.c > @@ -376,13 +376,14 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb, > int close_when_complete) > { > unsigned long flags; > - struct nes_cm_core *cm_core = cm_node->cm_core; > + struct nes_cm_core *cm_core; > struct nes_timer_entry *new_send; > int ret = 0; > u32 was_timer_set; > > if (!cm_node) > return -EINVAL; > + cm_core = cm_node->cm_core; Thanks... I believe this is already taken care of by a patch pending for 2.6.29 (which should be in -next) that removes the test for NULL. - R. From jean-vincent.ficet at bull.net Tue Dec 16 23:56:36 2008 From: jean-vincent.ficet at bull.net (Vincent Ficet) Date: Wed, 17 Dec 2008 08:56:36 +0100 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: References: <4947CE86.90301@bull.net> Message-ID: <4948B0B4.8000305@bull.net> Hello Hal, > ClassPortInfo is a required attribute for any supported class but I'm > not 100% sure what is going on. Those vendstat options are for IS-3 > and this is IS-4 although there is some device validation supported in > the vendstat command. > > Can you show the output with -d option ? 
> > This is what I have: [root at inti0 sbin]# vendstat -dd -N 2 ibwarn: [8822] umad_init: umad_init ibwarn: [8822] umad_open_port: ca (null) port 0 ibwarn: [8822] umad_get_cas_names: max 20 ibwarn: [8822] umad_get_cas_names: return 1 cas ibwarn: [8822] resolve_ca_name: checking ca 'mlx4_0' ibwarn: [8822] resolve_ca_port: checking ca 'mlx4_0' ibwarn: [8822] umad_get_ca: ca_name mlx4_0 ibwarn: [8822] umad_get_ca: opened mlx4_0 ibwarn: [8822] resolve_ca_port: checking port 0 ibwarn: [8822] resolve_ca_port: checking port 1 ibwarn: [8822] resolve_ca_port: found active port 1 ibwarn: [8822] resolve_ca_name: found ca mlx4_0 with port 1 type 1 ibwarn: [8822] resolve_ca_name: found ca mlx4_0 with active port 1 ibwarn: [8822] umad_open_port: opening mlx4_0 port 1 ibwarn: [8822] dev_to_umad_id: mapped mlx4_0 1 to 0 ibwarn: [8822] umad_open_port: opened /dev/infiniband/umad0 fd 3 portid 0 ibwarn: [8822] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version 0 method_mask (nil) ibwarn: [8822] umad_register: fd 3 registered to use agent 0 qp 0 ibwarn: [8822] umad_register: fd 3 mgmt_class 129 mgmt_version 1 rmpp_version 0 method_mask (nil) ibwarn: [8822] umad_register: fd 3 registered to use agent 1 qp 0 ibwarn: [8822] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version 1 method_mask (nil) ibwarn: [8822] umad_register: fd 3 registered to use agent 2 qp 1 ibwarn: [8822] umad_register: fd 3 mgmt_class 10 mgmt_version 1 rmpp_version 0 method_mask (nil) ibwarn: [8822] umad_register: fd 3 registered to use agent 3 qp 1 ibwarn: [8822] ib_vendor_call: route Lid 2 data 0x7fff19b91c60 ibwarn: [8822] ib_vendor_call: class 0xa method 0x1 attr 0x1 mod 0x0 datasz 232 off 24 res_ex 1 ibwarn: [8822] mad_rpc_rmpp: rmpp (nil) data 0x7fff19b91c60 ibwarn: [8822] umad_set_addr: umad 0x7fff19b912b0 dlid 2 dqp 1 sl 0, qkey 80010000 ibwarn: [8822] _do_madrpc: >>> sending: len 256 pktsz 320 send buf 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 8001 0000 0002 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 010a 0101 0000 0000 0960 1a1a 25a8 c4c0 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ibwarn: [8822] umad_send: fd 3 agentid 3 umad 0x7fff19b912b0 timeout 1000 ibwarn: [8822] umad_recv: fd 3 umad 0x7fff19b90eb0 timeout 1000 ibwarn: [8822] umad_recv: mad received by agent 3 length 320 ibwarn: [8822] _do_madrpc: rcv buf: rcv buf 010a 0181 000c 0000 0000 009f 25a8 c4c0 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ibwarn: [8822] mad_rpc_rmpp: MAD completed with error status 0xc; dport (Lid 2) vendstat: iberror: [pid 8822] main: failed: classportinfo query Cheers, Vincent From sashak at voltaire.com Wed Dec 17 00:12:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 17 Dec 2008 10:12:24 +0200 Subject: [ofa-general] [PATCH] opensm/event_plugin: link opensm with -rdynamic flag Message-ID: 
<20081217081224.GL7311@sashak.voltaire.com> Link OpenSM executable with -rdynamic flag so all global symbols (actually functions) will be available for using in plugins. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/Makefile.am | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/Makefile.am b/opensm/opensm/Makefile.am index 66fbccc..9fd9123 100644 --- a/opensm/opensm/Makefile.am +++ b/opensm/opensm/Makefile.am @@ -25,6 +25,7 @@ libopensm_la_LDFLAGS = -version-info $(opensm_api_version) \ libopensm_la_DEPENDENCIES = $(srcdir)/libopensm.map sbin_PROGRAMS = opensm +opensm_LDFLAGS = -rdynamic opensm_DEPENDENCIES = libopensm.la opensm_SOURCES = main.c osm_console_io.c osm_console.c osm_db_files.c \ osm_db_pack.c osm_drop_mgr.c \ -- 1.6.0.4.766.g6fc4a From vlad at lists.openfabrics.org Wed Dec 17 03:24:28 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 17 Dec 2008 03:24:28 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081217-0200 daily build status Message-ID: <20081217112428.80139E60B89@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with 
linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From nicolas.morey-chaisemartin at ext.bull.net Wed Dec 17 04:44:36 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Wed, 17 Dec 2008 13:44:36 +0100 Subject: [ofa-general] [OpenSM][PATCH] Corrected incoherency in __osm_ftree_fabric_route_to_non_cns comments Message-ID: <4948F434.8020605@ext.bull.net> It seems to be that there is an error in the comment of __osm_ftree_fabric_route_to_non_cns comments. It said the function was to be called with TRUE,FALSE parameters when it was called with TRUE,TRUE. I'm just discovering the Ftree routing code so I may have misunderstood something. Anyway if I'm right, here's the patch! Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/opensm/osm_ucast_ftree.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bff9d5f055f9a0a8498dd9895283145c97cfa868.diff
Type: text/x-patch
Size: 767 bytes
Desc: not available
URL: 

From celine.bourde at ext.bull.net Wed Dec 17 04:56:07 2008
From: celine.bourde at ext.bull.net (Celine Bourde)
Date: Wed, 17 Dec 2008 13:56:07 +0100
Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition
Message-ID: <4948F6E7.9070509@ext.bull.net>

Hi,

I can't mount an NFS/RDMA partition.
I followed the instructions in
http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt

Every step (loading the modules, setting up /etc/exports, starting the NFS
daemon, etc.) seems to be OK, but when I run the last command:
mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/
the mount process hangs, even though the last dmesg output seems correct:
"RPC: Registered rdma transport module.
rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16"

If I try "ibstat" after that, I get a kernel panic message:
"ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or
resource busy)" because the device is in use.

ib_mad1 uses 100% of a CPU:

[root at test]top
top - 14:55:07 up 19 min,  3 users,  load average: 2.00, 1.87, 1.12
Tasks: 190 total,   2 running, 188 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 12.5%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8066156k total,   615096k used,  7451060k free,    45604k buffers
Swap:  8193140k total,        0k used,  8193140k free,   343436k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2952 root      15  -5     0    0    0 R  100  0.0   5:23.55 ib_mad1
    1 root      20   0 10320  688  572 S    0  0.0   0:02.04 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0

I can't kill the mount process (kill -9, shutdown -R, or echo b > sysrq-trigger)
and I have to restart the computer using "ipmitool target chassis power reset".

Any ideas?

Moreover, I sometimes see this in dmesg:
mlx4_core 0000:01:00.0: HW2SW_MPT failed (-16).
(I don't think it is related to the mount bug.) I read that this error can
occur with old firmware versions, but mine is 2.5.9.
For more details, see the bug report:
https://bugs.openfabrics.org/show_bug.cgi?id=1459

Thanks for your help.

Céline Bourde.

From eli at mellanox.co.il Wed Dec 17 06:23:47 2008
From: eli at mellanox.co.il (Eli Cohen)
Date: Wed, 17 Dec 2008 16:23:47 +0200
Subject: [ofa-general] [PATCH] IPoIB: fix race when manipulating mcast SKBs queue
Message-ID: <20081217142347.GA13829@mtls03>

ipoib_mcast_free() dequeues SKBs pending on the pkt_queue but needs to do
that with netif_tx_lock_bh() acquired.

Signed-off-by: Eli Cohen
---
I saw the following bug in OFED 1.4 on RHAS 5.2. I have not yet had the
chance to verify that this patch fixes this particular problem, but I think
the patch is valid anyway:

Dec 13 04:57:26 mtilab17 kernel: ib0: dev_queue_xmit failed to requeue packet
Dec 13 04:57:36 mtilab17 kernel: ib0: timing out; 63 sends not completed
Dec 13 04:57:36 mtilab17 kernel: Attempt to release alive inet socket ffff8102621d3680
Dec 13 04:57:36 mtilab17 kernel: Attempt to release alive inet socket ffff81026ba4b0c0
Dec 13 04:57:36 mtilab17 kernel: BUG: warning at include/net/dst.h:153/dst_release() (Tainted: G )
Dec 13 04:57:36 mtilab17 kernel:
Dec 13 04:57:36 mtilab17 kernel: Call Trace:
Dec 13 04:57:36 mtilab17 kernel: [] __kfree_skb+0x47/0x110
Dec 13 04:57:36 mtilab17 kernel: [] :ib_ipoib:ipoib_cm_handle_tx_wc+0xc2/0x228
Dec 13 04:57:36 mtilab17 kernel: [] :ib_ipoib:ipoib_poll+0xaa/0x19e
Dec 13 04:57:36 mtilab17 kernel: [] net_rx_action+0xa4/0x1a4
Dec 13 04:57:36 mtilab17 kernel: [] :mlx4_core:poll_catas+0x0/0x137

---
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index d9d1223..dd320d5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -94,12 +94,12 @@ static void ipoib_mcast_free(struct ipoib_mcast *mcast)
 	if (mcast->ah)
 		ipoib_put_ah(mcast->ah);
 
+	netif_tx_lock_bh(dev);
 	while (!skb_queue_empty(&mcast->pkt_queue)) {
 		++tx_dropped;
 		dev_kfree_skb_any(skb_dequeue(&mcast->pkt_queue));
 	}
 
-	netif_tx_lock_bh(dev);
 	dev->stats.tx_dropped += tx_dropped;
 	netif_tx_unlock_bh(dev);
 
-- 
1.6.0.5

From kliteyn at dev.mellanox.co.il Wed Dec 17 06:31:59 2008
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 17 Dec 2008 16:31:59 +0200
Subject: [ofa-general] [OpenSM][PATCH] Corrected incoherency in __osm_ftree_fabric_route_to_non_cns comments
In-Reply-To: <4948F434.8020605@ext.bull.net>
References: <4948F434.8020605@ext.bull.net>
Message-ID: <49490D5F.6040409@dev.mellanox.co.il>

Hi Nicolas,

Nicolas Morey Chaisemartin wrote:
> It seems to be that there is an error in the comment of
> __osm_ftree_fabric_route_to_non_cns comments.
> It said the function was to be called with TRUE,FALSE parameters when it
> was called with TRUE,TRUE.
> I'm just discovering the Ftree routing code so I may have misunderstood
> something.
>
> Anyway if I'm right, here's the patch!

You're right, the comments are inconsistent with the code.
There were a few bug fixes that the comments did not follow.
In fact, there are more errors in this comment.
Here's the fix: Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index aa51d23..ebe6612 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -2462,11 +2462,11 @@ static void __osm_ftree_fabric_route_to_cns(IN ftree_fabric_t * p_ftree) * foreach HCA non-CN port in fabric * obtain the LID of the HCA port * get switch that is connected to this HCA port - * set switch LFT(LID) to the port connecting to compute node - * call assign-down-going-port-by-ascending-up(TRUE,FALSE) on CURRENT switch + * set switch LFT(LID) to the port connected to the HCA port + * call assign-down-going-port-by-ascending-up(TRUE,TRUE) on the switch * - * Routing to these HCAs is routing a REAL hca lid on SECONDARY path. - * However, we do want to allow load-leveling of the traffic to the non-CNs, + * Routing to these HCAs is routing a REAL hca lid on MAIN path. + * We want to allow load-leveling of the traffic to the non-CNs, * because such nodes may include IO nodes with heavy usage * - we should set fwd tables * - we should update port counters -- 1.5.1.4 From hal.rosenstock at gmail.com Wed Dec 17 06:43:09 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 17 Dec 2008 09:43:09 -0500 Subject: ***SPAM*** Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: <4948F6E7.9070509@ext.bull.net> References: <4948F6E7.9070509@ext.bull.net> Message-ID: Hi, On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde wrote: > Hi, > > I can't mount an NFS/RDMA partition. > I've applied > http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt > instructions. > > Every steps (loading modules, /etc/exports implementation, starting nfs > daemon, > etc..) 
seems to be ok, but when I do the last command : > mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/ > the mount processus blocks even last dmesg output seems correct : > "RPC: Registered rdma transport module. > rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16 > " > If I try "ibstat" after that, I have a kernel panic message : > "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or > resource busy)" because device is in use. That's an application "panic" meaning some sort of abnormal condition. I'm not familiar with what NFS/RDMA does with the MAD layer but there may be some conflict with the diagnostic tools in this area. Another possibility is that the firmware error causes this error condition. > 100 % of processus is used by ib_mad1 > [root at test]top > top - 14:55:07 up 19 min, 3 users, load average: 2.00, 1.87, 1.12 > Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, > 0.0%st > Mem: 8066156k total, 615096k used, 7451060k free, 45604k buffers > Swap: 8193140k total, 0k used, 8193140k free, 343436k cached > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 2952 root 15 -5 0 0 0 R 100 0.0 5:23.55 ib_mad1 > 1 root 20 0 10320 688 572 S 0 0.0 0:02.04 init > 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd > 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 > 4 root 15 -5 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 > > > I can't kill mount process (kill -9 or shutdown -R or echo b > > sysrq-trigger) > and I have to restart the computer using "ipmitool target chassis power > reset". > > Have any idea ? Is there anything in dmesg or /var/log/messages relating to ib_mad ? -- Hal > Moreover, I sometimes have this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT > failed (-16). (I don't think there is an agreement with mount bug). I saw > this > error could be occured with old firmeware version but mine is 2.5.9 .. 
> For more details see bug report : > https://bugs.openfabrics.org/show_bug.cgi?id=1459 > > Thanks for your help. > > Céline Bourde. > > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From kliteyn at dev.mellanox.co.il Wed Dec 17 07:00:54 2008 From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik) Date: Wed, 17 Dec 2008 17:00:54 +0200 Subject: [ofa-general] [PATCH] opensm/osm_ucast_ftree.c: fixing errors in comments Message-ID: <49491426.3030208@dev.mellanox.co.il> Hi Sasha, Following Nicolas' fix, here's some fixes in the fat-tree routing comments. Signed-off-by: Yevgeny Kliteynik --- opensm/opensm/osm_ucast_ftree.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/opensm/opensm/osm_ucast_ftree.c b/opensm/opensm/osm_ucast_ftree.c index aa51d23..ebe6612 100644 --- a/opensm/opensm/osm_ucast_ftree.c +++ b/opensm/opensm/osm_ucast_ftree.c @@ -2462,11 +2462,11 @@ static void __osm_ftree_fabric_route_to_cns(IN ftree_fabric_t * p_ftree) * foreach HCA non-CN port in fabric * obtain the LID of the HCA port * get switch that is connected to this HCA port - * set switch LFT(LID) to the port connecting to compute node - * call assign-down-going-port-by-ascending-up(TRUE,FALSE) on CURRENT switch + * set switch LFT(LID) to the port connected to the HCA port + * call assign-down-going-port-by-ascending-up(TRUE,TRUE) on the switch * - * Routing to these HCAs is routing a REAL hca lid on SECONDARY path. - * However, we do want to allow load-leveling of the traffic to the non-CNs, + * Routing to these HCAs is routing a REAL hca lid on MAIN path. 
+ * We want to allow load-leveling of the traffic to the non-CNs, * because such nodes may include IO nodes with heavy usage * - we should set fwd tables * - we should update port counters -- 1.5.1.4 From Thomas.Talpey at netapp.com Wed Dec 17 07:01:44 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 17 Dec 2008 10:01:44 -0500 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: <4948F6E7.9070509@ext.bull.net> References: <4948F6E7.9070509@ext.bull.net> Message-ID: At 07:56 AM 12/17/2008, Celine Bourde wrote: >Hi, > >I can't mount an NFS/RDMA partition. > >I've applied >http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs- >rdma.release-notes.txt >instructions. Do you really need to run OFED? Since 2.6.24, stock kernel.org supports the NFS/RDMA client, and since 2.6.25 the server is as well. Fedora 9 and Fedora 10 both support NFS/RDMA and IB, for example. > >Every steps (loading modules, /etc/exports implementation, starting nfs daemon, >etc..) seems to be ok, but when I do the last command : >mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/ >the mount processus blocks even last dmesg output seems correct : >"RPC: Registered rdma transport module. >rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16 >" This is all great - the connection was made, and the initial NFS exchanges were performed. Did you try any NFS operations at this point? >If I try "ibstat" after that, I have a kernel panic message : >"ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or resource >busy)" because device is in use. I can't explain this - certainly I've never seen it. I am going to guess it's an OFED issue, or something in your setup. Do you have any other detail? Stack trace of the oops? Tom. 
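
As a rough illustration of the version thresholds quoted in the message above (NFS/RDMA client in stock kernels since 2.6.24, server since 2.6.25), here is a minimal C sketch that compares a kernel release string against those two numbers. All of the names in it (struct kver, kver_parse, kver_at_least, stock_nfs_rdma_client, stock_nfs_rdma_server) are made up for this example and are not part of any kernel or OFED interface. It only looks at the version number, so it says nothing about whether a given kernel was actually built with the NFS/RDMA options enabled, or whether a distribution kernel carries backports.

```c
#include <assert.h>
#include <stdio.h>

/* A kernel release reduced to its leading "major.minor.patch" triple. */
struct kver {
    int major;
    int minor;
    int patch;
};

/* Thresholds taken from the message above: client since 2.6.24,
 * server since 2.6.25 (stock kernel.org kernels). */
static const struct kver NFS_RDMA_CLIENT_MIN = { 2, 6, 24 };
static const struct kver NFS_RDMA_SERVER_MIN = { 2, 6, 25 };

/* Parse the leading "X.Y.Z" of a release string such as "2.6.27" or
 * "2.6.18-93.el5". Returns 0 on success, -1 if three numbers are not found. */
static int kver_parse(const char *release, struct kver *out)
{
    assert(release != NULL && out != NULL);
    if (sscanf(release, "%d.%d.%d", &out->major, &out->minor, &out->patch) != 3)
        return -1;
    return 0;
}

/* Return 1 if v >= min, comparing major, then minor, then patch. */
static int kver_at_least(const struct kver *v, const struct kver *min)
{
    if (v->major != min->major)
        return v->major > min->major;
    if (v->minor != min->minor)
        return v->minor > min->minor;
    return v->patch >= min->patch;
}

/* Hypothetical helpers: 1 if the release meets the threshold, 0 if it
 * does not, -1 if the release string could not be parsed. */
static int stock_nfs_rdma_client(const char *release)
{
    struct kver v;

    if (kver_parse(release, &v) != 0)
        return -1;
    return kver_at_least(&v, &NFS_RDMA_CLIENT_MIN);
}

static int stock_nfs_rdma_server(const char *release)
{
    struct kver v;

    if (kver_parse(release, &v) != 0)
        return -1;
    return kver_at_least(&v, &NFS_RDMA_SERVER_MIN);
}
```

The release string would typically come from "uname -r" or from the release field filled in by uname(2). A result of 1 only means the version is new enough according to the thresholds above; the mount itself is still the real test.
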
From Thomas.Talpey at netapp.com Wed Dec 17 07:07:37 2008
From: Thomas.Talpey at netapp.com (Talpey, Thomas)
Date: Wed, 17 Dec 2008 10:07:37 -0500
Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition
In-Reply-To: 
References: <4948F6E7.9070509@ext.bull.net>
Message-ID: 

At 10:01 AM 12/17/2008, Talpey, Thomas wrote:
>At 07:56 AM 12/17/2008, Celine Bourde wrote:
>>Hi,
>>
>>I can't mount an NFS/RDMA partition.
>>
>>I've applied
>>http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-
>>rdma.release-notes.txt
>>instructions.
>
>Do you really need to run OFED? Since 2.6.24, stock kernel.org supports
>the NFS/RDMA client, and since 2.6.25 the server is as well. Fedora 9 and
>Fedora 10 both support NFS/RDMA and IB, for example.

Wait a sec - you are running 2.6.27! You should not need to install OFED;
everything you need is already in the kernel.

>>My configuration is :
>>kernel : 2.6.27 with NFS options
>>last stable OFED 1.4

So my suggestion is, just run the stock kernel. Please post your results -
it should work fine. There are some improvements to NFS/RDMA coming in
2.6.28, btw, especially if you want to run over iWARP, since the "frmr"
memory registration will be supported. These are all present in 2.6.28-rc1
through the current -rc8.

Tom.

>
>>
>>Every steps (loading modules, /etc/exports implementation, starting
>nfs daemon,
>>etc..) seems to be ok, but when I do the last command :
>>mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/
>>the mount processus blocks even last dmesg output seems correct :
>>"RPC: Registered rdma transport module.
>>rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16
>>"
>
>This is all great - the connection was made, and the initial NFS exchanges were
>performed. Did you try any NFS operations at this point?
>
>>If I try "ibstat" after that, I have a kernel panic message :
>>"ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or resource
>>busy)" because device is in use.
> >I can't explain this - certainly I've never seen it. I am going to >guess it's an >OFED issue, or something in your setup. Do you have any other detail? Stack >trace of the oops? > >Tom. > >_______________________________________________ >general mailing list >general at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From celine.bourde at ext.bull.net Wed Dec 17 07:04:00 2008 From: celine.bourde at ext.bull.net (Celine Bourde) Date: Wed, 17 Dec 2008 16:04:00 +0100 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: References: <4948F6E7.9070509@ext.bull.net> Message-ID: <494914E0.60003@ext.bull.net> Hal Rosenstock wrote: > Hi, > > On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde > wrote: > >> Hi, >> >> I can't mount an NFS/RDMA partition. >> I've applied >> http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt >> instructions. >> >> Every steps (loading modules, /etc/exports implementation, starting nfs >> daemon, >> etc..) seems to be ok, but when I do the last command : >> mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/ >> the mount processus blocks even last dmesg output seems correct : >> "RPC: Registered rdma transport module. >> rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16 >> " >> If I try "ibstat" after that, I have a kernel panic message : >> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or >> resource busy)" because device is in use. >> > > That's an application "panic" meaning some sort of abnormal condition. > > I'm not familiar with what NFS/RDMA does with the MAD layer but there > may be some conflict with the diagnostic tools in this area. Another > possibility is that the firmware error causes this error condition. > > I sometimes have this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT failed (-16). 
But I don't think there is an agreement with mount bug. I saw this error could be occured with old firmware version but mine is 2.5.9.. My configuration is : kernel : 2.6.27 with NFS options last stable OFED 1.4 mount.nfs (linux nfs-utils 1.1.4) ibstat output (before doing mount) : CA 'mlx4_0' CA type: MT26428 Number of ports: 2 Firmware version: 2.5.900 Hardware version: a0 Node GUID: 0x0002c903000290b2 System image GUID: 0x0002c903000290b5 Port 1: State: Active Physical state: LinkUp Rate: 40 Base lid: 2 LMC: 0 SM lid: 1 Capability mask: 0x02510868 Port GUID: 0x0002c903000290b3 Port 2: State: Initializing Physical state: LinkUp Rate: 40 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x02510868 >> 100 % of processus is used by ib_mad1 >> > > >> [root at test]top >> top - 14:55:07 up 19 min, 3 users, load average: 2.00, 1.87, 1.12 >> Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie >> Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, >> 0.0%st >> Mem: 8066156k total, 615096k used, 7451060k free, 45604k buffers >> Swap: 8193140k total, 0k used, 8193140k free, 343436k cached >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 2952 root 15 -5 0 0 0 R 100 0.0 5:23.55 ib_mad1 >> 1 root 20 0 10320 688 572 S 0 0.0 0:02.04 init >> 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd >> 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 >> 4 root 15 -5 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 >> >> >> I can't kill mount process (kill -9 or shutdown -R or echo b > >> sysrq-trigger) >> and I have to restart the computer using "ipmitool target chassis power >> reset". >> >> Have any idea ? >> > > Is there anything in dmesg or /var/log/messages relating to ib_mad ? > No, there is no message relating to ib_mad. Céline Bourde. > -- Hal > > >> Moreover, I sometimes have this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT >> failed (-16). (I don't think there is an agreement with mount bug). 
I saw >> this >> error could be occured with old firmeware version but mine is 2.5.9 .. >> For more details see bug report : >> https://bugs.openfabrics.org/show_bug.cgi?id=1459 >> >> Thanks for your help. >> >> Céline Bourde. >> >> >> >> >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> > > > From Thomas.Talpey at netapp.com Wed Dec 17 07:12:47 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 17 Dec 2008 10:12:47 -0500 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: References: <4948F6E7.9070509@ext.bull.net> Message-ID: At 09:43 AM 12/17/2008, Hal Rosenstock wrote: >On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde > wrote: >> Hi, >> If I try "ibstat" after that, I have a kernel panic message : >> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or >> resource busy)" because device is in use. > >That's an application "panic" meaning some sort of abnormal condition. > >I'm not familiar with what NFS/RDMA does with the MAD layer Not a thing. The NFS/RDMA client and server both simply use the rdma_connect() and rdma_listen() api to connect to one another. They never ever call MAD explicitly, as they are coded to be completely transport-independent. Tom. From hal.rosenstock at gmail.com Wed Dec 17 07:28:35 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 17 Dec 2008 10:28:35 -0500 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: References: <4948F6E7.9070509@ext.bull.net> Message-ID: On Wed, Dec 17, 2008 at 10:12 AM, Talpey, Thomas wrote: > At 09:43 AM 12/17/2008, Hal Rosenstock wrote: >>On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde >> wrote: >>I'm not familiar with what NFS/RDMA does with the MAD layer > > Not a thing. 
The NFS/RDMA client and server both simply use the rdma_connect() > and rdma_listen() api to connect to one another. They never ever call MAD explicitly, > as they are coded to be completely transport-independent. The ibstat failure is then likely due to a driver/firmware issue. I'm not sure what the driver is saying back to the kernel MAD layer to make it loop though. -- Hal > Tom. > > From hal.rosenstock at gmail.com Wed Dec 17 07:36:44 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 17 Dec 2008 10:36:44 -0500 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: <4948B0B4.8000305@bull.net> References: <4947CE86.90301@bull.net> <4948B0B4.8000305@bull.net> Message-ID: Hi again Vincent, On Wed, Dec 17, 2008 at 2:56 AM, Vincent Ficet wrote: > Hello Hal, > >> ClassPortInfo is a required attribute for any supported class but I'm >> not 100% sure what is going on. Those vendstat options are for IS-3 >> and this is IS-4 although there is some device validation supported in >> the vendstat command. >> >> Can you show the output with -d option ? 
>> >> > > This is what I have: > > [root at inti0 sbin]# vendstat -dd -N 2 > ibwarn: [8822] umad_init: umad_init > ibwarn: [8822] umad_open_port: ca (null) port 0 > ibwarn: [8822] umad_get_cas_names: max 20 > ibwarn: [8822] umad_get_cas_names: return 1 cas > ibwarn: [8822] resolve_ca_name: checking ca 'mlx4_0' > ibwarn: [8822] resolve_ca_port: checking ca 'mlx4_0' > ibwarn: [8822] umad_get_ca: ca_name mlx4_0 > ibwarn: [8822] umad_get_ca: opened mlx4_0 > ibwarn: [8822] resolve_ca_port: checking port 0 > ibwarn: [8822] resolve_ca_port: checking port 1 > ibwarn: [8822] resolve_ca_port: found active port 1 > ibwarn: [8822] resolve_ca_name: found ca mlx4_0 with port 1 type 1 > ibwarn: [8822] resolve_ca_name: found ca mlx4_0 with active port 1 > ibwarn: [8822] umad_open_port: opening mlx4_0 port 1 > ibwarn: [8822] dev_to_umad_id: mapped mlx4_0 1 to 0 > ibwarn: [8822] umad_open_port: opened /dev/infiniband/umad0 fd 3 portid 0 > ibwarn: [8822] umad_register: fd 3 mgmt_class 1 mgmt_version 1 rmpp_version > 0 method_mask (nil) > ibwarn: [8822] umad_register: fd 3 registered to use agent 0 qp 0 > ibwarn: [8822] umad_register: fd 3 mgmt_class 129 mgmt_version 1 > rmpp_version 0 method_mask (nil) > ibwarn: [8822] umad_register: fd 3 registered to use agent 1 qp 0 > ibwarn: [8822] umad_register: fd 3 mgmt_class 3 mgmt_version 2 rmpp_version > 1 method_mask (nil) > ibwarn: [8822] umad_register: fd 3 registered to use agent 2 qp 1 > ibwarn: [8822] umad_register: fd 3 mgmt_class 10 mgmt_version 1 rmpp_version > 0 method_mask (nil) > ibwarn: [8822] umad_register: fd 3 registered to use agent 3 qp 1 > ibwarn: [8822] ib_vendor_call: route Lid 2 data 0x7fff19b91c60 > ibwarn: [8822] ib_vendor_call: class 0xa method 0x1 attr 0x1 mod 0x0 datasz > 232 off 24 res_ex 1 > ibwarn: [8822] mad_rpc_rmpp: rmpp (nil) data 0x7fff19b91c60 > ibwarn: [8822] umad_set_addr: umad 0x7fff19b912b0 dlid 2 dqp 1 sl 0, qkey > 80010000 > ibwarn: [8822] _do_madrpc: >>> sending: len 256 pktsz 320 > send buf > 0000 
0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0001 8001 0000 0002 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 010a 0101 0000 0000 0960 1a1a 25a8 c4c0 > 0001 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > ibwarn: [8822] umad_send: fd 3 agentid 3 umad 0x7fff19b912b0 timeout 1000 > ibwarn: [8822] umad_recv: fd 3 umad 0x7fff19b90eb0 timeout 1000 > ibwarn: [8822] umad_recv: mad received by agent 3 length 320 > ibwarn: [8822] _do_madrpc: rcv buf: > rcv buf > 010a 0181 000c 0000 0000 009f 25a8 c4c0 > 0001 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > ibwarn: [8822] mad_rpc_rmpp: MAD completed with error status 0xc; dport (Lid > 2) > vendstat: iberror: [pid 8822] main: failed: classportinfo query This looks like a timeout error on the ClassPortInfo attribute in the vendor class Mellanox 
uses. So either that class is not implemented in the firmware or this (required) attribute is not implemented. Can you contact Mellanox about this ? Also, can you try the following experiment: rebuild vendstat with #if 0'ing out the following lines: memset(&buf, 0, sizeof(buf)); /* vendor ClassPortInfo is required attribute if class supported */ call.attrid = CLASS_PORT_INFO; if (!ib_vendor_call(&buf, &portid, &call)) IBERROR("classportinfo query"); I'm not sure whether this tool should support IS-IV or not. Currently, as the man page says, it's intended for IS-III only. It may be that the current firmware does not yet support this but will sometime in the future. -- Hal > Cheers, > > Vincent > From jean-vincent.ficet at bull.net Wed Dec 17 08:12:54 2008 From: jean-vincent.ficet at bull.net (Vincent Ficet) Date: Wed, 17 Dec 2008 17:12:54 +0100 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: References: <4947CE86.90301@bull.net> <4948B0B4.8000305@bull.net> Message-ID: <49492506.8000905@bull.net> Hello Hal, > > This looks like a timeout error on the ClassPortInfo attribute in the > vendor class Mellanox uses. So either that class is not implemented in > the firmware or this (required) attribute is not implemented. Can you > contact Mellanox about this ? Ok, I will get in touch with them about this. > > Also, can you try the following experiment: > > rebuild vendstat with #if 0'ing out the following lines: > > memset(&buf, 0, sizeof(buf)); > /* vendor ClassPortInfo is required attribute if class supported */ > call.attrid = CLASS_PORT_INFO; > if (!ib_vendor_call(&buf, &portid, &call)) > IBERROR("classportinfo query"); > I got the following output (timeout) with the #If 0'ed version: [root at inti0 src]# ./vendstat -dd -N 11 [ ... 
] 010a 0181 000c 0000 0000 0031 3760 8692 0017 0000 0000 0000 0000 0000 0000 0000 01b3 01b3 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ffff ffff 0007 0100 0000 2fb8 1008 2008 0000 0011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 ibwarn: [18182] mad_rpc_rmpp: MAD completed with error status 0xc; dport (Lid 11) vendstat: iberror: [pid 18182] main: failed: vendstat > I'm not sure whether this tool should support IS-IV or not. Currently, > as the man page says, it's intended for IS-III only. It may be that > the current firmware does not yet support this but will sometime in > the future. > > -- Hal Thanks for your help, Vincent From hal.rosenstock at gmail.com Wed Dec 17 08:18:13 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Wed, 17 Dec 2008 11:18:13 -0500 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: <49492506.8000905@bull.net> References: <4947CE86.90301@bull.net> <4948B0B4.8000305@bull.net> <49492506.8000905@bull.net> Message-ID: Hi Vincent, On Wed, Dec 17, 2008 at 11:12 AM, Vincent Ficet wrote: > Hello Hal, >> >> This looks like a timeout error on the ClassPortInfo attribute in the >> vendor class Mellanox uses. So either that class is not implemented in >> the firmware or this (required) attribute is not implemented. Can you >> contact Mellanox about this ? > > Ok, I will get in touch with them about this. 
>> >> Also, can you try the following experiment: >> >> rebuild vendstat with #if 0'ing out the following lines: >> >> memset(&buf, 0, sizeof(buf)); >> /* vendor ClassPortInfo is required attribute if class supported */ >> call.attrid = CLASS_PORT_INFO; >> if (!ib_vendor_call(&buf, &portid, &call)) >> IBERROR("classportinfo query"); >> > I got the following output (timeout) with the #If 0'ed version: > > [root at inti0 src]# ./vendstat -dd -N 11 > [ ... ] > 010a 0181 000c 0000 0000 0031 3760 8692 > 0017 0000 0000 0000 0000 0000 0000 0000 > 01b3 01b3 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 ffff ffff > 0007 0100 0000 2fb8 1008 2008 0000 0011 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > 0000 0000 0000 0000 0000 0000 0000 0000 > ibwarn: [18182] mad_rpc_rmpp: MAD completed with error status 0xc; dport > (Lid 11) > vendstat: iberror: [pid 18182] main: failed: vendstat Same error so whole class is likely not supported or not yet supported. -- Hal >> I'm not sure whether this tool should support IS-IV or not. Currently, >> as the man page says, it's intended for IS-III only. It may be that >> the current firmware does not yet support this but will sometime in >> the future. >> >> -- Hal > > Thanks for your help, > > Vincent > > From Jeffrey.C.Becker at nasa.gov Wed Dec 17 09:51:44 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Wed, 17 Dec 2008 09:51:44 -0800 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: <4948F6E7.9070509@ext.bull.net> References: <4948F6E7.9070509@ext.bull.net> Message-ID: <49493C30.7040303@nasa.gov> Hi Celine. 
Celine Bourde wrote: > Hi, > > I can't mount an NFS/RDMA partition. > I've applied > http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt > > instructions. > > Every steps (loading modules, /etc/exports implementation, starting > nfs daemon, > etc..) seems to be ok, but when I do the last command : > mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/ > the mount processus blocks even last dmesg output seems correct : > "RPC: Registered rdma transport module. > rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 > ird 16 > " I've successfully tested 2.6.27 + OFED1.4 + nfs-utils 1.3 + mthca. Does your mlx4 card work correctly independent of NFSRDMA? Also, given later replies, I'm a little concerned about the mad issues you see. Please keep me updated. Thanks. -jeff > If I try "ibstat" after that, I have a kernel panic message : > "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or > resource > busy)" because device is in use. > > 100 % of processus is used by ib_mad1 > [root at test]top > top - 14:55:07 up 19 min, 3 users, load average: 2.00, 1.87, 1.12 > Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, > 0.0%si, 0.0%st > Mem: 8066156k total, 615096k used, 7451060k free, 45604k buffers > Swap: 8193140k total, 0k used, 8193140k free, 343436k cached > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 2952 root 15 -5 0 0 0 R 100 0.0 5:23.55 ib_mad1 > 1 root 20 0 10320 688 572 S 0 0.0 0:02.04 init > 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd > 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 > 4 root 15 -5 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 > > > I can't kill mount process (kill -9 or shutdown -R or echo b > > sysrq-trigger) > and I have to restart the computer using "ipmitool target chassis > power reset". > > Have any idea ? 
>
> Moreover, I sometimes see this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT
> failed (-16). (I don't think it is related to the mount bug.) I saw that this
> error could occur with old firmware versions, but mine is 2.5.9.
> For more details see the bug report:
> https://bugs.openfabrics.org/show_bug.cgi?id=1459
>
> Thanks for your help.
>
> Céline Bourde.

From rdreier at cisco.com Wed Dec 17 11:12:39 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 17 Dec 2008 11:12:39 -0800
Subject: [ofa-general] Re: [PATCH] IPoIB: fix race when manipulating mcast SKBs queue
In-Reply-To: <20081217142347.GA13829@mtls03> (Eli Cohen's message of "Wed, 17 Dec 2008 16:23:47 +0200")
References: <20081217142347.GA13829@mtls03>
Message-ID:

> ipoib_mcast_free() dequeues SKBs pending on the pkt_queue but needs to do that
> with netif_tx_lock_bh() acquired.

I don't see why this would be required. When ipoib_mcast_free() runs,
the mcast structure has been removed from all lists and I don't see how
any other context could simultaneously be adding packets to pkt_queue.
What is the race that you think this fixes?

 - R.
From rdreier at cisco.com Wed Dec 17 11:18:57 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 17 Dec 2008 11:18:57 -0800
Subject: [ofa-general] RE: [ofw] RE: porting IB management code to Windows
In-Reply-To: <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com> (Stan Smith's message of "Tue, 16 Dec 2008 11:01:09 -0800")
References: <000201c95bc5$510162a0$1e58180a@amr.corp.intel.com> <20081213203014.GU15622@sashak.voltaire.com> <000001c95ee8$8dc7b970$0ce1180a@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE7AE@orsmsx501.amr.corp.intel.com> <000001c95fab$563ca0b0$2c248686@amr.corp.intel.com> <3F6F638B8D880340AB536D29CD4C1E192F4FE9E4@orsmsx501.amr.corp.intel.com>
Message-ID:

I don't really have much of a stake in this, but it seems to me that
it's probably a good idea to question the assumption that a single
codebase shared between Linux and Windows is really a good idea.

It's definitely appealing to be able to reuse the code that has been
written for Linux, but in the long term trying to keep a single C
codebase for such different OSes is probably way more trouble than it's
worth. The real problem is that no one seems to use C for native
Windows code, let alone all the POSIX library environment stuff.

If you want to write cross-platform stuff it probably makes more sense
to use a language that is really supported on both platforms -- Java,
C#, Python or even C++ would probably be better choices. Of course this
would require a lot of work rewriting things in a new language, but if
you're really serious about supporting Windows as a first-class
platform, I don't think trying to get Linux C code working with a bunch
of #ifdef-ery is ever going to get you there.

 - R.
From tziporet at dev.mellanox.co.il Wed Dec 17 12:57:33 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 17 Dec 2008 22:57:33 +0200 Subject: [ofa-general] vendstat issue with shark switch In-Reply-To: References: <4947CE86.90301@bull.net> <4948B0B4.8000305@bull.net> <49492506.8000905@bull.net> Message-ID: <494967BD.1030602@mellanox.co.il> Hal Rosenstock wrote: > Hi Vincent, > > On Wed, Dec 17, 2008 at 11:12 AM, Vincent Ficet > wrote: > >> Hello Hal, >> >>> This looks like a timeout error on the ClassPortInfo attribute in the >>> vendor class Mellanox uses. So either that class is not implemented in >>> the firmware or this (required) attribute is not implemented. Can you >>> contact Mellanox about this ? >>> >> Ok, I will get in touch with them about this. >> >>> Also, can you try the following experiment: >>> >>> rebuild vendstat with #if 0'ing out the following lines: >>> >>> memset(&buf, 0, sizeof(buf)); >>> /* vendor ClassPortInfo is required attribute if class supported */ >>> call.attrid = CLASS_PORT_INFO; >>> if (!ib_vendor_call(&buf, &portid, &call)) >>> IBERROR("classportinfo query"); >>> >>> >> I got the following output (timeout) with the #If 0'ed version: >> >> [root at inti0 src]# ./vendstat -dd -N 11 >> [ ... 
] >> 010a 0181 000c 0000 0000 0031 3760 8692 >> 0017 0000 0000 0000 0000 0000 0000 0000 >> 01b3 01b3 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 ffff ffff >> 0007 0100 0000 2fb8 1008 2008 0000 0011 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> 0000 0000 0000 0000 0000 0000 0000 0000 >> ibwarn: [18182] mad_rpc_rmpp: MAD completed with error status 0xc; dport >> (Lid 11) >> vendstat: iberror: [pid 18182] main: failed: vendstat >> > > Same error so whole class is likely not supported or not yet supported. > > I will forward this to our FW people and see what they say Tziporet From tziporet at dev.mellanox.co.il Wed Dec 17 13:15:52 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 17 Dec 2008 23:15:52 +0200 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: References: <4948F6E7.9070509@ext.bull.net> Message-ID: <49496C08.6080404@mellanox.co.il> Talpey, Thomas wrote: >>> If I try "ibstat" after that, I have a kernel panic message : >>> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or resource >>> busy)" because device is in use. >>> We should release FW 2.6.0 soon - so it will be worth to try >> I can't explain this - certainly I've never seen it. I am going to >> guess it's an >> OFED issue, or something in your setup. Do you have any other detail? Stack >> trace of the oops? >> >> >> Tom Have you used our ConnectX cards when testing NFS/RDMA? 
Tziporet From michael.heinz at qlogic.com Wed Dec 17 13:22:24 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed, 17 Dec 2008 15:22:24 -0600 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> Hal, Sasha, This came up today in our internal QA meeting; can I promise them that this will be fixed "in the next release"? (say, 1.4.?) Apparently I'm not the only one who noticed that the saquery command isn't working with non-OFED SMs. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] Sent: Tuesday, December 16, 2008 7:09 AM To: Sasha Khapyorsky Cc: Mike Heinz; general at lists.openfabrics.org; John Russo Subject: Re: [ofa-general] Bugs in opensm/libvendor Sasha, On Tue, Dec 16, 2008 at 2:43 AM, Sasha Khapyorsky wrote: > Hi again, Hal, > > On 11:03 Mon 15 Dec , Hal Rosenstock wrote: >> On Mon, Dec 15, 2008 at 10:38 AM, Sasha Khapyorsky wrote: >> > On 09:29 Mon 15 Dec , Mike Heinz wrote: >> >> >> >> That's a good question - and I'm going to ask around and double check. >> >> My first reaction was that you have to specify how many paths you >> >> want from the query - but you're right, the spec doesn't say that. >> > >> > Yes, it looks like this (but I cannot understand "why" :( ). >> >> The spec says this (for GetTable) and Gets are requests for 1 path. >> The reason is to limit the amount of returned path records (and the >> field limits to 255 records in the response). > > Do you know what is a reason for this "127 records" limitation? Once you get past the scalability discussion (including limiting it to SGID), is there a need for more than 127 ? 
I think that allowing more paths is more important with various other types of wildcarded PR queries that are "beyond the spec". -- Hal >> >But even more >> > strange (IMHO) limitation is mandatory SGID - actually it should >> >make illegal such GetTable queries as all-to-all, SLID-to-all, >> >etc.. I thought that it is permitted. >> >> It was decided to force SGID. Neither All to all nor SLID to all by >> itself are spec'd (you could could add SGID along with SLID to all >> though). Support for those is a proprietary OpenSM extension which is >> used for testing at least (and also by saquery command). > > Ok. Not a bad extension IMHO :) > > Sasha > From Thomas.Talpey at netapp.com Wed Dec 17 13:36:12 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 17 Dec 2008 16:36:12 -0500 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: <49496C08.6080404@mellanox.co.il> References: <4948F6E7.9070509@ext.bull.net> <49496C08.6080404@mellanox.co.il> Message-ID: At 04:15 PM 12/17/2008, Tziporet Koren wrote: >Talpey, Thomas wrote: >>>> If I try "ibstat" after that, I have a kernel panic message : >>>> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device >or resource >>>> busy)" because device is in use. >>>> >We should release FW 2.6.0 soon - so it will be worth to try >>> I can't explain this - certainly I've never seen it. I am going to >>> guess it's an >>> OFED issue, or something in your setup. Do you have any other detail? Stack >>> trace of the oops? >>> >>> >>> >Tom >Have you used our ConnectX cards when testing NFS/RDMA? Yes, I have and it works fine in the mode Bull is using. We did have some interoperability problems between ConnectX and mthca, but those were back in May, and fixed by Roland a short time later. In any case, the Bull kernel log message indicates NFS/RDMA has connected successfully, so I believe the problem lies elsewhere. Tom. 
From tidyroad at gmail.com Wed Dec 17 17:00:02 2008 From: tidyroad at gmail.com (=?GB2312?B?xuvCtw==?=) Date: Thu, 18 Dec 2008 09:00:02 +0800 Subject: ***SPAM*** [ofa-general] Can't use TOE and iwarp together with Chelsio's NIC Message-ID: <89e62d5b0812171700v6dce1829xc4f036eaa1245001@mail.gmail.com> Hi, I followed the instructions in http://service.chelsio.com/site-bin/readme.cgi?FILE=linux/rdma/iWARP-MPI-HOWTO.txt . My evironment is: RedHat Linux AS4 update 6, OFED-1.3.1, cxgb3toe-1.1.022. If only installing OFED-1.3.1, everything is OK. But while I have installed cxgb3toe-1.1.022, the problem is comming: *one core is used up by iw_cxgb while doing the rping test.* This state can't be recovered, so I have to restart my computer. Have any idea? Thanks for your help. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tziporet at dev.mellanox.co.il Wed Dec 17 23:50:06 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 18 Dec 2008 09:50:06 +0200 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: References: <4948F6E7.9070509@ext.bull.net> <49496C08.6080404@mellanox.co.il> Message-ID: <494A00AE.1020407@mellanox.co.il> Talpey, Thomas wrote: > At 04:15 PM 12/17/2008, Tziporet Koren wrote: > >> Talpey, Thomas wrote: >> >>>>> If I try "ibstat" after that, I have a kernel panic message : >>>>> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device >>>>> >> or resource >> >>>>> busy)" because device is in use. >>>>> >>>>> >> We should release FW 2.6.0 soon - so it will be worth to try >> >>>> I can't explain this - certainly I've never seen it. I am going to >>>> guess it's an >>>> OFED issue, or something in your setup. Do you have any other detail? Stack >>>> trace of the oops? >>>> >>>> >>>> >>>> >> Tom >> Have you used our ConnectX cards when testing NFS/RDMA? >> > > Yes, I have and it works fine in the mode Bull is using. 
We did have > some interoperability problems between ConnectX and mthca, but > those were back in May, and fixed by Roland a short time later. > > In any case, the Bull kernel log message indicates NFS/RDMA has > connected successfully, so I believe the problem lies elsewhere. > > Tom. > > Jack noticed that ib_mad1 uses 100% of the cpu in the bug report. And the error -16 is -EBUSY, and this is returned if the command times out when using events. Maybe the HCA or the switch are sending a flood of traps and then the HCA is busy handling all of them and does not complete the command on time? Can you check you do not have errors on your line? Tziporet From celine.bourde at ext.bull.net Thu Dec 18 00:30:31 2008 From: celine.bourde at ext.bull.net (Celine Bourde) Date: Thu, 18 Dec 2008 09:30:31 +0100 Subject: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition In-Reply-To: <49493C30.7040303@nasa.gov> References: <4948F6E7.9070509@ext.bull.net> <49493C30.7040303@nasa.gov> Message-ID: <494A0A27.7030904@ext.bull.net> Jeff Becker wrote: > Hi Celine. > > Celine Bourde wrote: > >> Hi, >> >> I can't mount an NFS/RDMA partition. >> I've applied >> http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt >> >> instructions. >> >> Every steps (loading modules, /etc/exports implementation, starting >> nfs daemon, >> etc..) seems to be ok, but when I do the last command : >> mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/ >> the mount processus blocks even last dmesg output seems correct : >> "RPC: Registered rdma transport module. >> rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 >> ird 16 >> " >> > > I've successfully tested 2.6.27 + OFED1.4 + nfs-utils 1.3 + mthca. Does > your mlx4 card work correctly independent of NFSRDMA? Yes it works correctly, I've no other problems. To be sure, I've done performance tests with qperf (bandwith, latence) and everything is ok. 
I've connected IB back to back, with same ConnectX cards on both computer. > Also, given later > replies, I'm a little concerned about the mad issues you see. Please > keep me updated. Thanks. > > -jeff > Of course. I will wait Tom results and will keep you aware. Céline. > >> If I try "ibstat" after that, I have a kernel panic message : >> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or >> resource >> busy)" because device is in use. >> >> 100 % of processus is used by ib_mad1 >> [root at test]top >> top - 14:55:07 up 19 min, 3 users, load average: 2.00, 1.87, 1.12 >> Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie >> Cpu(s): 0.0%us, 12.5%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, >> 0.0%si, 0.0%st >> Mem: 8066156k total, 615096k used, 7451060k free, 45604k buffers >> Swap: 8193140k total, 0k used, 8193140k free, 343436k cached >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 2952 root 15 -5 0 0 0 R 100 0.0 5:23.55 ib_mad1 >> 1 root 20 0 10320 688 572 S 0 0.0 0:02.04 init >> 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd >> 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0 >> 4 root 15 -5 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0 >> >> >> I can't kill mount process (kill -9 or shutdown -R or echo b > >> sysrq-trigger) >> and I have to restart the computer using "ipmitool target chassis >> power reset". >> >> Have any idea ? >> >> Moreover, I sometimes have this dmesg log: mlx4_core 0000:01:00.0: >> HW2SW_MPT >> failed (-16). (I don't think there is an agreement with mount bug). I >> saw this >> error could be occured with old firmeware version but mine is 2.5.9 .. >> For more details see bug report : >> https://bugs.openfabrics.org/show_bug.cgi?id=1459 >> >> Thanks for your help. >> >> Céline Bourde. 
>> >> >> >> >> >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> > > > > From vlad at lists.openfabrics.org Thu Dec 18 03:23:41 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 18 Dec 2008 03:23:41 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081218-0200 daily build status Message-ID: <20081218112342.116D6E6108C@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with 
linux-2.6.17
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From eli at dev.mellanox.co.il Thu Dec 18 04:02:49 2008
From: eli at dev.mellanox.co.il (Eli Cohen)
Date: Thu, 18 Dec 2008 14:02:49 +0200
Subject: [ofa-general] Re: [PATCH] IPoIB: fix race when manipulating mcast SKBs queue
In-Reply-To:
References: <20081217142347.GA13829@mtls03>
Message-ID: <20081218120143.GA16448@mtls03>

On Wed, Dec 17, 2008 at 11:12:39AM -0800, Roland Dreier wrote:
> I don't see why this would be required. When ipoib_mcast_free() runs,
> the mcast structure has been removed from all lists and I don't see how
> any other context could simultaneously be adding packets to pkt_queue.
> What is the race that you think this fixes?
>

I missed the fact that mcasts are moved to the "remove list", so I guess
there is another problem out there.
From sashak at voltaire.com Thu Dec 18 04:12:48 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 18 Dec 2008 14:12:48 +0200
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org>
References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org>
Message-ID: <20081218121248.GE18908@sashak.voltaire.com>

Hi Mike,

On 15:22 Wed 17 Dec , Mike Heinz wrote:
>
> This came up today in our internal QA meeting; can I promise them that this will be fixed "in the next release"? (say, 1.4.?)

IMO your patch, which uses the following with GetTable:

	comp_mask |= EB_PR_COMPMASK_NUMBPATH;
	num_path = 0x1f;

is acceptable (just remember to add a Signed-off-by line).

Also, to be fully compliant, SGID should be used with the SLID/DLID query. This
will likely require an API change and could be handled as a different patch.

> Apparently I'm not the only one who noticed that the saquery command isn't working with non-OFED SMs.

Which SM(s)?

Sasha

From sashak at voltaire.com Thu Dec 18 04:29:53 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 18 Dec 2008 14:29:53 +0200
Subject: [ofa-general] Re: [OpenSM][PATCH] Corrected incoherency in __osm_ftree_fabric_route_to_non_cns comments
In-Reply-To: <4948F434.8020605@ext.bull.net>
References: <4948F434.8020605@ext.bull.net>
Message-ID: <20081218122953.GH18908@sashak.voltaire.com>

On 13:44 Wed 17 Dec , Nicolas Morey Chaisemartin wrote:
> It seems to me that there is an error in the comments of
> __osm_ftree_fabric_route_to_non_cns.
> They said the function was to be called with TRUE,FALSE parameters when it
> was called with TRUE,TRUE.
> I'm just discovering the Ftree routing code so I may have misunderstood
> something.
>
> Anyway if I'm right, here's the patch!
>
>
> Signed-off-by: Nicolas Morey-Chaisemartin

Applied. Thanks.
Sasha From sashak at voltaire.com Thu Dec 18 04:30:15 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 Dec 2008 14:30:15 +0200 Subject: [ofa-general] Re: [PATCH] opensm/osm_ucast_ftree.c: fixing errors in comments In-Reply-To: <49491426.3030208@dev.mellanox.co.il> References: <49491426.3030208@dev.mellanox.co.il> Message-ID: <20081218123015.GI18908@sashak.voltaire.com> On 17:00 Wed 17 Dec , Yevgeny Kliteynik wrote: > Hi Sasha, > > Following Nicolas' fix, here's some fixes in the fat-tree routing comments. > > Signed-off-by: Yevgeny Kliteynik Applied. Thanks. Sasha From hal.rosenstock at gmail.com Thu Dec 18 04:35:17 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 07:35:17 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081218121248.GE18908@sashak.voltaire.com> References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> Message-ID: Sasha, On Thu, Dec 18, 2008 at 7:12 AM, Sasha Khapyorsky wrote: > Hi Mike, > > On 15:22 Wed 17 Dec , Mike Heinz wrote: >> >> This came up today in our internal QA meeting; can I promise them that this will be fixed "in the next release"? (say, 1.4.?) > > IMO your patch using with GetTable: > > comp_mask |= EB_PR_COMPMASK_NUMBPATH; > num_path = 0x1f; > > is acceptable (just remember to add Signed-off-by line). > > Also to be fully complaint SGID should be used with SLID/DLID query. This > likely will require API change and could be handled as different patch. Isn't there an OpenSM change needed as well ? -- Hal >> Apparently I'm not the only one who noticed that the saquery command isn't working with non-OFED SMs. > > Which SM(s)? 
>
> Sasha
>

From aostvold at platform.com Thu Dec 18 04:41:43 2008
From: aostvold at platform.com (Asmund Ostvold)
Date: Thu, 18 Dec 2008 13:41:43 +0100
Subject: [ofa-general] ibv_post_send fails when using malloc in a special way
In-Reply-To:
References: <49368267.40007@platform.com> <493FDA77.8030802@platform.com>
Message-ID: <494A4507.60903@platform.com>

Roland,

Roland Dreier wrote:
> I'm not sure how madvise() would have any relevance to your problem,
> since as far as I can see you are not using fork(). In any case,
> libibverbs will only call madvise() if you call ibv_fork_init() or set
> the IBV_FORK_SAFE environment variable.

madvise() is probably not relevant. We _do_ see calls to madvise() in
the enclosed program, without any IBV_FORK_SAFE environment variable
(and the program does not call fork(), and I assume ibverbs does not
either).

Snip from ltrace:

free(0x2ae80c0a1000
SYS_madvise(0x2ae80c0a9000, ...) = 0
<... free resumed> ) =

We assume that the virtual-to-physical mapping of a region which has
been initialized (initial page fault) and has been registered with
ibv_reg_mr() only changes by 1) negative sbrk() or 2) munmap()/mremap().
Neither of these happens in the enclosed program.

Given that our assumption is correct, we still claim the program shows a
bug, since the sender RDMAs incorrect data.

PS: If you remove the huge failing allocation, a:

  (void) mallopt(M_TRIM_THRESHOLD, -1);

must be inserted at the top of main() to avoid 1) above. By doing this
(removing the huge malloc() and inserting the mallopt()), the program
works as expected.
- Asmund

From sashak at voltaire.com Thu Dec 18 04:45:50 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 18 Dec 2008 14:45:50 +0200
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To:
References: <20081215151753.GA22506@sashak.voltaire.com> <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com>
Message-ID: <20081218124540.GK18908@sashak.voltaire.com>

Hi Hal,

On 07:35 Thu 18 Dec , Hal Rosenstock wrote:
> >
> > Also to be fully compliant SGID should be used with SLID/DLID query. This
> > likely will require API change and could be handled as different patch.
>
> Isn't there an OpenSM change needed as well ?

I'm not sure. Basically OpenSM works correctly - it just doesn't reject
non-compliant queries. I think we could preserve this "extension".

Sasha

From hal.rosenstock at gmail.com Thu Dec 18 05:35:03 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Thu, 18 Dec 2008 08:35:03 -0500
Subject: [ofa-general] Bugs in opensm/libvendor
In-Reply-To: <20081218124540.GK18908@sashak.voltaire.com>
References: <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com>
Message-ID:

Sasha,

On Thu, Dec 18, 2008 at 7:45 AM, Sasha Khapyorsky wrote:
> Hi Hal,
>
> On 07:35 Thu 18 Dec , Hal Rosenstock wrote:
>> >
>> > Also to be fully compliant SGID should be used with SLID/DLID query. This
>> > likely will require API change and could be handled as different patch.
>>
>> Isn't there an OpenSM change needed as well ?
>
> I'm not sure. Basically OpenSM works correctly - it just doesn't reject
> non-compliant queries. I think we could preserve this "extension".
IMO in order to preserve this extended feature, which is currently used in various places, num_paths = 127 needs to be treated the same way an unsupplied num_paths component is handled today. -- Hal > Sasha > From nicolas.morey-chaisemartin at ext.bull.net Thu Dec 18 05:42:17 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Thu, 18 Dec 2008 14:42:17 +0100 Subject: [ofa-general] [PATCH OpenSM 0/3] Fat Tree - Routing between non-CN nodes Message-ID: <494A5339.9030304@ext.bull.net> Hi, We are currently working on an Ftree topology where IO nodes are connected to spine switches. Using the cn_guid_file and root_guid_file works great. It is possible to route the whole tree as a fat tree. All the CNs are connected to the other CN and IO nodes. However, we are missing some connectivity between IO nodes. This is the expected behavior as the route between those IO nodes would have to go down to go back up on another spine switch. However, we need at least a bit of connectivity between those nodes. There won't be any real traffic but just some "ping" for HA purposes. Therefore, I have implemented two new options to openSM: io_guid_file and max_reverse_hops. The io_guid_file provides a list of all the IO GUIDs (it may differ from the list of non-CN nodes). The max_reverse_hops gives the number of times IO nodes (described by io_guid_file) are allowed to use a switch backward. According to my tests this has absolutely no effect on regular routing and manages to connect the IO nodes together, if max_reverse_hops is big enough. This is a first draft for this feature. I'd be happy to have some feedback about how to upgrade it and make it as clean as possible, whether it is integrated in the mainstream or not. Regards Nicolas Morey-Chaisemartin --------- Nicolas Morey-Chaisemartin (3): Added io_guid_file options and variables in the different structures and functions.
Added max_reverse_hops option for I/O nodes Added possible reverse hops for Ftree algorithm. opensm/include/opensm/osm_subnet.h | 6 + opensm/opensm/main.c | 26 +++++- opensm/opensm/osm_subnet.c | 18 ++++ opensm/opensm/osm_ucast_ftree.c | 183 +++++++++++++++++++++++++++++++----- 4 files changed, 207 insertions(+), 26 deletions(-) From nicolas.morey-chaisemartin at ext.bull.net Thu Dec 18 05:43:50 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Thu, 18 Dec 2008 14:43:50 +0100 Subject: [ofa-general] [PATCH OpenSM 1/3] Added io_guid_file options and variables in the different structures and functions. In-Reply-To: <494A5339.9030304@ext.bull.net> References: <494A5339.9030304@ext.bull.net> Message-ID: <494A5396.5040106@ext.bull.net> Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/include/opensm/osm_subnet.h | 5 ++ opensm/opensm/main.c | 13 ++++++ opensm/opensm/osm_subnet.c | 9 ++++ opensm/opensm/osm_ucast_ftree.c | 81 ++++++++++++++++++++++++++++++++---- 4 files changed, 100 insertions(+), 8 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... Name: 32605f5c1c85e88a12722b00b4f9a7e34a31c5ba.diff Type: text/x-patch Size: 10069 bytes Desc: not available URL: From nicolas.morey-chaisemartin at ext.bull.net Thu Dec 18 05:43:58 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Thu, 18 Dec 2008 14:43:58 +0100 Subject: [ofa-general] [PATCH OpenSM 2/3] Added max_reverse_hops option for I/O nodes In-Reply-To: <494A5339.9030304@ext.bull.net> References: <494A5339.9030304@ext.bull.net> Message-ID: <494A539E.9090708@ext.bull.net> Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/include/opensm/osm_subnet.h | 1 + opensm/opensm/main.c | 15 +++++++++++++-- opensm/opensm/osm_subnet.c | 9 +++++++++ 3 files changed, 23 insertions(+), 2 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cac8c56e3c6b6b33663acf7e2db70c3cc32fea80.diff Type: text/x-patch Size: 3703 bytes Desc: not available URL: From nicolas.morey-chaisemartin at ext.bull.net Thu Dec 18 05:44:14 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Thu, 18 Dec 2008 14:44:14 +0100 Subject: [ofa-general] [PATCH OpenSM 3/3] Added possible reverse hops for Ftree algorithm. In-Reply-To: <494A5339.9030304@ext.bull.net> References: <494A5339.9030304@ext.bull.net> Message-ID: <494A53AE.8080706@ext.bull.net> Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/opensm/osm_ucast_ftree.c | 102 ++++++++++++++++++++++++++++++++------- 1 files changed, 85 insertions(+), 17 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... Name: 1200ef2cdb25efde2196d700c5a4b347356de5f3.diff Type: text/x-patch Size: 10031 bytes Desc: not available URL: From sashak at voltaire.com Thu Dec 18 06:31:41 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 Dec 2008 16:31:41 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081215153838.GB22506@sashak.voltaire.com> <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> Message-ID: <20081218143141.GM18908@sashak.voltaire.com> On 08:35 Thu 18 Dec , Hal Rosenstock wrote: > > IMO in order to preserve this extended feature which is used in > various places currently, the treatment of num_paths = 127 needs to be > changed to how not supplying this component is currently handled. Are you saying we need to return potentially more than 127 paths even if num_paths = 127 was specified explicitly? I don't think that this is a good idea. Assuming you care about breaking backward compatibility where this osm_vendor_sa API (OSMV_QUERY_PATH_REC_BY_*) is used today? 
Sasha From hal.rosenstock at gmail.com Thu Dec 18 06:43:46 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 09:43:46 -0500 Subject: ***SPAM*** Re: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081218143141.GM18908@sashak.voltaire.com> References: <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> <20081218143141.GM18908@sashak.voltaire.com> Message-ID: On Thu, Dec 18, 2008 at 9:31 AM, Sasha Khapyorsky wrote: > On 08:35 Thu 18 Dec , Hal Rosenstock wrote: >> >> IMO in order to preserve this extended feature which is used in >> various places currently, the treatment of num_paths = 127 needs to be >> changed to how not supplying this component is currently handled. > > Are you saying we need to return potentially more than 127 paths even if > num_paths = 127 was specified explicitly? I don't think that this is a good idea. I thought that was your proposal in order not to change the API for this. I also thought you "liked" the extensions provided by OpenSM. > Assuming you care about breaking backward compatibility Don't you ? > where this osm_vendor_sa API (OSMV_QUERY_PATH_REC_BY_*) is used today? saquery and osmtest. -- Hal > Sasha > From Sumeet.Lahorani at oracle.com Thu Dec 18 00:28:53 2008 From: Sumeet.Lahorani at oracle.com (Sumeet Lahorani) Date: Thu, 18 Dec 2008 00:28:53 -0800 Subject: [ofa-general] IB interfaces occasionally go down & come up for no reason Message-ID: <494A09C5.8000203@oracle.com> Hi, We sometimes see our IB interfaces go down and come back up within 2 or 3 seconds for apparently no reason. 
Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle interface ib0, disabling it in 5000 ms. Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle interface ib1, disabling it in 5000 ms. Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after 2000 ms for interface ib0. Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after 2000 ms for interface ib1. To mask these we've set downdelay & updelay to 5000. But can anybody tell me why these interfaces could be bouncing down & up like this? We are not pulling any cables, resetting ports or resetting switches when this happens. We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418 [ConnectX IB DDR] HCAs. - Sumeet From sashak at voltaire.com Thu Dec 18 06:58:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 Dec 2008 16:58:01 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <20081216074334.GC6780@sashak.voltaire.com> <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> <20081218143141.GM18908@sashak.voltaire.com> Message-ID: <20081218145801.GN18908@sashak.voltaire.com> On 09:43 Thu 18 Dec , Hal Rosenstock wrote: > > > > Are you saying we need to return potentially more than 127 paths even if > > num_paths = 127 was specified explicitly? I don't think that this is a good idea. > > I thought that was your proposal in order not to change the API for this. My proposal is to make the OSMV_QUERY_PATH_REC_BY_* stuff (in opensm/libvendor/osm_vendor_*_sa) IBA compliant.
And to use OSMV_QUERY_USER_DEFINED when we would like to create custom queries (for example without num_paths defined). > I also thought you "liked" the extensions provided by OpenSM. Yes, I do. So I think it would be nice for OpenSM to handle queries where num_paths is not specified. But when num_paths is specified, OpenSM should not ignore it and should return records accordingly. > > Assuming you care about breaking backward compatibility > > Don't you ? > > > where this osm_vendor_sa API (OSMV_QUERY_PATH_REC_BY_*) is used today? > > saquery and osmtest. I can take care of saquery - I always thought that OSMV_QUERY_USER_DEFINED is better and more useful than OSMV_QUERY_PATH_REC_BY_*. I don't expect any hurt for osmtest (however didn't check this yet). Sasha From hal.rosenstock at gmail.com Thu Dec 18 07:18:24 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 10:18:24 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081218145801.GN18908@sashak.voltaire.com> References: <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> <20081218143141.GM18908@sashak.voltaire.com> <20081218145801.GN18908@sashak.voltaire.com> Message-ID: On Thu, Dec 18, 2008 at 9:58 AM, Sasha Khapyorsky wrote: > On 09:43 Thu 18 Dec , Hal Rosenstock wrote: >> > >> > Are you saying we need to return potentially more than 127 paths even if >> > num_paths = 127 was specified explicitly? I don't think that this is a good idea. >> >> I thought that was your proposal in order not to change the API for this. > > My proposal is to make OSMV_QUERY_PATH_REC_BY_* stuff (in > opensm/libvendor/osm_vendor_*_sa) to be IBA compliant. Yes, and this affects any potential out of tree uses. > And to use > OSMV_QUERY_USER_DEFINED when we would like to create custom queries > (for example without num_paths defined).
> >> I also thought you "liked" the extensions provided by OpenSM. > > Yes, I'm. So I think it would be nice for OpenSM to handle queries where > num_paths is not specified. But in case when num_paths is requested > OpenSM should not ignore this and return records accordingly. OK; I thought you said to use the num_paths = 127 as the extension. >> > Assuming you care about breaking backward compatibility >> >> Don't you ? >> >> > where this osm_vendor_sa API (OSMV_QUERY_PATH_REC_BY_*) is used today? >> >> saquery and osmtest. > > I can care about saquery - Do you care about osmtest ? > I always thought that OSMV_QUERY_USER_DEFINED > is better and more useful than OSMV_QUERY_PATH_REC_BY_*. Depends by what you mean by better and more useful. User defined queries can do any SA query so is way more flexible. AFAIK the specific queries were intended to make things a little easier to use for what were deemed the more common use cases. > I don't expect any hurt for osmtest (however didn't check this yet). Depends on what you mean by hurt: osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_GIDS; osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_LIDS; osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; So in line with your approach, these instances should be changed over to user specified ones. Also, IMO saquery should support both compliant and extended queries. 
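One way to picture a query builder that supports both the compliant and the extended form is sketched below. This is an illustration only: the ex_* names, the trimmed-down struct and the bit positions are hypothetical stand-ins, not the osm_vendor_sa API and not the real IB_PR_COMPMASK_* values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins only; real code would use the IB_PR_COMPMASK_*
 * macros from the OpenSM headers, whose values differ. */
#define EX_MASK_DGID     (UINT64_C(1) << 0)
#define EX_MASK_SGID     (UINT64_C(1) << 1)
#define EX_MASK_NUMBPATH (UINT64_C(1) << 2)

/* Hypothetical, trimmed-down view of a PathRecord query. */
struct ex_path_query {
	uint64_t comp_mask;	/* components the SA is asked to match */
	uint8_t num_path;	/* only meaningful if EX_MASK_NUMBPATH is set */
};

/*
 * compliant == true:  NumbPath is supplied, using 127 (0x7f), the value
 *                     discussed in this thread.
 * compliant == false: NumbPath is left out of the mask, which relies on
 *                     the OpenSM extension discussed in this thread.
 */
static struct ex_path_query ex_build_gid_query(bool compliant)
{
	struct ex_path_query q = { EX_MASK_DGID | EX_MASK_SGID, 0 };

	if (compliant) {
		q.comp_mask |= EX_MASK_NUMBPATH;
		q.num_path = 0x7f;
	}
	return q;
}
```

A caller would pick the mode once (for example from a command-line switch) and pass it down, so that the extended behaviour stays available without being the silent default.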
-- Hal > Sasha From sashak at voltaire.com Thu Dec 18 07:48:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 Dec 2008 17:48:08 +0200 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3AE9D9A0AC@MNEXMB1.qlogic.org> <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> <20081218143141.GM18908@sashak.voltaire.com> <20081218145801.GN18908@sashak.voltaire.com> Message-ID: <20081218154808.GP18908@sashak.voltaire.com> On 10:18 Thu 18 Dec , Hal Rosenstock wrote: > > > I don't expect any hurt for osmtest (however didn't check this yet). > > Depends on what you mean by hurt: > > osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; > osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_GIDS; > osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_LIDS; > osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; > > So in line with your approach, these instances should be changed over > to user specified ones. Look closer at how it is used there - it is transparent to the num_paths returned. I don't think we need to change anything in osmtest. > Also, IMO saquery should support both compliant and extended queries. If somebody cares I can accept a patch with a '--compliant' option implemented.
Sasha From hal.rosenstock at gmail.com Thu Dec 18 07:59:30 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 10:59:30 -0500 Subject: [ofa-general] Bugs in opensm/libvendor In-Reply-To: <20081218154808.GP18908@sashak.voltaire.com> References: <20081218121248.GE18908@sashak.voltaire.com> <20081218124540.GK18908@sashak.voltaire.com> <20081218143141.GM18908@sashak.voltaire.com> <20081218145801.GN18908@sashak.voltaire.com> <20081218154808.GP18908@sashak.voltaire.com> Message-ID: On Thu, Dec 18, 2008 at 10:48 AM, Sasha Khapyorsky wrote: > On 10:18 Thu 18 Dec , Hal Rosenstock wrote: >> >> > I don't expect any hurt for osmtest (however didn't check this yet). >> >> Depends on what you mean by hurt: >> >> osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; >> osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_GIDS; >> osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_LIDS; >> osmtest/osmtest.c: req.query_type = OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; >> >> So in line with your approach, these instances should be changed over >> to user specified ones. > > Look closer how it is used there I'm well aware of how it's used there. > - it is transparent to num_paths > returned. I don't think we need to change something in osmtest. It doesn't break osmtest, but it is a semantic change: without such a change to osmtest, it will no longer always be possible to validate all paths. I know I used to find that useful there. >> Also, IMO saquery should support both compliant and extended queries. > > If somebody cares I can accept the patch with '--compliant' option > implemented. I think you mean --noncompliant, but anyhow I don't like the backward compatibility/old semantics no longer being supported (in these in-tree applications as well), but I'm not sure I have the time.
-- Hal > Sasha From chu11 at llnl.gov Thu Dec 18 08:33:22 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 18 Dec 2008 08:33:22 -0800 Subject: [ofa-general] [PATCH OpenSM 0/3] Fat Tree - Routing between non-CN nodes In-Reply-To: <494A5339.9030304@ext.bull.net> References: <494A5339.9030304@ext.bull.net> Message-ID: <1229618002.6821.53.camel@auk31.llnl.gov> Hi Nicolas, One minor comment. You seem to have no manpage entries for these new options. Some additions to the opensm/doc/ files might be good too. Al On Thu, 2008-12-18 at 14:42 +0100, Nicolas Morey Chaisemartin wrote: > Hi, > > We are currently working on an Ftree topology where IO nodes are connected to spine switches. > Using the cn_guid_file and root_guid_file works great. > It is possible to route the whole tree as a fat tree. All the CNs are connected to the other CN and IO nodes. > However, we are missing some connectivity between IO nodes. This is the expected behavior as the route between those IO nodes would have > to go down to go back up on another spine switch. > > However, we need at least a bit of connectivity between those nodes. There won't be any real traffic but just some "ping" for HA purposes. > > Therefore, I have implemented two new options to openSM: io_guid_file and max_reverse_hops. > The io_guid_file provides a list of all the IO GUIDs (it may differ from the list of non-CN nodes). > The max_reverse_hops gives the number of times IO nodes (described by io_guid_file) are allowed to use a switch backward. > > According to my tests this has absolutely no effect on regular routing and manages to connect the IO nodes together, if max_reverse_hops is big enough. > > This is a first draft for this feature. I'd be happy to have some feedback about how to upgrade it and make it as clean as possible, whether it is integrated in the mainstream or not.
> > Regards > > Nicolas Morey-Chaisemartin > > > > --------- > > Nicolas Morey-Chaisemartin (3): > Added io_guid_file options and variables in the different structures > and functions. > Added max_reverse_hops option for I/O nodes > Added possible reverse hops for Ftree algorithm. > > opensm/include/opensm/osm_subnet.h | 6 + > opensm/opensm/main.c | 26 +++++- > opensm/opensm/osm_subnet.c | 18 ++++ > opensm/opensm/osm_ucast_ftree.c | 183 +++++++++++++++++++++++++++++++----- > 4 files changed, 207 insertions(+), 26 deletions(-) > -- Albert Chu chu11 at llnl.gov Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From hal.rosenstock at gmail.com Thu Dec 18 08:37:45 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 11:37:45 -0500 Subject: [ofa-general] IB interfaces occasionally go down & come up for no reason In-Reply-To: <494A09C5.8000203@oracle.com> References: <494A09C5.8000203@oracle.com> Message-ID: Hi, On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani wrote: > > Hi, > > We sometimes see our IB interfaces go down and come back up within 2 or 3 > seconds for apparently no reason. That can occur without cable pulling, etc. when certain errors are present on the link. > Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for > ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 -11 is EAGAIN > Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for > ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 > Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle > interface ib0, disabling it in 5000 ms.
> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle > interface ib1, disabling it in 5000 ms. > Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after > 2000 ms for interface ib0. > Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after > 2000 ms for interface ib1. > > To mask these we've set downdelay & updelay to 5000. But can anybody tell me > why these interfaces could be bouncing down & up like this? We are not > pulling any cables, resetting ports or resetting switches when this happens. > We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418 > [ConnectX IB DDR] HCAs. Which SM flavor ? Would you dump out the port counters and see how they change before and after one of these "events" ? -- Hal > - Sumeet > From Sumeet.Lahorani at oracle.com Thu Dec 18 08:54:04 2008 From: Sumeet.Lahorani at oracle.com (Sumeet Lahorani) Date: Thu, 18 Dec 2008 08:54:04 -0800 Subject: [ofa-general] IB interfaces occasionally go down & come up for no reason In-Reply-To: References: <494A09C5.8000203@oracle.com> Message-ID: <494A802C.2000407@oracle.com> We are using the SM on the Voltaire switch. I could collect before & after snapshots of the port counters if I had a way of knowing when the event was about to happen. The problem is I don't. I guess we could run ibqueryerrors.pl every 5 seconds or so and correlate this event based on the timestamp. Is there some tracing I could turn on to dump out the reason for the link bounce? Do you have some examples of the errors that can lead to such a link bounce?
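The before/after comparison of periodic counter snapshots could be sketched roughly as follows. The struct, its field names and the ex_* helpers are hypothetical; how the values are actually read from the port is left out, and this is not the output format of any existing tool.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of a few port error counters. */
struct ex_port_counters {
	uint32_t symbol_errors;
	uint32_t link_recovers;
	uint32_t link_downed;
	uint32_t xmit_discards;
};

/* Change in one counter between two reads. If the counter went backwards
 * it was presumably cleared in between, so the new value is reported. */
static uint32_t ex_delta(uint32_t before, uint32_t after)
{
	return after >= before ? after - before : after;
}

/* Print every counter that moved between two snapshots and return how
 * many of them did. */
static int ex_report_changes(const struct ex_port_counters *b,
			     const struct ex_port_counters *a)
{
	const struct {
		const char *name;
		uint32_t before, after;
	} rows[] = {
		{ "symbol_errors", b->symbol_errors, a->symbol_errors },
		{ "link_recovers", b->link_recovers, a->link_recovers },
		{ "link_downed",   b->link_downed,   a->link_downed },
		{ "xmit_discards", b->xmit_discards, a->xmit_discards },
	};
	int changed = 0;

	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		uint32_t d = ex_delta(rows[i].before, rows[i].after);

		if (d != 0) {
			printf("%s: +%u\n", rows[i].name, (unsigned)d);
			changed++;
		}
	}
	return changed;
}
```

Taking a snapshot every few seconds and logging only the intervals in which ex_report_changes() returns non-zero would leave a short list of timestamps to line up against the bonding messages in syslog.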
- Sumeet Hal Rosenstock wrote: > Hi, > > On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani > wrote: > >> Hi, >> >> We sometimes see our IB interfaces go down and come back up within 2 or 3 >> seconds for apparently no reason. >> > > That can occur without cable pulling, etc. when certain errors are > present on the link. > > >> Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for >> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 >> > > -11 is EAGAIN > > >> Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for >> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 >> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle >> interface ib0, disabling it in 5000 ms. >> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle >> interface ib1, disabling it in 5000 ms. >> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after >> 2000 ms for interface ib0. >> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after >> 2000 ms for interface ib1. >> >> To mask these we've set downdelay & updelay to 5000. But can anybody tell me >> why these interfaces could be bouncing down & up like this? We are not >> pulling any cables, resetting ports or resetting switches when this happens. >> We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418 >> [ConnectX IB DDR] HCAs. >> > > Which SM flavor ? > > Would you dump out the port counters and see how they are change > before and after one of these "events" ? 
> > -- Hal > > >> - Sumeet > From weiny2 at llnl.gov Thu Dec 18 09:20:00 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 09:20:00 -0800 Subject: [ofa-general] IB interfaces occasionally go down & come up for no reason In-Reply-To: <494A802C.2000407@oracle.com> References: <494A09C5.8000203@oracle.com> <494A802C.2000407@oracle.com> Message-ID: <20081218092000.44189684.weiny2@llnl.gov> Sumeet, On Thu, 18 Dec 2008 08:54:04 -0800 Sumeet Lahorani wrote: > > We are using the SM on the voltaire switch. > > I could collect before & after snapshots of the port counters if I had a > way of knowing when the event was about to happen. The problem is I > don't. I guess we could run ibqueryerrors.pl every 5 seconds or so and > correlate this event based on the timestamp. You will have to do something like that. I don't know if the Voltaire SM has any performance management they can run, check with them. Last time we ran the Voltaire SM it was the clear, run, and read procedure you describe. Alternatively, you could try OpenSM with the Performance Manager. Here you would have to read the errors periodically and look for trends. I am just now getting back to a new version of my plugin to not only store the data in MySQL but to store a history of each read performed to get better historical data. > > Is there some tracing I could turn on to dump out the reason for the > link bounce? I am not sure the link bounced.
Perhaps you are just getting errors on the link which will cause the ULP's to give up. For example, RC QP's could be going into error state after a number of failed packets. I think that is why Hal wanted you to look for errors. > > Do you have some examples of the errors that can lead to such a link bounce? If the link does "bounce" ie physically goes down while data is flowing over it, look for the Symbol Errors and Xmit Discards to be "pegged". We see this when a cable is pulled accidentally or a node goes unresponsive in a running job. It will probably be easier to see these errors on the switch port. Hope this helps, Ira > > - Sumeet > > Hal Rosenstock wrote: > > Hi, > > > > On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani > > wrote: > > > >> Hi, > >> > >> We sometimes see our IB interfaces go down and come back up within 2 or 3 > >> seconds for apparently no reason. > >> > > > > That can occur without cable pulling, etc. when certain errors are > > present on the link. > > > > > >> Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for > >> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 > >> > > > > -11 is EAGAIN > > > > > >> Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for > >> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 > >> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle > >> interface ib0, disabling it in 5000 ms. > >> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for idle > >> interface ib1, disabling it in 5000 ms. > >> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after > >> 2000 ms for interface ib0. > >> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again after > >> 2000 ms for interface ib1. > >> > >> To mask these we've set downdelay & updelay to 5000. But can anybody tell me > >> why these interfaces could be bouncing down & up like this? 
We are not > >> pulling any cables, resetting ports or resetting switches when this happens. > >> We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418 > >> [ConnectX IB DDR] HCAs. > >> > > > > Which SM flavor ? > > > > Would you dump out the port counters and see how they are change > > before and after one of these "events" ? > > > > -- Hal > > > > > >> - Sumeet > From hal.rosenstock at gmail.com Thu Dec 18 10:44:05 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 13:44:05 -0500 Subject: [ofa-general] IB interfaces occasionally go down & come up for no reason In-Reply-To: <494A802C.2000407@oracle.com> References: <494A09C5.8000203@oracle.com> <494A802C.2000407@oracle.com> Message-ID: On Thu, Dec 18, 2008 at 11:54 AM, Sumeet Lahorani wrote: > > We are using the SM on the voltaire switch. As Ira indicated, Voltaire also has a performance manager colocated with the SM. > I could collect before & after snapshots of the port counters if I had a way > of knowing when the event was about to happen. The problem is I don't. How often does the link bounce ? Do you see the LEDs on that port change ?
How did you determine that the link bounces periodically ? > I guess we could run ibqueryerrors.pl Or just perfquery on the CA port which is thought to be bouncing. > every 5 seconds or so and correlate this event based on the timestamp. As Ira indicated, this may not be fruitful if the Voltaire performance manager is resetting the error counters but it can't hurt to see if any interesting counters change. > Is there some tracing I could turn on to dump out the reason for the link > bounce? That may not be fruitful depending on the nature of the problem. The error counters are the first level diagnostic on where to next look. Also, the level of tracing will depend on what external tools you have. > Do you have some examples of the errors that can lead to such a link bounce? See IBA 1.2.1 vol 2 p. 157 5.7 LINK PHYSICAL ERROR HANDLING LinkErrorRecoveryCounter and LinkDownedCounters will count interesting events at the physical level. One specific example that Ira pointed out is a high rate (exceeding threshold) of SymbolErrors (minor event). There are a number of other ones discussed in that section. -- Hal > - Sumeet > > Hal Rosenstock wrote: >> >> Hi, >> >> On Thu, Dec 18, 2008 at 3:28 AM, Sumeet Lahorani >> wrote: >> >>> >>> Hi, >>> >>> We sometimes see our IB interfaces go down and come back up within 2 or 3 >>> seconds for apparently no reason. >>> >> >> That can occur without cable pulling, etc. when certain errors are >> present on the link. >> >> >>> >>> Dec 17 14:47:23 dscbax14s kernel: ib0: multicast join failed for >>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 >>> >> >> -11 is EAGAIN >> >> >>> >>> Dec 17 14:47:23 dscbax14s kernel: ib1: multicast join failed for >>> ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11 >>> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for >>> idle >>> interface ib0, disabling it in 5000 ms. 
>>> Dec 17 14:47:23 dscbax14s kernel: bonding: bond0: link status down for >>> idle >>> interface ib1, disabling it in 5000 ms. >>> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again >>> after >>> 2000 ms for interface ib0. >>> Dec 17 14:47:25 dscbax14s kernel: bonding: bond0: link status up again >>> after >>> 2000 ms for interface ib1. >>> >>> To mask these we've set downdelay & updelay to 5000. But can anybody tell >>> me >>> why these interfaces could be bouncing down & up like this? We are not >>> pulling any cables, resetting ports or resetting switches when this >>> happens. >>> We are using Voltaire ISR9024 switches & Mellanox Technologies MT25418 >>> [ConnectX IB DDR] HCAs. >>> >> >> Which SM flavor ? >> >> Would you dump out the port counters and see how they are change >> before and after one of these "events" ? >> >> -- Hal >> >> >>> >>> - Sumeet > From michael.heinz at qlogic.com Thu Dec 18 11:18:03 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 18 Dec 2008 13:18:03 -0600 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> Sets the num_path attribute of the path record for path queries, to comply with the IBTA spec while supporting the opensm extended functionality.
Signed-off-by: mheinz at qlogic.com (Michael Heinz) -------------------------------- --- osm_vendor_ibumad_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 +++ osm_vendor_ibumad_sa.c 2008-12-18 14:13:05.000000000 -0500 @@ -615,7 +615,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -634,7 +635,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> @@ -652,7 +654,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; path_rec.dlid = ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- A non-text attachment was scrubbed... 
Name: libvendor.patchfile Type: application/octet-stream Size: 1343 bytes Desc: libvendor.patchfile URL: From hal.rosenstock at gmail.com Thu Dec 18 11:33:45 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 14:33:45 -0500 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> Message-ID: Mike, On Thu, Dec 18, 2008 at 2:18 PM, Mike Heinz wrote: > Sets the num_path attribute of the path record for path queries, to comply with the IBTA spec while supporting the opensm extended functionality. Actually, given the discussion, (nit) I don't think this patch does anything in terms of supporting the opensm extended functionality. > Signed-off-by: mheinz at qlogic.com (Michael Heinz) Another nit... Usual format is: Signed-off-by: Michael Heinz > -------------------------------- > --- osm_vendor_ibumad_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 > +++ osm_vendor_ibumad_sa.c 2008-12-18 14:13:05.000000000 -0500 > @@ -615,7 +615,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -634,7 +635,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > @@ -652,7 +654,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > 
sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID); > + (IB_PR_COMPMASK_DLID | IB_PR_COMPMASK_SLID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > path_rec.dlid = > ((osmv_lid_pair_t *) (p_query_req->p_query_input))-> > -- Shouldn't there be a similar patch for osm_vendor_mlx_sa.c ? I thought you had this originally. -- Hal > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From michael.heinz at qlogic.com Thu Dec 18 12:03:47 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 18 Dec 2008 14:03:47 -0600 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> > Shouldn't there be a similar patch for osm_vendor_mlx_sa.c ? I thought you had this originally. Yes, originally there was - but from our discussions over the past few days, I thought we were limiting the scope of the change. I've added it back to the patch, and I'm not clear when the mlx version is used in preference to the umad version. Note that this version no longer sets num_path for LID based queries - I noticed that those queries are IB_MAD_METHOD_GET, not IB_MAD_METHOD_GETTABLE, so they don't need the attribute. 
Signed-off-by: Michael Heinz --- osm_vendor_ibumad_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 +++ osm_vendor_ibumad_sa.c 2008-12-18 14:50:49.000000000 -0500 @@ -615,7 +615,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -634,7 +635,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> --- osm_vendor_mlx_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 +++ osm_vendor_mlx_sa.c 2008-12-18 14:51:34.000000000 -0500 @@ -743,7 +743,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -763,7 +764,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> -------------- next part -------------- A non-text attachment was scrubbed... 
Name: libvendor.patchfile
Type: application/octet-stream
Size: 1882 bytes
Desc: libvendor.patchfile
URL: 

From hal.rosenstock at gmail.com Thu Dec 18 12:15:46 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Thu, 18 Dec 2008 15:15:46 -0500
Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org>
Message-ID: 

On Thu, Dec 18, 2008 at 3:03 PM, Mike Heinz wrote:
>
>> Shouldn't there be a similar patch for osm_vendor_mlx_sa.c ? I thought you had this originally.
>
> Yes, originally there was - but from our discussions over the past few days, I thought we were limiting the scope of the change. I've added it back to the patch, and I'm not clear when the mlx version is used in preference to the umad version.

It's not an OpenIB/OpenFabrics thing but it's carried forward in the OpenSM tree.

> Note that this version no longer sets num_path for LID based queries -

Thanks; I missed that as I didn't look at the line numbers...

> I noticed that those queries are IB_MAD_METHOD_GET, not IB_MAD_METHOD_GETTABLE, so they don't need the attribute.

Right and it wouldn't need num_paths either (as get assumes 1) so I don't think the changes for OSMV_QUERY_PATH_REC_BY_LIDS in both these patches are needed.
-- Hal

From michael.heinz at qlogic.com Thu Dec 18 12:22:52 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Thu, 18 Dec 2008 14:22:52 -0600
Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM
In-Reply-To: 
References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org>
Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org>

> Right and it wouldn't need num_paths either (as get assumes 1) so I don't think the changes for OSMV_QUERY_PATH_REC_BY_LIDS in both these patches are needed.

Sorry if I was unclear, the last patch submission neither sets the num_path field nor the attribute mask for OSMV_QUERY_PATH_REC_BY_LIDS queries.

From hal.rosenstock at gmail.com Thu Dec 18 12:31:50 2008
From: hal.rosenstock at gmail.com (Hal Rosenstock)
Date: Thu, 18 Dec 2008 15:31:50 -0500
Subject: Re: [ofa-general] Patch for libvendor incompatibility with QLogic SM
In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org>
References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org>
 <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org>
Message-ID: 

On Thu, Dec 18, 2008 at 3:22 PM, Mike Heinz wrote:
>
>> Right and it wouldn't need num_paths either (as get assumes 1) so I don't think the changes for OSMV_QUERY_PATH_REC_BY_LIDS in both these patches are needed.
>
> Sorry if I was unclear, the last patch submission neither sets the num_path field nor the attribute mask for OSMV_QUERY_PATH_REC_BY_LIDS queries.

Right; I didn't see the updated patch was for both sa files. In the new patch, one case was missed in terms of the needed change though unless I missed that too...
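
For readers following the thread, the rule being converged on here can be sketched roughly as below: path record queries sent with the GetTable method (the port-GUID and GID cases) add the NumbPath component bit and ask for up to 0x7f paths, while queries sent with the Get method (the LID case) are left untouched. This is only an illustration under stated assumptions. Every ex_ / EX_ name, the bit positions, and the struct are hypothetical stand-ins invented for this sketch; the real code uses OpenSM's IB_PR_COMPMASK_* macros and ib_path_rec_t.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real OpenSM definitions. */
enum ex_method { EX_METHOD_GET, EX_METHOD_GETTABLE };

#define EX_COMPMASK_DGID     (UINT64_C(1) << 2)   /* assumed bit position */
#define EX_COMPMASK_SGID     (UINT64_C(1) << 3)   /* assumed bit position */
#define EX_COMPMASK_DLID     (UINT64_C(1) << 4)   /* assumed bit position */
#define EX_COMPMASK_SLID     (UINT64_C(1) << 5)   /* assumed bit position */
#define EX_COMPMASK_NUMBPATH (UINT64_C(1) << 12)  /* assumed bit position */
#define EX_NUM_PATH_MAX      0x7f                 /* value used in the patch */

/* Reduced, hypothetical view of the two fields the patch touches. */
struct ex_path_query {
	uint64_t comp_mask;
	uint8_t num_path;
};

/*
 * Fill in the component mask and num_path for a path record query.
 * Only GetTable queries request multiple paths; Get queries keep the
 * caller's mask and leave num_path at zero.
 */
static void ex_prepare_path_query(struct ex_path_query *q,
				  enum ex_method method, uint64_t base_mask)
{
	q->comp_mask = base_mask;
	q->num_path = 0;

	if (method == EX_METHOD_GETTABLE) {
		q->comp_mask |= EX_COMPMASK_NUMBPATH;
		q->num_path = EX_NUM_PATH_MAX;
	}
}
```

In this sketch a GID-based query would call ex_prepare_path_query(&q, EX_METHOD_GETTABLE, EX_COMPMASK_DGID | EX_COMPMASK_SGID), and a LID-based one would pass EX_METHOD_GET with the DLID/SLID bits and come back unchanged.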
From michael.heinz at qlogic.com Thu Dec 18 12:49:17 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 18 Dec 2008 14:49:17 -0600 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624665@MNEXMB1.qlogic.org> Hal, You've got me really confused now - there are only two cases that need changing, OSMV_QUERY_PATH_REC_BY_GIDS and OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; OSMV_QUERY_PATH_REC_BY_LIDS does *not* need to be changed because it uses the GET method. Thus, this should be the correct patch. (I'm re-including it for clarity). Signed-off-by: Michael Heinz -------------------------------- --- osm_vendor_ibumad_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 +++ osm_vendor_ibumad_sa.c 2008-12-18 14:50:49.000000000 -0500 @@ -615,7 +615,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -634,7 +635,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> --- osm_vendor_mlx_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 +++ osm_vendor_mlx_sa.c 2008-12-18 14:51:34.000000000 -0500 @@ -743,7 +743,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - 
(IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; ib_gid_set_default(&path_rec.dgid, ((osmv_guid_pair_t *) (p_query_req-> @@ -763,7 +764,8 @@ sa_mad_data.attr_offset = ib_get_attr_offset(sizeof(ib_path_rec_t)); sa_mad_data.comp_mask = - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); + path_rec.num_path = 0x7f; sa_mad_data.p_attr = &path_rec; memcpy(&path_rec.dgid, &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] Sent: Thursday, December 18, 2008 3:32 PM To: Mike Heinz Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] Patch for libvendor incompatibility with QLogic SM On Thu, Dec 18, 2008 at 3:22 PM, Mike Heinz wrote: > >> Right and it wouldn't need num_paths either (as get assumes 1) so I don't think the changes for OSMV_QUERY_PATH_REC_BY_LIDS in both these patches are needed. > > Sorry if I was unclear, the last patch submission neither sets the num_path field nor the attribute mask for OSMV_QUERY_PATH_REC_BY_LIDS queries. Right; I didn't see the updated patch was for both sa files. In the new patch, one case was missed in terms of the needed change though unless I missed that too... 
From hal.rosenstock at gmail.com Thu Dec 18 13:02:09 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Thu, 18 Dec 2008 16:02:09 -0500 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624665@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624665@MNEXMB1.qlogic.org> Message-ID: Mike, On Thu, Dec 18, 2008 at 3:49 PM, Mike Heinz wrote: > Hal, > > You've got me really confused now - there are only two cases that need changing, OSMV_QUERY_PATH_REC_BY_GIDS and OSMV_QUERY_PATH_REC_BY_PORT_GUIDS; OSMV_QUERY_PATH_REC_BY_LIDS does *not* need to be changed because it uses the GET method. Thus, this should be the correct patch. (I'm re-including it for clarity). The below looks right to me. The previous one with osm_vendor_mlx_sa.c was truncated somehow in my gmail and appeared to only have 1 of the 2 cases and I didn't look at the attachment. Sorry for the confusion. 
-- Hal > > Signed-off-by: Michael Heinz > -------------------------------- > --- osm_vendor_ibumad_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 > +++ osm_vendor_ibumad_sa.c 2008-12-18 14:50:49.000000000 -0500 > @@ -615,7 +615,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -634,7 +635,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > --- osm_vendor_mlx_sa.c.orig 2008-10-20 01:00:09.000000000 -0400 > +++ osm_vendor_mlx_sa.c 2008-12-18 14:51:34.000000000 -0500 > @@ -743,7 +743,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > ib_gid_set_default(&path_rec.dgid, > ((osmv_guid_pair_t *) (p_query_req-> > @@ -763,7 +764,8 @@ > sa_mad_data.attr_offset = > ib_get_attr_offset(sizeof(ib_path_rec_t)); > sa_mad_data.comp_mask = > - (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID); > + (IB_PR_COMPMASK_DGID | IB_PR_COMPMASK_SGID | IB_PR_COMPMASK_NUMBPATH); > + path_rec.num_path = 0x7f; > sa_mad_data.p_attr = &path_rec; > memcpy(&path_rec.dgid, > &((osmv_gid_pair_t *) (p_query_req->p_query_input))-> > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > 
-----Original Message----- > From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com] > Sent: Thursday, December 18, 2008 3:32 PM > To: Mike Heinz > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] Patch for libvendor incompatibility with QLogic SM > > On Thu, Dec 18, 2008 at 3:22 PM, Mike Heinz wrote: >> >>> Right and it wouldn't need num_paths either (as get assumes 1) so I don't think the changes for OSMV_QUERY_PATH_REC_BY_LIDS in both these patches are needed. >> >> Sorry if I was unclear, the last patch submission neither sets the num_path field nor the attribute mask for OSMV_QUERY_PATH_REC_BY_LIDS queries. > > Right; I didn't see the updated patch was for both sa files. In the new patch, one case was missed in terms of the needed change though unless I missed that too... > From michael.heinz at qlogic.com Thu Dec 18 13:04:49 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 18 Dec 2008 15:04:49 -0600 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624663@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624665@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624666@MNEXMB1.qlogic.org> No problem. I figured it had to be something like that. 
-- 
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania
> From sashak at voltaire.com Thu Dec 18 13:32:34 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 18 Dec 2008 23:32:34 +0200 Subject: [ofa-general] Patch for libvendor incompatibility with QLogic SM In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB3E7462465F@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E74624662@MNEXMB1.qlogic.org> Message-ID: <20081218213234.GR18908@sashak.voltaire.com> On 14:03 Thu 18 Dec , Mike Heinz wrote: > > > Shouldn't there be a similar patch for osm_vendor_mlx_sa.c ? I thought you had this originally. > > Yes, originally there was - but from our discussions over the past few days, I thought we were limiting the scope of the change. I've added it back to the patch, and I'm not clear when the mlx version is used in preference to the umad version. Note that this version no longer sets num_path for LID based queries - I noticed that those queries are IB_MAD_METHOD_GET, not IB_MAD_METHOD_GETTABLE, so they don't need the attribute. > > Signed-off-by: Michael Heinz Applied. Thanks. Sasha From arlin.r.davis at intel.com Thu Dec 18 15:38:06 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:38:06 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c Message-ID: Patch 2/6 - dump.c Signed-off by: Arlin Davis diff -aur libibmad-1.2.2/src/dump.c libibmad/src/dump.c --- libibmad-1.2.2/src/dump.c 2008-10-19 11:34:41.000000000 -0700 +++ libibmad/src/dump.c 2008-12-17 17:02:40.947163656 -0800 @@ -38,15 +38,51 @@ #include #include -#include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#define snprintf _snprintf +#else +#include +#include #include #include +#endif #include -#include +#include + +MAD_EXPORT void +xdump(FILE *file, char *msg, void *p, int size) +{ +#define HEX(x) ((x) < 10 ? 
'0' + (x) : 'a' + ((x) -10)) + uint8_t *cp = p; + int i; + + if (msg) + fputs(msg, file); + + for (i = 0; i < size;) { + fputc(HEX(*cp >> 4), file); + fputc(HEX(*cp & 0xf), file); + if (++i >= size) + break; + fputc(HEX(cp[1] >> 4), file); + fputc(HEX(cp[1] & 0xf), file); + if ((++i) % 16) + fputc(' ', file); + else + fputc('\n', file); + cp += 2; + } + if (i % 16) { + fputc('\n', file); + } +} -void +MAD_EXPORT void mad_dump_int(char *buf, int bufsz, void *val, int valsz) { switch (valsz) { @@ -72,7 +108,7 @@ } } -void +MAD_EXPORT void mad_dump_uint(char *buf, int bufsz, void *val, int valsz) { switch (valsz) { @@ -98,7 +134,7 @@ } } -void +MAD_EXPORT void mad_dump_hex(char *buf, int bufsz, void *val, int valsz) { switch (valsz) { @@ -115,13 +151,13 @@ snprintf(buf, bufsz, "0x%08x", *(uint32_t *)val); break; case 5: - snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); + snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffULL); break; case 6: - snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); + snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffULL); break; case 7: - snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); + snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffULL); break; case 8: snprintf(buf, bufsz, "0x%016" PRIx64, *(uint64_t *)val); @@ -132,7 +168,7 @@ } } -void +MAD_EXPORT void mad_dump_rhex(char *buf, int bufsz, void *val, int valsz) { switch (valsz) { @@ -149,13 +185,13 @@ snprintf(buf, bufsz, "%08x", *(uint32_t *)val); break; case 5: - snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); + snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffULL); break; case 6: - snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); + snprintf(buf, bufsz, "%012" PRIx64, 
*(uint64_t *)val & (uint64_t) 0xffffffffffffULL); break; case 7: - snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); + snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffULL); break; case 8: snprintf(buf, bufsz, "%016" PRIx64, *(uint64_t *)val); @@ -166,7 +202,7 @@ } } -void +MAD_EXPORT void mad_dump_linkwidth(char *buf, int bufsz, void *val, int valsz) { int width = *(int *)val; @@ -212,7 +248,7 @@ buf[n-4] = '\0'; } -void +MAD_EXPORT void mad_dump_linkwidthsup(char *buf, int bufsz, void *val, int valsz) { int width = *(int *)val; @@ -235,7 +271,7 @@ } } -void +MAD_EXPORT void mad_dump_linkwidthen(char *buf, int bufsz, void *val, int valsz) { int width = *(int *)val; @@ -243,7 +279,7 @@ dump_linkwidth(buf, bufsz, width); } -void +MAD_EXPORT void mad_dump_linkspeed(char *buf, int bufsz, void *val, int valsz) { int speed = *(int *)val; @@ -300,7 +336,7 @@ } } -void +MAD_EXPORT void mad_dump_linkspeedsup(char *buf, int bufsz, void *val, int valsz) { int speed = *(int *)val; @@ -308,7 +344,7 @@ dump_linkspeed(buf, bufsz, speed); } -void +MAD_EXPORT void mad_dump_linkspeeden(char *buf, int bufsz, void *val, int valsz) { int speed = *(int *)val; @@ -316,7 +352,7 @@ dump_linkspeed(buf, bufsz, speed); } -void +MAD_EXPORT void mad_dump_portstate(char *buf, int bufsz, void *val, int valsz) { int state = *(int *)val; @@ -342,7 +378,7 @@ } } -void +MAD_EXPORT void mad_dump_linkdowndefstate(char *buf, int bufsz, void *val, int valsz) { int state = *(int *)val; @@ -363,7 +399,7 @@ } } -void +MAD_EXPORT void mad_dump_physportstate(char *buf, int bufsz, void *val, int valsz) { int state = *(int *)val; @@ -398,7 +434,7 @@ } } -void +MAD_EXPORT void mad_dump_mtu(char *buf, int bufsz, void *val, int valsz) { int mtu = *(int *)val; @@ -425,7 +461,7 @@ } } -void +MAD_EXPORT void mad_dump_vlcap(char *buf, int bufsz, void *val, int valsz) { int vlcap = *(int *)val; @@ -451,7 +487,7 @@ } } -void +MAD_EXPORT void 
mad_dump_opervls(char *buf, int bufsz, void *val, int valsz) { int opervls = *(int *)val; @@ -480,7 +516,7 @@ } } -void +MAD_EXPORT void mad_dump_portcapmask(char *buf, int bufsz, void *val, int valsz) { unsigned mask = *(unsigned *)val; @@ -534,13 +570,13 @@ *(--s) = 0; } -void +MAD_EXPORT void mad_dump_bitfield(char *buf, int bufsz, void *val, int valsz) { snprintf(buf, bufsz, "0x%x", *(uint32_t *)val); } -void +MAD_EXPORT void mad_dump_array(char *buf, int bufsz, void *val, int valsz) { uint8_t *p = val, *e; @@ -553,7 +589,7 @@ sprintf(s, "%02x", *p); } -void +MAD_EXPORT void mad_dump_string(char *buf, int bufsz, void *val, int valsz) { if (bufsz < valsz) @@ -562,7 +598,7 @@ snprintf(buf, valsz, "'%s'", (char *)val); } -void +MAD_EXPORT void mad_dump_node_type(char *buf, int bufsz, void *val, int valsz) { int nodetype = *(int*)val; @@ -603,7 +639,7 @@ uint8_t res_vl; uint8_t weight; } vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; -} __attribute__((packed)) ib_vl_arb_table_t; +} ib_vl_arb_table_t; static inline void ib_vl_arb_get_vl(uint8_t res_vl, uint8_t *const vl ) @@ -611,7 +647,7 @@ *vl = res_vl & 0x0F; } -void +MAD_EXPORT void mad_dump_sltovl(char *buf, int bufsz, void *val, int valsz) { ib_slvl_table_t* p_slvl_tbl = val; @@ -627,11 +663,11 @@ snprintf(buf + n, bufsz - n, "\n"); } -void +MAD_EXPORT void mad_dump_vlarbitration(char *buf, int bufsz, void *val, int num) { ib_vl_arb_table_t* p_vla_tbl = val; - unsigned i, n; + int i, n; uint8_t vl; num /= sizeof(p_vla_tbl->vl_entry[0]); @@ -678,10 +714,10 @@ bufsz -= n; } - return s - buf; + return (int)(s - buf); } -void +MAD_EXPORT void mad_dump_nodedesc(char *buf, int bufsz, void *val, int valsz) { strncpy(buf, val, bufsz); @@ -690,37 +726,37 @@ buf[valsz] = 0; } -void +MAD_EXPORT void mad_dump_nodeinfo(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_NODE_FIRST_F, IB_NODE_LAST_F); } -void +MAD_EXPORT void mad_dump_portinfo(char *buf, int bufsz, void *val, int valsz) { 
_dump_fields(buf, bufsz, val, IB_PORT_FIRST_F, IB_PORT_LAST_F); } -void +MAD_EXPORT void mad_dump_portstates(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PORT_STATE_F, IB_PORT_LINK_DOWN_DEF_F); } -void +MAD_EXPORT void mad_dump_switchinfo(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_SW_FIRST_F, IB_SW_LAST_F); } -void +MAD_EXPORT void mad_dump_perfcounters(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PC_FIRST_F, IB_PC_LAST_F); } -void +MAD_EXPORT void mad_dump_perfcounters_ext(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PC_EXT_FIRST_F, IB_PC_EXT_LAST_F); @@ -765,9 +801,13 @@ int _mad_dump(ib_mad_dump_fn *fn, char *name, void *val, int valsz) { - ib_field_t f = { .def_dump_fn = fn, .bitlen = valsz * 8}; + ib_field_t f; char buf[512]; + memset(&f, 0, sizeof(f)); + f.def_dump_fn = fn; + f.bitlen = valsz * 8; + return printf("%s\n", _mad_dump_field(&f, name, buf, sizeof buf, val)); } @@ -776,3 +816,4 @@ { return _mad_dump(f->def_dump_fn, name ? name : f->name, val, valsz ? 
valsz : ALIGN(f->bitlen, 8) / 8); } + From arlin.r.davis at intel.com Thu Dec 18 15:38:14 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:38:14 -0800 Subject: [ofa-general] PATCH[3/6] Windows port of libibmad - fields.c Message-ID: Patch 3/6 - fields.c Signed-off by: Arlin Davis diff -aur libibmad-1.2.2/src/fields.c libibmad/src/fields.c --- libibmad-1.2.2/src/fields.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/fields.c 2008-12-17 17:02:40.968160464 -0800 @@ -37,11 +37,16 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#else #include #include +#endif #include -#include /* * BITSOFFS and BE_OFFS are required due the fact that the bit offsets are inconsistently @@ -55,10 +60,10 @@ #define BE_TO_BITSOFFS(o, w) (((o) & ~31) | ((32 - ((o) & 31) - (w)))) ib_field_t ib_mad_f [] = { - [0] {0, 0}, /* IB_NO_FIELD - reserved as invalid */ + {0, 0}, /* IB_NO_FIELD - reserved as invalid */ - [IB_GID_PREFIX_F] {0, 64, "GidPrefix", mad_dump_rhex}, - [IB_GID_GUID_F] {64, 64, "GidGuid", mad_dump_rhex}, + {0, 64, "GidPrefix", mad_dump_rhex}, + {64, 64, "GidGuid", mad_dump_rhex}, /* * MAD: common MAD fields (IB spec 13.4.2) @@ -68,298 +73,314 @@ */ /* first MAD word (0-3 bytes) */ - [IB_MAD_METHOD_F] {BE_OFFS(0, 7), "MadMethod", mad_dump_hex}, /* TODO: add dumper */ - [IB_MAD_RESPONSE_F] {BE_OFFS(7, 1), "MadIsResponse", mad_dump_uint}, /* TODO: add dumper */ - [IB_MAD_CLASSVER_F] {BE_OFFS(8, 8), "MadClassVersion", mad_dump_uint}, - [IB_MAD_MGMTCLASS_F] {BE_OFFS(16, 8), "MadMgmtClass", mad_dump_uint}, /* TODO: add dumper */ - [IB_MAD_BASEVER_F] {BE_OFFS(24, 8), "MadBaseVersion", mad_dump_uint}, + {BE_OFFS(0, 7), "MadMethod", mad_dump_hex}, /* TODO: add dumper */ + {BE_OFFS(7, 1), "MadIsResponse", mad_dump_uint}, /* TODO: add dumper */ + {BE_OFFS(8, 8), "MadClassVersion", mad_dump_uint}, + {BE_OFFS(16, 8), "MadMgmtClass", mad_dump_uint}, /* TODO: add dumper */ + {BE_OFFS(24, 8), "MadBaseVersion", 
mad_dump_uint}, /* second MAD word (4-7 bytes) */ - [IB_MAD_STATUS_F] {BE_OFFS(48, 16), "MadStatus", mad_dump_hex}, /* TODO: add dumper */ + {BE_OFFS(48, 16), "MadStatus", mad_dump_hex}, /* TODO: add dumper */ /* DR SMP only */ - [IB_DRSMP_HOPCNT_F] {BE_OFFS(32, 8), "DrSmpHopCnt", mad_dump_uint}, - [IB_DRSMP_HOPPTR_F] {BE_OFFS(40, 8), "DrSmpHopPtr", mad_dump_uint}, - [IB_DRSMP_STATUS_F] {BE_OFFS(48, 15), "DrSmpStatus", mad_dump_hex}, /* TODO: add dumper */ - [IB_DRSMP_DIRECTION_F] {BE_OFFS(63, 1), "DrSmpDirection", mad_dump_uint}, /* TODO: add dumper */ + {BE_OFFS(32, 8), "DrSmpHopCnt", mad_dump_uint}, + {BE_OFFS(40, 8), "DrSmpHopPtr", mad_dump_uint}, + {BE_OFFS(48, 15), "DrSmpStatus", mad_dump_hex}, /* TODO: add dumper */ + {BE_OFFS(63, 1), "DrSmpDirection", mad_dump_uint}, /* TODO: add dumper */ /* words 3,4,5,6 (8-23 bytes) */ - [IB_MAD_TRID_F] {64, 64, "MadTRID", mad_dump_hex}, - [IB_MAD_ATTRID_F] {BE_OFFS(144, 16), "MadAttr", mad_dump_hex}, /* TODO: add dumper */ - [IB_MAD_ATTRMOD_F] {160, 32, "MadModifier", mad_dump_hex}, /* TODO: add dumper */ + {64, 64, "MadTRID", mad_dump_hex}, + {BE_OFFS(144, 16), "MadAttr", mad_dump_hex}, /* TODO: add dumper */ + {160, 32, "MadModifier", mad_dump_hex}, /* TODO: add dumper */ /* word 7,8 (24-31 bytes) */ - [IB_MAD_MKEY_F] {196, 64, "MadMkey", mad_dump_hex}, + {196, 64, "MadMkey", mad_dump_hex}, /* word 9 (32-37 bytes) */ - [IB_DRSMP_DRDLID_F] {BE_OFFS(256, 16), "DrSmpDLID", mad_dump_hex}, - [IB_DRSMP_DRSLID_F] {BE_OFFS(272, 16), "DrSmpSLID", mad_dump_hex}, + {BE_OFFS(256, 16), "DrSmpDLID", mad_dump_hex}, + {BE_OFFS(272, 16), "DrSmpSLID", mad_dump_hex}, + + /* word 10,11 (36-43 bytes) */ + {0, 0}, /* IB_SA_MKEY_F - reserved as invalid */ /* word 12 (44-47 bytes) */ - [IB_SA_ATTROFFS_F] {BE_OFFS(46*8, 16), "SaAttrOffs", mad_dump_uint}, + {BE_OFFS(46*8, 16), "SaAttrOffs", mad_dump_uint}, /* word 13,14 (48-55 bytes) */ - [IB_SA_COMPMASK_F] {48*8, 64, "SaCompMask", mad_dump_hex}, + {48*8, 64, "SaCompMask", mad_dump_hex}, /* 
word 13,14 (56-255 bytes) */ - [IB_SA_DATA_F] {56*8, (256-56)*8, "SaData", mad_dump_hex}, - - [IB_DRSMP_PATH_F] {1024, 512, "DrSmpPath", mad_dump_hex}, - [IB_DRSMP_RPATH_F] {1536, 512, "DrSmpRetPath", mad_dump_hex}, - - [IB_GS_DATA_F] {64*8, (256-64) * 8, "GsData", mad_dump_hex}, + {56*8, (256-56)*8, "SaData", mad_dump_hex}, + /* bytes 64 - 127 */ + {0, 0}, /* IB_SM_DATA_F - reserved as invalid */ + + /* bytes 64 - 256 */ + {64*8, (256-64) * 8, "GsData", mad_dump_hex}, + + /* bytes 128 - 191 */ + {1024, 512, "DrSmpPath", mad_dump_hex}, + + /* bytes 192 - 255 */ + {1536, 512, "DrSmpRetPath", mad_dump_hex}, + /* * PortInfo fields: */ - [IB_PORT_MKEY_F] {0, 64, "Mkey", mad_dump_hex}, - [IB_PORT_GID_PREFIX_F] {64, 64, "GidPrefix", mad_dump_hex}, - [IB_PORT_LID_F] {BITSOFFS(128, 16), "Lid", mad_dump_hex}, - [IB_PORT_SMLID_F] {BITSOFFS(144, 16), "SMLid", mad_dump_hex}, - [IB_PORT_CAPMASK_F] {160, 32, "CapMask", mad_dump_portcapmask}, - [IB_PORT_DIAG_F] {BITSOFFS(192, 16), "DiagCode", mad_dump_hex}, - [IB_PORT_MKEY_LEASE_F] {BITSOFFS(208, 16), "MkeyLeasePeriod", mad_dump_uint}, - [IB_PORT_LOCAL_PORT_F] {BITSOFFS(224, 8), "LocalPort", mad_dump_uint}, - [IB_PORT_LINK_WIDTH_ENABLED_F] {BITSOFFS(232, 8), "LinkWidthEnabled", mad_dump_linkwidthen}, - [IB_PORT_LINK_WIDTH_SUPPORTED_F] {BITSOFFS(240, 8), "LinkWidthSupported", mad_dump_linkwidthsup}, - [IB_PORT_LINK_WIDTH_ACTIVE_F] {BITSOFFS(248, 8), "LinkWidthActive", mad_dump_linkwidth}, - [IB_PORT_LINK_SPEED_SUPPORTED_F] {BITSOFFS(256, 4), "LinkSpeedSupported", mad_dump_linkspeedsup}, - [IB_PORT_STATE_F] {BITSOFFS(260, 4), "LinkState", mad_dump_portstate}, - [IB_PORT_PHYS_STATE_F] {BITSOFFS(264, 4), "PhysLinkState", mad_dump_physportstate}, - [IB_PORT_LINK_DOWN_DEF_F] {BITSOFFS(268, 4), "LinkDownDefState", mad_dump_linkdowndefstate}, - [IB_PORT_MKEY_PROT_BITS_F] {BITSOFFS(272, 2), "ProtectBits", mad_dump_uint}, - [IB_PORT_LMC_F] {BITSOFFS(277, 3), "LMC", mad_dump_uint}, - [IB_PORT_LINK_SPEED_ACTIVE_F] {BITSOFFS(280, 4), 
"LinkSpeedActive", mad_dump_linkspeed}, - [IB_PORT_LINK_SPEED_ENABLED_F] {BITSOFFS(284, 4), "LinkSpeedEnabled", mad_dump_linkspeeden}, - [IB_PORT_NEIGHBOR_MTU_F] {BITSOFFS(288, 4), "NeighborMTU", mad_dump_mtu}, - [IB_PORT_SMSL_F] {BITSOFFS(292, 4), "SMSL", mad_dump_uint}, - [IB_PORT_VL_CAP_F] {BITSOFFS(296, 4), "VLCap", mad_dump_vlcap}, - [IB_PORT_INIT_TYPE_F] {BITSOFFS(300, 4), "InitType", mad_dump_hex}, - [IB_PORT_VL_HIGH_LIMIT_F] {BITSOFFS(304, 8), "VLHighLimit", mad_dump_uint}, - [IB_PORT_VL_ARBITRATION_HIGH_CAP_F] {BITSOFFS(312, 8), "VLArbHighCap", mad_dump_uint}, - [IB_PORT_VL_ARBITRATION_LOW_CAP_F] {BITSOFFS(320, 8), "VLArbLowCap", mad_dump_uint}, - - [IB_PORT_INIT_TYPE_REPLY_F] {BITSOFFS(328, 4), "InitReply", mad_dump_hex}, - [IB_PORT_MTU_CAP_F] {BITSOFFS(332, 4), "MtuCap", mad_dump_mtu}, - [IB_PORT_VL_STALL_COUNT_F] {BITSOFFS(336, 3), "VLStallCount", mad_dump_uint}, - [IB_PORT_HOQ_LIFE_F] {BITSOFFS(339, 5), "HoqLife", mad_dump_uint}, - [IB_PORT_OPER_VLS_F] {BITSOFFS(344, 4), "OperVLs", mad_dump_opervls}, - [IB_PORT_PART_EN_INB_F] {BITSOFFS(348, 1), "PartEnforceInb", mad_dump_uint}, - [IB_PORT_PART_EN_OUTB_F] {BITSOFFS(349, 1), "PartEnforceOutb", mad_dump_uint}, - [IB_PORT_FILTER_RAW_INB_F] {BITSOFFS(350, 1), "FilterRawInb", mad_dump_uint}, - [IB_PORT_FILTER_RAW_OUTB_F] {BITSOFFS(351, 1), "FilterRawOutb", mad_dump_uint}, - [IB_PORT_MKEY_VIOL_F] {BITSOFFS(352, 16), "MkeyViolations", mad_dump_uint}, - [IB_PORT_PKEY_VIOL_F] {BITSOFFS(368, 16), "PkeyViolations", mad_dump_uint}, - [IB_PORT_QKEY_VIOL_F] {BITSOFFS(384, 16), "QkeyViolations", mad_dump_uint}, - [IB_PORT_GUID_CAP_F] {BITSOFFS(400, 8), "GuidCap", mad_dump_uint}, - [IB_PORT_CLIENT_REREG_F] {BITSOFFS(408, 1), "ClientReregister", mad_dump_uint}, - [IB_PORT_SUBN_TIMEOUT_F] {BITSOFFS(411, 5), "SubnetTimeout", mad_dump_uint}, - [IB_PORT_RESP_TIME_VAL_F] {BITSOFFS(419, 5), "RespTimeVal", mad_dump_uint}, - [IB_PORT_LOCAL_PHYS_ERR_F] {BITSOFFS(424, 4), "LocalPhysErr", mad_dump_uint}, - [IB_PORT_OVERRUN_ERR_F] 
{BITSOFFS(428, 4), "OverrunErr", mad_dump_uint}, - [IB_PORT_MAX_CREDIT_HINT_F] {BITSOFFS(432, 16), "MaxCreditHint", mad_dump_uint}, - [IB_PORT_LINK_ROUND_TRIP_F] {BITSOFFS(456, 24), "RoundTrip", mad_dump_uint}, + {0, 64, "Mkey", mad_dump_hex}, + {64, 64, "GidPrefix", mad_dump_hex}, + {BITSOFFS(128, 16), "Lid", mad_dump_hex}, + {BITSOFFS(144, 16), "SMLid", mad_dump_hex}, + {160, 32, "CapMask", mad_dump_portcapmask}, + {BITSOFFS(192, 16), "DiagCode", mad_dump_hex}, + {BITSOFFS(208, 16), "MkeyLeasePeriod", mad_dump_uint}, + {BITSOFFS(224, 8), "LocalPort", mad_dump_uint}, + {BITSOFFS(232, 8), "LinkWidthEnabled", mad_dump_linkwidthen}, + {BITSOFFS(240, 8), "LinkWidthSupported", mad_dump_linkwidthsup}, + {BITSOFFS(248, 8), "LinkWidthActive", mad_dump_linkwidth}, + {BITSOFFS(256, 4), "LinkSpeedSupported", mad_dump_linkspeedsup}, + {BITSOFFS(260, 4), "LinkState", mad_dump_portstate}, + {BITSOFFS(264, 4), "PhysLinkState", mad_dump_physportstate}, + {BITSOFFS(268, 4), "LinkDownDefState", mad_dump_linkdowndefstate}, + {BITSOFFS(272, 2), "ProtectBits", mad_dump_uint}, + {BITSOFFS(277, 3), "LMC", mad_dump_uint}, + {BITSOFFS(280, 4), "LinkSpeedActive", mad_dump_linkspeed}, + {BITSOFFS(284, 4), "LinkSpeedEnabled", mad_dump_linkspeeden}, + {BITSOFFS(288, 4), "NeighborMTU", mad_dump_mtu}, + {BITSOFFS(292, 4), "SMSL", mad_dump_uint}, + {BITSOFFS(296, 4), "VLCap", mad_dump_vlcap}, + {BITSOFFS(300, 4), "InitType", mad_dump_hex}, + {BITSOFFS(304, 8), "VLHighLimit", mad_dump_uint}, + {BITSOFFS(312, 8), "VLArbHighCap", mad_dump_uint}, + {BITSOFFS(320, 8), "VLArbLowCap", mad_dump_uint}, + {BITSOFFS(328, 4), "InitReply", mad_dump_hex}, + {BITSOFFS(332, 4), "MtuCap", mad_dump_mtu}, + {BITSOFFS(336, 3), "VLStallCount", mad_dump_uint}, + {BITSOFFS(339, 5), "HoqLife", mad_dump_uint}, + {BITSOFFS(344, 4), "OperVLs", mad_dump_opervls}, + {BITSOFFS(348, 1), "PartEnforceInb", mad_dump_uint}, + {BITSOFFS(349, 1), "PartEnforceOutb", mad_dump_uint}, + {BITSOFFS(350, 1), "FilterRawInb", 
mad_dump_uint}, + {BITSOFFS(351, 1), "FilterRawOutb", mad_dump_uint}, + {BITSOFFS(352, 16), "MkeyViolations", mad_dump_uint}, + {BITSOFFS(368, 16), "PkeyViolations", mad_dump_uint}, + {BITSOFFS(384, 16), "QkeyViolations", mad_dump_uint}, + {BITSOFFS(400, 8), "GuidCap", mad_dump_uint}, + {BITSOFFS(408, 1), "ClientReregister", mad_dump_uint}, + {BITSOFFS(411, 5), "SubnetTimeout", mad_dump_uint}, + {BITSOFFS(419, 5), "RespTimeVal", mad_dump_uint}, + {BITSOFFS(424, 4), "LocalPhysErr", mad_dump_uint}, + {BITSOFFS(428, 4), "OverrunErr", mad_dump_uint}, + {BITSOFFS(432, 16), "MaxCreditHint", mad_dump_uint}, + {BITSOFFS(456, 24), "RoundTrip", mad_dump_uint}, + {0, 0}, /* IB_PORT_LAST_F */ /* * NodeInfo fields: */ - [IB_NODE_BASE_VERS_F] {BITSOFFS(0,8), "BaseVers", mad_dump_uint}, - [IB_NODE_CLASS_VERS_F] {BITSOFFS(8,8), "ClassVers", mad_dump_uint}, - [IB_NODE_TYPE_F] {BITSOFFS(16,8), "NodeType", mad_dump_node_type}, - [IB_NODE_NPORTS_F] {BITSOFFS(24,8), "NumPorts", mad_dump_uint}, - [IB_NODE_SYSTEM_GUID_F] {32, 64, "SystemGuid", mad_dump_hex}, - [IB_NODE_GUID_F] {96, 64, "Guid", mad_dump_hex}, - [IB_NODE_PORT_GUID_F] {160, 64, "PortGuid", mad_dump_hex}, - [IB_NODE_PARTITION_CAP_F] {BITSOFFS(224,16), "PartCap", mad_dump_uint}, - [IB_NODE_DEVID_F] {BITSOFFS(240,16), "DevId", mad_dump_hex}, - [IB_NODE_REVISION_F] {256, 32, "Revision", mad_dump_hex}, - [IB_NODE_LOCAL_PORT_F] {BITSOFFS(288,8), "LocalPort", mad_dump_uint}, - [IB_NODE_VENDORID_F] {BITSOFFS(296,24), "VendorId", mad_dump_hex}, + {BITSOFFS(0,8), "BaseVers", mad_dump_uint}, + {BITSOFFS(8,8), "ClassVers", mad_dump_uint}, + {BITSOFFS(16,8), "NodeType", mad_dump_node_type}, + {BITSOFFS(24,8), "NumPorts", mad_dump_uint}, + {32, 64, "SystemGuid", mad_dump_hex}, + {96, 64, "Guid", mad_dump_hex}, + {160, 64, "PortGuid", mad_dump_hex}, + {BITSOFFS(224,16), "PartCap", mad_dump_uint}, + {BITSOFFS(240,16), "DevId", mad_dump_hex}, + {256, 32, "Revision", mad_dump_hex}, + {BITSOFFS(288,8), "LocalPort", mad_dump_uint}, + 
{BITSOFFS(296,24), "VendorId", mad_dump_hex}, + {0, 0}, /* IB_NODE_LAST_F */ + /* * SwitchInfo fields: */ - [IB_SW_LINEAR_FDB_CAP_F] {BITSOFFS(0, 16), "LinearFdbCap", mad_dump_uint}, - [IB_SW_RANDOM_FDB_CAP_F] {BITSOFFS(16, 16), "RandomFdbCap", mad_dump_uint}, - [IB_SW_MCAST_FDB_CAP_F] {BITSOFFS(32, 16), "McastFdbCap", mad_dump_uint}, - [IB_SW_LINEAR_FDB_TOP_F] {BITSOFFS(48, 16), "LinearFdbTop", mad_dump_uint}, - [IB_SW_DEF_PORT_F] {BITSOFFS(64, 8), "DefPort", mad_dump_uint}, - [IB_SW_DEF_MCAST_PRIM_F] {BITSOFFS(72, 8), "DefMcastPrimPort", mad_dump_uint}, - [IB_SW_DEF_MCAST_NOT_PRIM_F] {BITSOFFS(80, 8), "DefMcastNotPrimPort", mad_dump_uint}, - [IB_SW_LIFE_TIME_F] {BITSOFFS(88, 5), "LifeTime", mad_dump_uint}, - [IB_SW_STATE_CHANGE_F] {BITSOFFS(93, 1), "StateChange", mad_dump_uint}, - [IB_SW_LIDS_PER_PORT_F] {BITSOFFS(96,16), "LidsPerPort", mad_dump_uint}, - [IB_SW_PARTITION_ENFORCE_CAP_F] {BITSOFFS(112, 16), "PartEnforceCap", mad_dump_uint}, - [IB_SW_PARTITION_ENF_INB_F] {BITSOFFS(128, 1), "InboundPartEnf", mad_dump_uint}, - [IB_SW_PARTITION_ENF_OUTB_F] {BITSOFFS(129, 1), "OutboundPartEnf", mad_dump_uint}, - [IB_SW_FILTER_RAW_INB_F] {BITSOFFS(130, 1), "FilterRawInbound", mad_dump_uint}, - [IB_SW_FILTER_RAW_OUTB_F] {BITSOFFS(131, 1), "FilterRawOutbound", mad_dump_uint}, - [IB_SW_ENHANCED_PORT0_F] {BITSOFFS(132, 1), "EnhancedPort0", mad_dump_uint}, + {BITSOFFS(0, 16), "LinearFdbCap", mad_dump_uint}, + {BITSOFFS(16, 16), "RandomFdbCap", mad_dump_uint}, + {BITSOFFS(32, 16), "McastFdbCap", mad_dump_uint}, + {BITSOFFS(48, 16), "LinearFdbTop", mad_dump_uint}, + {BITSOFFS(64, 8), "DefPort", mad_dump_uint}, + {BITSOFFS(72, 8), "DefMcastPrimPort", mad_dump_uint}, + {BITSOFFS(80, 8), "DefMcastNotPrimPort", mad_dump_uint}, + {BITSOFFS(88, 5), "LifeTime", mad_dump_uint}, + {BITSOFFS(93, 1), "StateChange", mad_dump_uint}, + {BITSOFFS(96,16), "LidsPerPort", mad_dump_uint}, + {BITSOFFS(112, 16), "PartEnforceCap", mad_dump_uint}, + {BITSOFFS(128, 1), "InboundPartEnf", 
mad_dump_uint}, + {BITSOFFS(129, 1), "OutboundPartEnf", mad_dump_uint}, + {BITSOFFS(130, 1), "FilterRawInbound", mad_dump_uint}, + {BITSOFFS(131, 1), "FilterRawOutbound", mad_dump_uint}, + {BITSOFFS(132, 1), "EnhancedPort0", mad_dump_uint}, + {0, 0}, /* IB_SW_LAST_F */ /* * SwitchLinearForwardingTable fields: */ - [IB_LINEAR_FORW_TBL_F] {0, 512, "LinearForwTbl", mad_dump_array}, + {0, 512, "LinearForwTbl", mad_dump_array}, /* * SwitchMulticastForwardingTable fields: */ - [IB_MULTICAST_FORW_TBL_F] {0, 512, "MulticastForwTbl", mad_dump_array}, + {0, 512, "MulticastForwTbl", mad_dump_array}, /* - * Notice/Trap fields + * NodeDescription fields: */ - [IB_NOTICE_IS_GENERIC_F] {BITSOFFS(0, 1), "NoticeIsGeneric", mad_dump_uint}, - [IB_NOTICE_TYPE_F] {BITSOFFS(1, 7), "NoticeType", mad_dump_uint}, - [IB_NOTICE_PRODUCER_F] {BITSOFFS(8, 24), "NoticeProducerType", mad_dump_node_type}, - [IB_NOTICE_TRAP_NUMBER_F] {BITSOFFS(32, 16), "NoticeTrapNumber", mad_dump_uint}, - [IB_NOTICE_ISSUER_LID_F] {BITSOFFS(48, 16), "NoticeIssuerLID", mad_dump_uint}, - [IB_NOTICE_TOGGLE_F] {BITSOFFS(64, 1), "NoticeToggle", mad_dump_uint}, - [IB_NOTICE_COUNT_F] {BITSOFFS(65, 15), "NoticeCount", mad_dump_uint}, - [IB_NOTICE_DATA_DETAILS_F] {80, 432, "NoticeDataDetails", mad_dump_array}, - [IB_NOTICE_DATA_LID_F] {BITSOFFS(80, 16), "NoticeDataLID", mad_dump_uint}, - [IB_NOTICE_DATA_144_LID_F] {BITSOFFS(96, 16), "NoticeDataTrap144LID", mad_dump_uint}, - [IB_NOTICE_DATA_144_CAPMASK_F] {BITSOFFS(128, 32), "NoticeDataTrap144CapMask", mad_dump_uint}, + {0, 64*8, "NodeDesc", mad_dump_string}, /* - * NodeDescription fields: + * Notice/Trap fields */ - [IB_NODE_DESC_F] {0, 64*8, "NodeDesc", mad_dump_string}, + {BITSOFFS(0, 1), "NoticeIsGeneric", mad_dump_uint}, + {BITSOFFS(1, 7), "NoticeType", mad_dump_uint}, + {BITSOFFS(8, 24), "NoticeProducerType", mad_dump_node_type}, + {BITSOFFS(32, 16), "NoticeTrapNumber", mad_dump_uint}, + {BITSOFFS(48, 16), "NoticeIssuerLID", mad_dump_uint}, + {BITSOFFS(64, 1), 
"NoticeToggle", mad_dump_uint}, + {BITSOFFS(65, 15), "NoticeCount", mad_dump_uint}, + {80, 432, "NoticeDataDetails", mad_dump_array}, + {BITSOFFS(80, 16), "NoticeDataLID", mad_dump_uint}, + {BITSOFFS(96, 16), "NoticeDataTrap144LID", mad_dump_uint}, + {BITSOFFS(128, 32), "NoticeDataTrap144CapMask", mad_dump_uint}, /* * Port counters */ - [IB_PC_PORT_SELECT_F] {BITSOFFS(8, 8), "PortSelect", mad_dump_uint}, - [IB_PC_COUNTER_SELECT_F] {BITSOFFS(16, 16), "CounterSelect", mad_dump_hex}, - [IB_PC_ERR_SYM_F] {BITSOFFS(32, 16), "SymbolErrors", mad_dump_uint}, - [IB_PC_LINK_RECOVERS_F] {BITSOFFS(48, 8), "LinkRecovers", mad_dump_uint}, - [IB_PC_LINK_DOWNED_F] {BITSOFFS(56, 8), "LinkDowned", mad_dump_uint}, - [IB_PC_ERR_RCV_F] {BITSOFFS(64, 16), "RcvErrors", mad_dump_uint}, - [IB_PC_ERR_PHYSRCV_F] {BITSOFFS(80, 16), "RcvRemotePhysErrors", mad_dump_uint}, - [IB_PC_ERR_SWITCH_REL_F] {BITSOFFS(96, 16), "RcvSwRelayErrors", mad_dump_uint}, - [IB_PC_XMT_DISCARDS_F] {BITSOFFS(112, 16), "XmtDiscards", mad_dump_uint}, - [IB_PC_ERR_XMTCONSTR_F] {BITSOFFS(128, 8), "XmtConstraintErrors", mad_dump_uint}, - [IB_PC_ERR_RCVCONSTR_F] {BITSOFFS(136, 8), "RcvConstraintErrors", mad_dump_uint}, - [IB_PC_ERR_LOCALINTEG_F] {BITSOFFS(152, 4), "LinkIntegrityErrors", mad_dump_uint}, - [IB_PC_ERR_EXCESS_OVR_F] {BITSOFFS(156, 4), "ExcBufOverrunErrors", mad_dump_uint}, - [IB_PC_VL15_DROPPED_F] {BITSOFFS(176, 16), "VL15Dropped", mad_dump_uint}, - [IB_PC_XMT_BYTES_F] {192, 32, "XmtData", mad_dump_uint}, - [IB_PC_RCV_BYTES_F] {224, 32, "RcvData", mad_dump_uint}, - [IB_PC_XMT_PKTS_F] {256, 32, "XmtPkts", mad_dump_uint}, - [IB_PC_RCV_PKTS_F] {288, 32, "RcvPkts", mad_dump_uint}, + {BITSOFFS(8, 8), "PortSelect", mad_dump_uint}, + {BITSOFFS(16, 16), "CounterSelect", mad_dump_hex}, + {BITSOFFS(32, 16), "SymbolErrors", mad_dump_uint}, + {BITSOFFS(48, 8), "LinkRecovers", mad_dump_uint}, + {BITSOFFS(56, 8), "LinkDowned", mad_dump_uint}, + {BITSOFFS(64, 16), "RcvErrors", mad_dump_uint}, + {BITSOFFS(80, 16), 
"RcvRemotePhysErrors", mad_dump_uint}, + {BITSOFFS(96, 16), "RcvSwRelayErrors", mad_dump_uint}, + {BITSOFFS(112, 16), "XmtDiscards", mad_dump_uint}, + {BITSOFFS(128, 8), "XmtConstraintErrors", mad_dump_uint}, + {BITSOFFS(136, 8), "RcvConstraintErrors", mad_dump_uint}, + {BITSOFFS(152, 4), "LinkIntegrityErrors", mad_dump_uint}, + {BITSOFFS(156, 4), "ExcBufOverrunErrors", mad_dump_uint}, + {BITSOFFS(176, 16), "VL15Dropped", mad_dump_uint}, + {192, 32, "XmtData", mad_dump_uint}, + {224, 32, "RcvData", mad_dump_uint}, + {256, 32, "XmtPkts", mad_dump_uint}, + {288, 32, "RcvPkts", mad_dump_uint}, + {0, 0}, /* IB_PC_LAST_F */ /* * SMInfo */ - [IB_SMINFO_GUID_F] {0, 64, "SmInfoGuid", mad_dump_hex}, - [IB_SMINFO_KEY_F] {64, 64, "SmInfoKey", mad_dump_hex}, - [IB_SMINFO_ACT_F] {128, 32, "SmActivity", mad_dump_uint}, - [IB_SMINFO_PRIO_F] {BITSOFFS(160, 4), "SmPriority", mad_dump_uint}, - [IB_SMINFO_STATE_F] {BITSOFFS(164, 4), "SmState", mad_dump_uint}, + {0, 64, "SmInfoGuid", mad_dump_hex}, + {64, 64, "SmInfoKey", mad_dump_hex}, + {128, 32, "SmActivity", mad_dump_uint}, + {BITSOFFS(160, 4), "SmPriority", mad_dump_uint}, + {BITSOFFS(164, 4), "SmState", mad_dump_uint}, /* * SA RMPP */ - [IB_SA_RMPP_VERS_F] {BE_OFFS(24*8+24, 8), "RmppVers", mad_dump_uint}, - [IB_SA_RMPP_TYPE_F] {BE_OFFS(24*8+16, 8), "RmppType", mad_dump_uint}, - [IB_SA_RMPP_RESP_F] {BE_OFFS(24*8+11, 5), "RmppResp", mad_dump_uint}, - [IB_SA_RMPP_FLAGS_F] {BE_OFFS(24*8+8, 3), "RmppFlags", mad_dump_hex}, - [IB_SA_RMPP_STATUS_F] {BE_OFFS(24*8+0, 8), "RmppStatus", mad_dump_hex}, + {BE_OFFS(24*8+24, 8), "RmppVers", mad_dump_uint}, + {BE_OFFS(24*8+16, 8), "RmppType", mad_dump_uint}, + {BE_OFFS(24*8+11, 5), "RmppResp", mad_dump_uint}, + {BE_OFFS(24*8+8, 3), "RmppFlags", mad_dump_hex}, + {BE_OFFS(24*8+0, 8), "RmppStatus", mad_dump_hex}, /* data1 */ - [IB_SA_RMPP_D1_F] {28*8, 32, "RmppData1", mad_dump_hex}, - [IB_SA_RMPP_SEGNUM_F] {28*8, 32, "RmppSegNum", mad_dump_uint}, + {28*8, 32, "RmppData1", mad_dump_hex}, + {28*8, 
32, "RmppSegNum", mad_dump_uint}, /* data2 */ - [IB_SA_RMPP_D2_F] {32*8, 32, "RmppData2", mad_dump_hex}, - [IB_SA_RMPP_LEN_F] {32*8, 32, "RmppPayload", mad_dump_uint}, - [IB_SA_RMPP_NEWWIN_F] {32*8, 32, "RmppNewWin", mad_dump_uint}, - + {32*8, 32, "RmppData2", mad_dump_hex}, + {32*8, 32, "RmppPayload", mad_dump_uint}, + {32*8, 32, "RmppNewWin", mad_dump_uint}, + /* - * SA Path rec + * SA Get Multi Path */ - [IB_SA_PR_DGID_F] {64, 128, "PathRecDGid", mad_dump_array}, - [IB_SA_PR_SGID_F] {192, 128, "PathRecSGid", mad_dump_array}, - [IB_SA_PR_DLID_F] {BITSOFFS(320,16), "PathRecDLid", mad_dump_hex}, - [IB_SA_PR_SLID_F] {BITSOFFS(336,16), "PathRecSLid", mad_dump_hex}, - [IB_SA_PR_NPATH_F] {BITSOFFS(393,7), "PathRecNumPath", mad_dump_uint}, + {BITSOFFS(41,7), "MultiPathNumPath", mad_dump_uint}, + {BITSOFFS(120,8), "MultiPathNumSrc", mad_dump_uint}, + {BITSOFFS(128,8), "MultiPathNumDest", mad_dump_uint}, + {192, 128, "MultiPathGid", mad_dump_array}, /* - * SA Get Multi Path + * SA Path rec */ - [IB_SA_MP_NPATH_F] {BITSOFFS(41,7), "MultiPathNumPath", mad_dump_uint}, - [IB_SA_MP_NSRC_F] {BITSOFFS(120,8), "MultiPathNumSrc", mad_dump_uint}, - [IB_SA_MP_NDEST_F] {BITSOFFS(128,8), "MultiPathNumDest", mad_dump_uint}, - [IB_SA_MP_GID0_F] {192, 128, "MultiPathGid", mad_dump_array}, + {64, 128, "PathRecDGid", mad_dump_array}, + {192, 128, "PathRecSGid", mad_dump_array}, + {BITSOFFS(320,16), "PathRecDLid", mad_dump_hex}, + {BITSOFFS(336,16), "PathRecSLid", mad_dump_hex}, + {BITSOFFS(393,7), "PathRecNumPath", mad_dump_uint}, /* * MC Member rec */ - [IB_SA_MCM_MGID_F] {0, 128, "McastMemMGid", mad_dump_array}, - [IB_SA_MCM_PORTGID_F] {128, 128, "McastMemPortGid", mad_dump_array}, - [IB_SA_MCM_QKEY_F] {256, 32, "McastMemQkey", mad_dump_hex}, - [IB_SA_MCM_MLID_F] {BITSOFFS(288, 16), "McastMemMLid", mad_dump_hex}, - [IB_SA_MCM_MTU_F] {BITSOFFS(306, 6), "McastMemMTU", mad_dump_uint}, - [IB_SA_MCM_TCLASS_F] {BITSOFFS(312, 8), "McastMemTClass", mad_dump_uint}, - [IB_SA_MCM_PKEY_F] 
{BITSOFFS(320, 16), "McastMemPkey", mad_dump_uint}, - [IB_SA_MCM_RATE_F] {BITSOFFS(338, 6), "McastMemRate", mad_dump_uint}, - [IB_SA_MCM_SL_F] {BITSOFFS(352, 4), "McastMemSL", mad_dump_uint}, - [IB_SA_MCM_FLOW_LABEL_F] {BITSOFFS(356, 20), "McastMemFlowLbl", mad_dump_uint}, - [IB_SA_MCM_JOIN_STATE_F] {BITSOFFS(388, 4), "McastMemJoinState", mad_dump_uint}, - [IB_SA_MCM_PROXY_JOIN_F] {BITSOFFS(392, 1), "McastMemProxyJoin", mad_dump_uint}, + {0, 128, "McastMemMGid", mad_dump_array}, + {128, 128, "McastMemPortGid", mad_dump_array}, + {256, 32, "McastMemQkey", mad_dump_hex}, + {BITSOFFS(288, 16), "McastMemMLid", mad_dump_hex}, + {BITSOFFS(352, 4), "McastMemSL", mad_dump_uint}, + {BITSOFFS(306, 6), "McastMemMTU", mad_dump_uint}, + {BITSOFFS(338, 6), "McastMemRate", mad_dump_uint}, + {BITSOFFS(312, 8), "McastMemTClass", mad_dump_uint}, + {BITSOFFS(320, 16), "McastMemPkey", mad_dump_uint}, + {BITSOFFS(356, 20), "McastMemFlowLbl", mad_dump_uint}, + {BITSOFFS(388, 4), "McastMemJoinState", mad_dump_uint}, + {BITSOFFS(392, 1), "McastMemProxyJoin", mad_dump_uint}, /* * Service record */ - [IB_SA_SR_ID_F] {0, 64, "ServRecID", mad_dump_hex}, - [IB_SA_SR_GID_F] {64, 128, "ServRecGid", mad_dump_array}, - [IB_SA_SR_PKEY_F] {BITSOFFS(192, 16), "ServRecPkey", mad_dump_hex}, - [IB_SA_SR_LEASE_F] {224, 32, "ServRecLease", mad_dump_hex}, - [IB_SA_SR_KEY_F] {256, 128, "ServRecKey", mad_dump_hex}, - [IB_SA_SR_NAME_F] {384, 512, "ServRecName", mad_dump_string}, - [IB_SA_SR_DATA_F] {896, 512, "ServRecData", mad_dump_array}, /* ATS for example */ + {0, 64, "ServRecID", mad_dump_hex}, + {64, 128, "ServRecGid", mad_dump_array}, + {BITSOFFS(192, 16), "ServRecPkey", mad_dump_hex}, + {224, 32, "ServRecLease", mad_dump_hex}, + {256, 128, "ServRecKey", mad_dump_hex}, + {384, 512, "ServRecName", mad_dump_string}, + {896, 512, "ServRecData", mad_dump_array}, /* ATS for example */ /* * ATS SM record - within SA_SR_DATA */ - [IB_ATS_SM_NODE_ADDR_F] {12*8, 32, "ATSNodeAddr", mad_dump_hex}, - 
[IB_ATS_SM_MAGIC_KEY_F] {BITSOFFS(16*8, 16), "ATSMagicKey", mad_dump_hex}, - [IB_ATS_SM_NODE_TYPE_F] {BITSOFFS(18*8, 16), "ATSNodeType", mad_dump_hex}, - [IB_ATS_SM_NODE_NAME_F] {32*8, 32*8, "ATSNodeName", mad_dump_string}, + {12*8, 32, "ATSNodeAddr", mad_dump_hex}, + {BITSOFFS(16*8, 16), "ATSMagicKey", mad_dump_hex}, + {BITSOFFS(18*8, 16), "ATSNodeType", mad_dump_hex}, + {32*8, 32*8, "ATSNodeName", mad_dump_string}, /* * SLTOVL MAPPING TABLE */ - [IB_SLTOVL_MAPPING_TABLE_F] {0, 64, "SLToVLMap", mad_dump_hex}, + {0, 64, "SLToVLMap", mad_dump_hex}, /* * VL ARBITRATION TABLE */ - [IB_VL_ARBITRATION_TABLE_F] {0, 512, "VLArbTbl", mad_dump_array}, + {0, 512, "VLArbTbl", mad_dump_array}, /* * IB vendor classes range 2 */ - [IB_VEND2_OUI_F] {BE_OFFS(36*8, 24), "OUI", mad_dump_array}, - [IB_VEND2_DATA_F] {40*8, (256-40)*8, "Vendor2Data", mad_dump_array}, + {BE_OFFS(36*8, 24), "OUI", mad_dump_array}, + {40*8, (256-40)*8, "Vendor2Data", mad_dump_array}, /* * Extended port counters */ - [IB_PC_EXT_PORT_SELECT_F] {BITSOFFS(8, 8), "PortSelect", mad_dump_uint}, - [IB_PC_EXT_COUNTER_SELECT_F] {BITSOFFS(16, 16), "CounterSelect", mad_dump_hex}, - [IB_PC_EXT_XMT_BYTES_F] {64, 64, "PortXmitData", mad_dump_uint}, - [IB_PC_EXT_RCV_BYTES_F] {128, 64, "PortRcvData", mad_dump_uint}, - [IB_PC_EXT_XMT_PKTS_F] {192, 64, "PortXmitPkts", mad_dump_uint}, - [IB_PC_EXT_RCV_PKTS_F] {256, 64, "PortRcvPkts", mad_dump_uint}, - [IB_PC_EXT_XMT_UPKTS_F] {320, 64, "PortUnicastXmitPkts", mad_dump_uint}, - [IB_PC_EXT_RCV_UPKTS_F] {384, 64, "PortUnicastRcvPkts", mad_dump_uint}, - [IB_PC_EXT_XMT_MPKTS_F] {448, 64, "PortMulticastXmitPkts", mad_dump_uint}, - [IB_PC_EXT_RCV_MPKTS_F] {512, 64, "PortMulticastRcvPkts", mad_dump_uint}, + {BITSOFFS(8, 8), "PortSelect", mad_dump_uint}, + {BITSOFFS(16, 16), "CounterSelect", mad_dump_hex}, + {64, 64, "PortXmitData", mad_dump_uint}, + {128, 64, "PortRcvData", mad_dump_uint}, + {192, 64, "PortXmitPkts", mad_dump_uint}, + {256, 64, "PortRcvPkts", mad_dump_uint}, + {320, 
64, "PortUnicastXmitPkts", mad_dump_uint}, + {384, 64, "PortUnicastRcvPkts", mad_dump_uint}, + {448, 64, "PortMulticastXmitPkts", mad_dump_uint}, + {512, 64, "PortMulticastRcvPkts", mad_dump_uint}, + {0, 0}, /* IB_PC_EXT_LAST_F */ /* * GUIDInfo fields */ - [IB_GUID_GUID0_F] {0, 64, "GUID0", mad_dump_hex}, + {0, 64, "GUID0", mad_dump_hex}, + {0, 0} /* IB_FIELD_LAST_ */ }; @@ -368,7 +389,7 @@ { uint64_t nval; - nval = htonll(val); + nval = cl_hton64(val); memcpy((char *)buf + base_offs + f->bitoffs / 8, &nval, sizeof(uint64_t)); } @@ -377,7 +398,7 @@ { uint64_t val; memcpy(&val, ((char *)buf + base_offs + f->bitoffs / 8), sizeof(uint64_t)); - return ntohll(val); + return cl_ntoh64(val); } void From arlin.r.davis at intel.com Thu Dec 18 15:37:57 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:37:57 -0800 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h Message-ID: Port of libibmad to Windows. It depends on the libibumad port and on complib (in lieu of libibcommon); the dependency on libibcommon is removed. The intent is to allow a common MAD code base for Windows and Linux, to simplify maintainability across OFED and WinOF. This patch set was built and tested on Windows, and built but not yet tested on Linux.
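As a rough illustration of the portability changes this series makes, here is a minimal, self-contained sketch. It is not code from the patches: EXAMPLE_EXPORT, struct example_field, example_hton64 and example_make_field are hypothetical names chosen for this example. It only mirrors three ideas visible in the diffs: an export macro that expands differently on Windows, a 64-bit host-to-network conversion that does not rely on a platform-specific htonll, and a struct initialised with memset plus plain assignments instead of C99 designated initialisers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical export macro, modelled on the MAD_EXPORT idea in patch 1/6. */
#if defined(_WIN32) || defined(_WIN64)
#define EXAMPLE_EXPORT __declspec(dllexport)
#else
#define EXAMPLE_EXPORT extern
#endif

/* Hypothetical, simplified field descriptor (not the real ib_field_t). */
struct example_field {
    int bitoffs;
    int bitlen;
    const char *name;
};

EXAMPLE_EXPORT uint64_t example_hton64(uint64_t host);
EXAMPLE_EXPORT struct example_field example_make_field(int valsz);

/* Store the value most-significant byte first, whatever the host byte order. */
uint64_t example_hton64(uint64_t host)
{
    uint8_t bytes[8];
    uint64_t net;
    int i;

    for (i = 0; i < 8; i++)
        bytes[i] = (uint8_t)(host >> (56 - 8 * i));
    memcpy(&net, bytes, sizeof(net));
    return net;
}

/* Zero the struct, then assign members one by one (no designated initialisers). */
struct example_field example_make_field(int valsz)
{
    struct example_field f;

    memset(&f, 0, sizeof(f));
    f.bitlen = valsz * 8;
    return f;
}
```

The byte-wise shift loop is used here only so that the sketch has no external dependencies; the patches themselves call the complib helpers instead.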
The patches are separated as follows:
1/6 - mad.h
2/6 - dump.c
3/6 - fields.c
4/6 - gs.c, mad.c, portid.c, register.c, resolve.c
5/6 - rpc.c, sa.c, serv.c, smp.c, vendor.c
6/6 - new files for Windows: dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp
Signed-off by: Arlin Davis
diff -aur libibmad-1.2.2/include/infiniband/mad.h libibmad/include/infiniband/mad.h --- libibmad-1.2.2/include/infiniband/mad.h 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/include/infiniband/mad.h 2008-12-17 17:02:54.873046600 -0800 @@ -33,8 +33,10 @@ #ifndef _MAD_H_ #define _MAD_H_ -#include -#include +/* use complib for portability */ +#include +#include +#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { @@ -46,8 +48,14 @@ BEGIN_C_DECLS +#if defined(_WIN32) || defined(_WIN64) +#define MAD_EXPORT __declspec(dllexport) +#else +#define MAD_EXPORT extern +#endif + #define IB_SUBNET_PATH_HOPS_MAX 64 -#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu +#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL #define IB_DEFAULT_QP1_QKEY 0x80010000 #define IB_MAD_SIZE 256 @@ -620,10 +628,10 @@ /******************************************************************************/ /* portid.c */ -char * portid2str(ib_portid_t *portid); -int portid2portnum(ib_portid_t *portid); -int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); -char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); +MAD_EXPORT char * portid2str(ib_portid_t *portid); +MAD_EXPORT int portid2portnum(ib_portid_t *portid); +MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); +MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); static inline int ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) @@ -639,77 +647,49 @@ /* fields.c */ extern ib_field_t ib_mad_f[]; -void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); +void _set_field(void *buf, int base_offs, ib_field_t *f, 
uint32_t val); uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); -void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); -void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); -void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); +void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); +void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); +void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); /* mad.c */ -static inline uint32_t -mad_get_field(void *buf, int base_offs, int field) -{ - return _get_field(buf, base_offs, ib_mad_f + field); -} - -static inline void -mad_set_field(void *buf, int base_offs, int field, uint32_t val) -{ - _set_field(buf, base_offs, ib_mad_f + field, val); -} - +MAD_EXPORT uint32_t mad_get_field(void *buf, int base_offs, int field); +MAD_EXPORT void mad_set_field(void *buf, int base_offs, int field, uint32_t val); /* field must be byte aligned */ -static inline uint64_t -mad_get_field64(void *buf, int base_offs, int field) -{ - return _get_field64(buf, base_offs, ib_mad_f + field); -} - -static inline void -mad_set_field64(void *buf, int base_offs, int field, uint64_t val) -{ - _set_field64(buf, base_offs, ib_mad_f + field, val); -} - -static inline void -mad_set_array(void *buf, int base_offs, int field, void *val) -{ - _set_array(buf, base_offs, ib_mad_f + field, val); -} - -static inline void -mad_get_array(void *buf, int base_offs, int field, void *val) -{ - _get_array(buf, base_offs, ib_mad_f + field, val); -} - -void mad_decode_field(uint8_t *buf, int field, void *val); -void mad_encode_field(uint8_t *buf, int field, void *val); -void * mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); -uint64_t mad_trid(void); -int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); +MAD_EXPORT uint64_t 
mad_get_field64(void *buf, int base_offs, int field); +MAD_EXPORT void mad_set_field64(void *buf, int base_offs, int field, uint64_t val); +MAD_EXPORT void mad_set_array(void *buf, int base_offs, int field, void *val); +MAD_EXPORT void mad_get_array(void *buf, int base_offs, int field, void *val); +MAD_EXPORT void mad_decode_field(uint8_t *buf, int field, void *val); +MAD_EXPORT void mad_encode_field(uint8_t *buf, int field, void *val); +MAD_EXPORT void *mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); +MAD_EXPORT uint64_t mad_trid(void); +MAD_EXPORT int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, + ib_rmpp_hdr_t *rmpp, void *data); /* register.c */ -int mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version); -int mad_register_client(int mgmt, uint8_t rmpp_version); -int mad_register_server(int mgmt, uint8_t rmpp_version, - long method_mask[16/sizeof(long)], - uint32_t class_oui); -int mad_class_agent(int mgmt); -int mad_agent_class(int agent); +MAD_EXPORT int mad_register_port_client(int port_id, int mgmt, + uint8_t rmpp_version); +MAD_EXPORT int mad_register_client(int mgmt, uint8_t rmpp_version); +MAD_EXPORT int mad_register_server(int mgmt, uint8_t rmpp_version, + long method_mask[16/sizeof(long)], + uint32_t class_oui); +MAD_EXPORT int mad_class_agent(int mgmt); +MAD_EXPORT int mad_agent_class(int agent); /* serv.c */ -int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, - void *data); -void * mad_receive(void *umad, int timeout); -int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); -void * mad_alloc(void); -void mad_free(void *umad); +MAD_EXPORT int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, + ib_rmpp_hdr_t *rmpp, void *data); +MAD_EXPORT void * mad_receive(void *umad, int timeout); +MAD_EXPORT int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); +MAD_EXPORT void * mad_alloc(void); +MAD_EXPORT void mad_free(void *umad); /* vendor.c */ -uint8_t 
*ib_vendor_call(void *data, ib_portid_t *portid, - ib_vendor_call_t *call); +MAD_EXPORT uint8_t *ib_vendor_call(void *data, ib_portid_t *portid, + ib_vendor_call_t *call); static inline int mad_is_vendor_range1(int mgmt) @@ -718,38 +698,41 @@ } static inline int -mad_is_vendor_range2(int mgmt) +mad_is_vendor_range2(int mgmt) { return mgmt >= 0x30 && mgmt <= 0x4f; } /* rpc.c */ -int madrpc_portid(void); -int madrpc_set_retries(int retries); -int madrpc_set_timeout(int timeout); -void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, void *payload, void *rcvdata); -void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, +MAD_EXPORT int madrpc_portid(void); +MAD_EXPORT int madrpc_set_retries(int retries); +MAD_EXPORT int madrpc_set_timeout(int timeout); +MAD_EXPORT void madrpc_init(char *dev_name, int dev_port, + int *mgmt_classes, int num_classes); +MAD_EXPORT void madrpc_show_errors(int set); +void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, + void *payload, void *rcvdata); +void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, + ib_rmpp_hdr_t *rmpp, void *data); + +void madrpc_save_mad(void *madbuf, int len); +void madrpc_lock(void); +void madrpc_unlock(void); +void * mad_rpc_open_port(char *dev_name, int dev_port, + int *mgmt_classes, int num_classes); +void mad_rpc_close_port(void *ibmad_port); +void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, + ib_portid_t *dport, void *payload, + void *rcvdata); +void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, + ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); -void madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, - int num_classes); -void madrpc_save_mad(void *madbuf, int len); -void madrpc_lock(void); -void madrpc_unlock(void); -void madrpc_show_errors(int set); - -void * mad_rpc_open_port(char *dev_name, int dev_port, int *mgmt_classes, - int num_classes); -void mad_rpc_close_port(void *ibmad_port); -void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, - void 
*payload, void *rcvdata); -void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, - ib_rmpp_hdr_t *rmpp, void *data); /* smp.c */ -uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, - unsigned timeout); -uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, - unsigned timeout); +MAD_EXPORT uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, + unsigned mod, unsigned timeout); +MAD_EXPORT uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, + unsigned mod, unsigned timeout); uint8_t * smp_query_via(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, unsigned timeout, const void *srcport); uint8_t * smp_set_via(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, @@ -786,9 +769,9 @@ unsigned timeout); uint8_t * sa_rpc_call(const void *ibmad_port, void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, unsigned timeout); -int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, - void *buf); /* returns lid */ -int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, +MAD_EXPORT int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, + ib_portid_t *sm_id, void *buf); /* returns lid */ +int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf); inline static uint8_t * @@ -805,38 +788,38 @@ } /* resolve.c */ -int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); -int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, - ib_portid_t *sm_id, int timeout); -int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, - int dest_type, ib_portid_t *sm_id); -int ib_resolve_self(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid); - -int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, - const void *srcport); -int ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, - ib_portid_t *sm_id, int timeout, - const void *srcport); -int ib_resolve_portid_str_via(ib_portid_t 
*portid, char *addr_str, - int dest_type, ib_portid_t *sm_id, - const void *srcport); -int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, - const void *srcport); +MAD_EXPORT int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); +MAD_EXPORT int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, + ib_portid_t *sm_id, int timeout); +MAD_EXPORT int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, + int dest_type, ib_portid_t *sm_id); +MAD_EXPORT int ib_resolve_self(ib_portid_t *portid, int *portnum, + ibmad_gid_t *gid); +int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, + const void *srcport); +int ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, + ib_portid_t *sm_id, int timeout, + const void *srcport); +int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, + int dest_type, ib_portid_t *sm_id, + const void *srcport); +int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, + const void *srcport); /* gs.c */ -uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout); -uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout); -uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, unsigned mask, unsigned timeout); -uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout); -uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, unsigned mask, unsigned timeout); -uint8_t 
*port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout); -uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, +MAD_EXPORT uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout); uint8_t *perf_classportinfo_query_via(void *rcvbuf, ib_portid_t *dest, int port, @@ -855,7 +838,7 @@ unsigned timeout, const void *srcport); /* dump.c */ ib_mad_dump_fn - mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, + MAD_EXPORT mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, mad_dump_bitfield, mad_dump_array, mad_dump_string, mad_dump_linkwidth, mad_dump_linkwidthsup, mad_dump_linkwidthen, mad_dump_linkdowndefstate, @@ -900,6 +883,34 @@ extern int ibdebug; +/* remove libibcommon dependencies, use complib */ + +/* dump.c */ +MAD_EXPORT void xdump(FILE *file, char *msg, void *p, int size); + +/** printf style debugging MACRO's, map to cl_msg_out */ +#if !defined(IBWARN) +#define IBWARN(fmt, ...) cl_msg_out(fmt, ## __VA_ARGS__) +#endif +#if !defined(IBPANIC) +#define IBPANIC(fmt, ...) 
\ +{ \ + cl_msg_out(fmt, ## __VA_ARGS__); \ + CL_ASSERT(0); \ +} +#endif + +/** align value \a l to \a size (ceil) */ +#if !defined(ALIGN) +#define ALIGN(l, size) (((l) + ((size) - 1)) / (size) * (size)) +#endif + +/** align value \a l to \a sizeof 32 bit int (ceil) */ +#if !defined(ALIGN32) +#define ALIGN32(l) (ALIGN((l), sizeof(uint32))) +#endif + + END_C_DECLS #endif /* _MAD_H_ */ From arlin.r.davis at intel.com Thu Dec 18 15:38:30 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:38:30 -0800 Subject: [ofa-general] PATCH[4/6] Windows port of libibmad - gs.c, mad.c, portid.c, register.c, resolve.c Message-ID: 4/6 - gs.c, mad.c, portid.c, register.c, resolve.c Signed-off by: Arlin Davis diff -aur libibmad-1.2.2/src/gs.c libibmad/src/gs.c --- libibmad-1.2.2/src/gs.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/gs.c 2008-12-17 17:02:40.987157576 -0800 @@ -37,10 +37,16 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#else #include #include #include #include +#endif #include #include "mad.h" @@ -98,7 +104,7 @@ srcport); } -uint8_t * +MAD_EXPORT uint8_t * perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout) { return pma_query(rcvbuf, dest, port, timeout, CLASS_PORT_INFO); @@ -112,7 +118,7 @@ IB_GSI_PORT_COUNTERS, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout) { return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_COUNTERS); @@ -175,7 +181,7 @@ IB_GSI_PORT_COUNTERS, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, unsigned mask, unsigned timeout) { @@ -190,7 +196,7 @@ IB_GSI_PORT_COUNTERS_EXT, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout) { return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_COUNTERS_EXT); @@ -205,7 +211,7 @@ 
IB_GSI_PORT_COUNTERS_EXT, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, unsigned mask, unsigned timeout) { @@ -220,7 +226,7 @@ IB_GSI_PORT_SAMPLES_CONTROL, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout) { return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_SAMPLES_CONTROL); @@ -234,7 +240,7 @@ IB_GSI_PORT_SAMPLES_RESULT, srcport); } -uint8_t * +MAD_EXPORT uint8_t * port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, unsigned timeout) { return pma_query(rcvbuf, dest, port, timeout, IB_GSI_PORT_SAMPLES_RESULT); diff -aur libibmad-1.2.2/src/mad.c libibmad/src/mad.c --- libibmad-1.2.2/src/mad.c 2008-01-15 11:21:52.000000000 -0800 +++ libibmad/src/mad.c 2008-12-17 17:02:41.004154992 -0800 @@ -37,19 +37,63 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#include +#define getpid GetCurrentProcessId +#else #include #include #include #include +#endif -#include #include #include #undef DEBUG #define DEBUG if (ibdebug) IBWARN -void +MAD_EXPORT uint32_t +mad_get_field(void *buf, int base_offs, int field) +{ + return _get_field(buf, base_offs, ib_mad_f + field); +} + +MAD_EXPORT void +mad_set_field(void *buf, int base_offs, int field, uint32_t val) +{ + _set_field(buf, base_offs, ib_mad_f + field, val); +} + +/* field must be byte aligned */ +MAD_EXPORT uint64_t +mad_get_field64(void *buf, int base_offs, int field) +{ + return _get_field64(buf, base_offs, ib_mad_f + field); +} + +MAD_EXPORT void +mad_set_field64(void *buf, int base_offs, int field, uint64_t val) +{ + _set_field64(buf, base_offs, ib_mad_f + field, val); +} + +MAD_EXPORT void +mad_set_array(void *buf, int base_offs, int field, void *val) +{ + _set_array(buf, base_offs, ib_mad_f + field, val); +} + +MAD_EXPORT void +mad_get_array(void *buf, int base_offs, int field, void *val) +{ + _get_array(buf, 
base_offs, ib_mad_f + field, val); +} + +MAD_EXPORT void mad_decode_field(uint8_t *buf, int field, void *val) { ib_field_t *f = ib_mad_f + field; @@ -69,7 +113,7 @@ _get_array(buf, 0, f, val); } -void +MAD_EXPORT void mad_encode_field(uint8_t *buf, int field, void *val) { ib_field_t *f = ib_mad_f + field; @@ -89,7 +133,7 @@ _set_array(buf, 0, f, val); } -uint64_t +MAD_EXPORT uint64_t mad_trid(void) { static uint64_t base; @@ -97,15 +141,15 @@ uint64_t next; if (!base) { - srandom(time(0)*getpid()); - base = random(); - trid = random(); + srand((int)time(0)*getpid()); + base = rand(); + trid = rand(); } next = ++trid | (base << 32); return next; } -void * +MAD_EXPORT void * mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data) { int is_resp = rpc->method & IB_MAD_RESPONSE; @@ -139,8 +183,8 @@ mad_set_field(buf, 0, IB_MAD_ATTRMOD_F, rpc->attr.mod); /* words 7,8 */ - mad_set_field(buf, 0, IB_MAD_MKEY_F, rpc->mkey >> 32); - mad_set_field(buf, 4, IB_MAD_MKEY_F, rpc->mkey & 0xffffffff); + mad_set_field(buf, 0, IB_MAD_MKEY_F, (uint32_t)(rpc->mkey >> 32)); + mad_set_field(buf, 4, IB_MAD_MKEY_F, (uint32_t)(rpc->mkey & 0xffffffff)); if (rpc->mgtclass == IB_SMI_DIRECT_CLASS) { /* word 9 */ @@ -167,7 +211,7 @@ return (uint8_t *)buf + IB_MAD_SIZE; } -int +MAD_EXPORT int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data) { @@ -211,5 +255,5 @@ mad_set_field(mad, 0, IB_SA_RMPP_D2_F, rmpp->d2.u); } - return p - mad; + return ((int)(p - mad)); } diff -aur libibmad-1.2.2/src/portid.c libibmad/src/portid.c --- libibmad-1.2.2/src/portid.c 2008-10-19 11:34:41.000000000 -0700 +++ libibmad/src/portid.c 2008-12-17 17:02:41.022152256 -0800 @@ -1,5 +1,6 @@ /* * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. + * Copyright (c) 2008 Intel Corporation. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -37,20 +38,72 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#include +#define snprintf _snprintf +#else #include #include #include #include #include #include +#endif #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN -int +#if defined(_WIN32) || defined(_WIN64) +const char * _inet_ntop(int family, const void *addr, char *dst, size_t len) +{ + if (family == AF_INET) + { + struct sockaddr_in in; + in.sin_family = AF_INET; + memcpy(&in.sin_addr, addr, 4); + if (getnameinfo((struct sockaddr *)&in, + (socklen_t) (sizeof(struct sockaddr_in)), + dst, len, NULL, 0, NI_NUMERICHOST)) + return NULL; + } + else if (family == AF_INET6) + { + struct sockaddr_in6 in6; + memset(&in6, 0, sizeof in6); + in6.sin6_family = AF_INET6; + memcpy(&in6.sin6_addr, addr, sizeof(struct in_addr6)); + + /* if no ipv6 support return simple IPv6 format rule: + * A series of "0's in a 16bit block can be represented by "0" + */ + if (getnameinfo((struct sockaddr *)&in6, (socklen_t) (sizeof in6), + dst, len, NULL, 0, NI_NUMERICHOST)) + { + char tmp[sizeof "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"]; + int i, n=0; + + if (len < sizeof(tmp)) + return NULL; + + for (i = 0; i < 8; i++) + n += sprintf(tmp+n, "%s%x", + i?":":"", + ntohs(((unsigned short*)addr)[i])); + tmp[n]='\0'; + strcpy(dst, tmp); + } + } + return dst; +} +#define inet_ntop _inet_ntop +#endif + +MAD_EXPORT int portid2portnum(ib_portid_t *portid) { if (portid->lid > 0) @@ -62,7 +115,7 @@ return portid->drpath.p[(portid->drpath.cnt-1)]; } -char * +MAD_EXPORT char * portid2str(ib_portid_t *portid) { static char buf[1024] = "local"; @@ -86,7 +139,7 @@ return buf; } -int +MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid) { char *s, *str = routepath; @@ -97,7 +150,7 @@ while (str && *str) { if ((s = strchr(str, ','))) *s = 0; - path->p[++path->cnt] = atoi(str); + path->p[++path->cnt] = 
(uint8_t)atoi(str); if (!s) break; str = s+1; @@ -109,17 +162,17 @@ return path->cnt; } -char * +MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size) { int i = 0; int rc = snprintf(dstr, dstr_size, "slid %d; dlid %d; %d", path->drslid, path->drdlid, path->p[0]); - if (rc >= dstr_size) + if (rc >= (int) dstr_size) return dstr; for (i = 1; i <= path->cnt; i++) { rc += snprintf(dstr+rc, dstr_size-rc, ",%d", path->p[i]); - if (rc >= dstr_size) + if (rc >= (int) dstr_size) break; } return (dstr); diff -aur libibmad-1.2.2/src/register.c libibmad/src/register.c --- libibmad-1.2.2/src/register.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/register.c 2008-12-17 17:02:41.040149520 -0800 @@ -37,11 +37,16 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#else #include #include #include #include #include +#endif #include #include "mad.h" @@ -104,7 +109,7 @@ return 0; } -int +MAD_EXPORT int mad_class_agent(int mgmt) { if (mgmt < 1 || mgmt > MAX_CLASS) @@ -112,7 +117,7 @@ return class_agent[mgmt]; } -int +MAD_EXPORT int mad_agent_class(int agent) { if (agent < 1 || agent > MAX_AGENTS) @@ -120,7 +125,7 @@ return agent_class[agent]; } -int +MAD_EXPORT int mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version) { int vers, agent; @@ -143,7 +148,7 @@ return agent; } -int +MAD_EXPORT int mad_register_client(int mgmt, uint8_t rmpp_version) { int agent; @@ -155,7 +160,7 @@ return register_agent(agent, mgmt); } -int +MAD_EXPORT int mad_register_server(int mgmt, uint8_t rmpp_version, long method_mask[], uint32_t class_oui) { diff -aur libibmad-1.2.2/src/resolve.c libibmad/src/resolve.c --- libibmad-1.2.2/src/resolve.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/resolve.c 2008-12-17 17:02:41.058146784 -0800 @@ -37,12 +37,17 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#define strtoull _strtoui64 +#else #include #include #include #include +#endif -#include #include #include @@ 
-67,7 +72,7 @@ return ib_portid_set(sm_id, lid, 0, 0); } -int +MAD_EXPORT int ib_resolve_smlid(ib_portid_t *sm_id, int timeout) { return ib_resolve_smlid_via(sm_id, timeout, NULL); @@ -95,6 +100,12 @@ return 0; } +MAD_EXPORT int +ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, ib_portid_t *sm_id, int timeout) +{ + return ib_resolve_guid_via(portid, guid, sm_id, timeout, NULL); +} + int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, int dest_type, ib_portid_t *sm_id, const void *srcport) { @@ -144,7 +155,7 @@ return -1; } -int +MAD_EXPORT int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, int dest_type, ib_portid_t *sm_id) { return ib_resolve_portid_str_via(portid, addr_str, dest_type, @@ -179,7 +190,7 @@ return 0; } -int +MAD_EXPORT int ib_resolve_self(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid) { return ib_resolve_self_via (portid, portnum, gid, NULL); From arlin.r.davis at intel.com Thu Dec 18 15:38:41 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:38:41 -0800 Subject: [ofa-general] PATCH[5/6] Windows port of libibmad - rpc.c, sa.c, serv.c, smp.c, vendor.c Message-ID: 5/6 - rpc.c, sa.c, serv.c, smp.c, vendor.c Signed-off by: Arlin Davis diff -aur libibmad-1.2.2/src/rpc.c libibmad/src/rpc.c --- libibmad-1.2.2/src/rpc.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/rpc.c 2008-12-17 17:02:41.077143896 -0800 @@ -37,14 +37,21 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#else #include #include #include #include #include +#endif #include #include "mad.h" +#include #define MAX_CLASS 256 @@ -69,7 +76,7 @@ #define MAD_TID(mad) (*((uint64_t *)((char *)(mad) + 8))) -void +MAD_EXPORT void madrpc_show_errors(int set) { iberrs = set; @@ -82,7 +89,7 @@ save_mad_len = len; } -int +MAD_EXPORT int madrpc_set_retries(int retries) { if (retries > 0) @@ -90,7 +97,7 @@ return madrpc_retries; } -int +MAD_EXPORT int madrpc_set_timeout(int timeout) { 
def_madrpc_timeout = timeout; @@ -103,7 +110,7 @@ return def_madrpc_timeout; } -int +MAD_EXPORT int madrpc_portid(void) { return mad_portid; @@ -131,7 +138,7 @@ save_mad = 0; } - trid = mad_get_field64(umad_get_mad(sndbuf), 0, IB_MAD_TRID_F); + trid = (uint32_t)mad_get_field64(umad_get_mad(sndbuf), 0, IB_MAD_TRID_F); for (retries = 0; retries < madrpc_retries; retries++) { if (retries) { @@ -287,21 +294,21 @@ return mad_rpc_rmpp(&port, rpc, dport, rmpp, data); } -static pthread_mutex_t rpclock = PTHREAD_MUTEX_INITIALIZER; +static cl_plock_t rpclock; void madrpc_lock(void) { - pthread_mutex_lock(&rpclock); + cl_plock_acquire(&rpclock); } void madrpc_unlock(void) { - pthread_mutex_unlock(&rpclock); + cl_plock_release(&rpclock); } -void +MAD_EXPORT void madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, int num_classes) { if (umad_init() < 0) @@ -314,7 +321,7 @@ IBPANIC("too many classes %d requested", num_classes); while (num_classes--) { - int rmpp_version = 0; + uint8_t rmpp_version = 0; int mgmt = *mgmt_classes++; if (mgmt == IB_SA_CLASS) @@ -322,6 +329,7 @@ if (mad_register_client(mgmt, rmpp_version) < 0) IBPANIC("client_register for mgmt class %d failed", mgmt); } + cl_plock_init(&rpclock); } void * @@ -359,7 +367,7 @@ } while (num_classes--) { - int rmpp_version = 0; + uint8_t rmpp_version = 0; int mgmt = *mgmt_classes++; int agent; diff -aur libibmad-1.2.2/src/sa.c libibmad/src/sa.c --- libibmad-1.2.2/src/sa.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/sa.c 2008-12-17 17:02:41.095141160 -0800 @@ -37,13 +37,17 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#else #include #include #include #include +#endif #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN @@ -145,7 +149,8 @@ mad_decode_field(p, IB_SA_PR_DLID_F, &dlid); return dlid; } -int + +MAD_EXPORT int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf) { return ib_path_query_via (NULL, srcgid, destgid, sm_id, 
buf); diff -aur libibmad-1.2.2/src/serv.c libibmad/src/serv.c --- libibmad-1.2.2/src/serv.c 2008-01-15 11:21:52.000000000 -0800 +++ libibmad/src/serv.c 2008-12-17 17:02:41.113138424 -0800 @@ -37,20 +37,25 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#else #include #include #include #include #include +#endif -#include #include #include #undef DEBUG #define DEBUG if (ibdebug) IBWARN -int +MAD_EXPORT int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data) { uint8_t pktbuf[1024]; @@ -78,7 +83,7 @@ return 0; } -int +MAD_EXPORT int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus) { uint8_t *mad = umad_get_mad(umad); @@ -152,7 +157,7 @@ return 0; } -void * +MAD_EXPORT void * mad_receive(void *umad, int timeout) { void *mad = umad ? umad : umad_alloc(1, umad_size() + IB_MAD_SIZE); @@ -170,13 +175,13 @@ return mad; } -void * +MAD_EXPORT void * mad_alloc(void) { return umad_alloc(1, umad_size() + IB_MAD_SIZE); } -void +MAD_EXPORT void mad_free(void *umad) { umad_free(umad); diff -aur libibmad-1.2.2/src/smp.c libibmad/src/smp.c --- libibmad-1.2.2/src/smp.c 2008-08-31 07:15:05.000000000 -0700 +++ libibmad/src/smp.c 2008-12-17 17:02:41.140134320 -0800 @@ -37,13 +37,18 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#include +#else #include #include #include #include +#endif #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN @@ -78,7 +83,7 @@ } } -uint8_t * +MAD_EXPORT uint8_t * smp_set(void *data, ib_portid_t *portid, unsigned attrid, unsigned mod, unsigned timeout) { return smp_set_via(data, portid, attrid, mod, timeout, NULL); @@ -115,7 +120,7 @@ } } -uint8_t * +MAD_EXPORT uint8_t * smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod, unsigned timeout) { Only in libibmad/src: Sources diff -aur libibmad-1.2.2/src/vendor.c libibmad/src/vendor.c --- libibmad-1.2.2/src/vendor.c 2007-07-15 14:27:05.000000000 -0700 +++ libibmad/src/vendor.c 
2008-12-17 17:02:41.157131736 -0800 @@ -37,13 +37,17 @@ #include #include + +#if defined(_WIN32) || defined(_WIN64) +#include +#else #include #include #include #include +#endif #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN @@ -56,7 +60,7 @@ method == IB_MAD_METHOD_TRAP; } -uint8_t * +MAD_EXPORT uint8_t * ib_vendor_call(void *data, ib_portid_t *portid, ib_vendor_call_t *call) { ib_rpc_t rpc = {0}; From arlin.r.davis at intel.com Thu Dec 18 15:38:55 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 15:38:55 -0800 Subject: [ofa-general] PATCH[6/6] Windows port of libibmad - dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp Message-ID: 6/6 - new files for windows: dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp Signed-off by: Arlin Davis diff -Naur libibmad-1.2.2/dirs libibmad/dirs --- libibmad-1.2.2/dirs 1969-12-31 16:00:00.000000000 -0800 +++ libibmad/dirs 2008-07-08 10:28:28.000000000 -0700 @@ -0,0 +1,2 @@ +DIRS = \ + src \ No newline at end of file diff -Naur libibmad-1.2.2/src/ibmad_export.def libibmad/src/ibmad_export.def --- libibmad-1.2.2/src/ibmad_export.def 1969-12-31 16:00:00.000000000 -0800 +++ libibmad/src/ibmad_export.def 2008-11-21 11:21:22.000000000 -0800 @@ -0,0 +1,34 @@ +/* + * Copyright (c) 2008 Intel Corporation. All rights reserved. + * + * This software is available to you under the OpenIB.org BSD license + * below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AWV + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +LIBRARY LIBIBMAD.DLL + +EXPORTS + DllCanUnloadNow PRIVATE + DllGetClassObject PRIVATE diff -Naur libibmad-1.2.2/src/ibmad_exports.src libibmad/src/ibmad_exports.src --- libibmad-1.2.2/src/ibmad_exports.src 1969-12-31 16:00:00.000000000 -0800 +++ libibmad/src/ibmad_exports.src 2008-12-12 16:53:52.000000000 -0800 @@ -0,0 +1,54 @@ +#if DBG +LIBRARY libibmadd.dll +#else +LIBRARY libibmad.dll +#endif + +#ifndef _WIN64 +EXPORTS + mad_set_field; + mad_get_field; + mad_set_array; + mad_get_array; + mad_set_field64; + mad_get_field64; + mad_decode_field; + mad_encode_field; + mad_encode; + mad_trid; + mad_build_pkt; + mad_register_port_client; + mad_register_client; + mad_register_server; + mad_class_agent; + mad_agent_class; + mad_send; + mad_receive; + mad_respond; + mad_alloc; + mad_free; + madrpc_portid; + madrpc_init; + madrpc_set_retries; + madrpc_set_timeout; + madrpc_show_errors; + smp_query; + smp_set; + ib_vendor_call; + ib_path_query; + ib_resolve_smlid; + ib_resolve_guid; + ib_resolve_portid_str; + ib_resolve_self; + perf_classportinfo_query; + port_performance_query; + port_performance_reset; + port_performance_ext_query; + port_performance_ext_reset; + port_samples_control_query; + port_samples_result_query; + portid2str; + portid2portnum; + str2drpath; + drpath2str; +#endif diff -Naur libibmad-1.2.2/src/ibmad_main.cpp libibmad/src/ibmad_main.cpp --- libibmad-1.2.2/src/ibmad_main.cpp 1969-12-31 16:00:00.000000000 -0800 +++ libibmad/src/ibmad_main.cpp 2008-07-08 
10:28:28.000000000 -0700 @@ -0,0 +1,39 @@ +/* + * Copyright (c) 2008 Intel Corporation. All rights reserved. + * + * This software is available to you under the OpenIB.org BSD license + * below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AWV + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include + +BOOLEAN WINAPI DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved) +{ + UNREFERENCED_PARAMETER(hInstance); + UNREFERENCED_PARAMETER(dwReason); + UNREFERENCED_PARAMETER(lpReserved); + + return TRUE; +} diff -Naur libibmad-1.2.2/src/Sources libibmad/src/Sources --- libibmad-1.2.2/src/Sources 1969-12-31 16:00:00.000000000 -0800 +++ libibmad/src/Sources 2008-12-03 09:59:42.000000000 -0800 @@ -0,0 +1,47 @@ +!if $(FREEBUILD) +TARGETNAME = libibmad +!else +TARGETNAME = libibmadd +!endif + +TARGETPATH = ..\..\..\bin\user\obj$(BUILD_ALT_DIR) +TARGETTYPE = DYNLINK + +DLLDEF = $(OBJ_PATH)\$O\ibmad_exports.def + +DLLENTRY = DllMain +USE_MSVCRT=1 + +SOURCES = \ + ibmad_main.cpp \ + dump.c \ + fields.c \ + gs.c \ + mad.c \ + portid.c \ + register.c \ + resolve.c \ + rpc.c \ + sa.c \ + serv.c \ + smp.c \ + vendor.c + +INCLUDES = ..\include\infiniband;..\..\libibverbs\include;..\..\libibumad\include;..\..\..\inc;..\..\..\inc\user; + +USER_C_FLAGS = $(USER_C_FLAGS) -DEXPORT_IBMAD_SYMBOLS + +TARGETLIBS = \ + $(SDK_LIB_PATH)\kernel32.lib \ + $(SDK_LIB_PATH)\uuid.lib \ + $(SDK_LIB_PATH)\ws2_32.lib \ + $(SDK_LIB_PATH)\advapi32.lib \ + $(SDK_LIB_PATH)\user32.lib \ + $(SDK_LIB_PATH)\ole32.lib \ +!if $(FREEBUILD) + $(TARGETPATH)\*\complib.lib \ + $(TARGETPATH)\*\libibumad.lib +!else + $(TARGETPATH)\*\complibd.lib \ + $(TARGETPATH)\*\libibumadd.lib +!endif From sean.hefty at intel.com Thu Dec 18 16:10:13 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 16:10:13 -0800 Subject: [ofa-general] RE: PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: References: Message-ID: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> >Port of libibmad to windows. Dependencies on libibumad port and complib (in >lieu of libcommon). Removed dependency on libibcommon. > >Intent is to allow common mad code base for Windows and Linux to simplify >maintainablity across OFED and WinOF. 
This patch set was built and tested on
>Windows and built on Linux (not tested yet).

Thanks for posting this. Seeing the actual changes helps understand the impact
better.

Note that the libib_u_mad implementations are not shared. Only the interface
is maintained to simplify porting. Looking at the changes, do the management
developers think that it makes sense to share the libibmad implementation, or
should separate implementations be maintained, similar to libibumad?

If the implementations are not shared, can the Linux side treat the API as an
external interface, rather than a private interface?

>+#if defined(_WIN32) || defined(_WIN64)
>+#define MAD_EXPORT __declspec(dllexport)
>+#else
>+#define MAD_EXPORT extern

I don't know that 'extern' is appropriate here.

>+#endif
>+
> #define IB_SUBNET_PATH_HOPS_MAX 64
>-#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu
>+#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL
> #define IB_DEFAULT_QP1_QKEY 0x80010000
>
> #define IB_MAD_SIZE 256
>@@ -620,10 +628,10 @@
>
>/******************************************************************************/
>
> /* portid.c */
>-char * portid2str(ib_portid_t *portid);
>-int portid2portnum(ib_portid_t *portid);
>-int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid);
>-char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size);
>+MAD_EXPORT char * portid2str(ib_portid_t *portid);
>+MAD_EXPORT int portid2portnum(ib_portid_t *portid);
>+MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid);
>+MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size);
>
> static inline int
> ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey)
>@@ -639,77 +647,49 @@
> /* fields.c */
> extern ib_field_t ib_mad_f[];
>
>-void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val);
>+void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val);
> uint32_t _get_field(void *buf, int base_offs, ib_field_t *f);
>-void _set_array(void *buf, int base_offs, ib_field_t *f, void *val);
>-void _get_array(void *buf, int base_offs, ib_field_t *f, void *val);
>-void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val);
>+void _set_array(void *buf, int base_offs, ib_field_t *f, void *val);
>+void _get_array(void *buf, int base_offs, ib_field_t *f, void *val);
>+void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val);
> uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f);

Are these really the functions that should be exported from the library or in
the header file? (I'm probably missing some history here.)

- Sean

From sean.hefty at intel.com Thu Dec 18 16:19:37 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 18 Dec 2008 16:19:37 -0800
Subject: [ofa-general] RE: PATCH[2/6] Windows port of libibmad - dump.c
In-Reply-To:
References:
Message-ID: <000201c9616f$7a860f40$435a180a@amr.corp.intel.com>

>-void
>+MAD_EXPORT void
> mad_dump_hex(char *buf, int bufsz, void *val, int valsz)
> {
> switch (valsz) {
>@@ -115,13 +151,13 @@
> snprintf(buf, bufsz, "0x%08x", *(uint32_t *)val);
> break;
> case 5:
>- snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu);
>+ snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffULL);
> break;
> case 6:
>- snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu);
>+ snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffULL);
> break;
> case 7:
>- snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu);
>+ snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffULL);
> break;
> case 8:
> snprintf(buf, bufsz, "0x%016" PRIx64, *(uint64_t *)val);
>@@ -132,7 +168,7 @@
> }
> }
>
>-void
>+MAD_EXPORT void
> mad_dump_rhex(char *buf, int bufsz, void *val, int valsz)
> {
> switch (valsz) {
>@@ -149,13 +185,13 @@
> snprintf(buf, bufsz, "%08x", *(uint32_t *)val); > break; > case 5: >- snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffllu); >+ snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffULL); > break; > case 6: >- snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffffllu); >+ snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffffULL); > break; > case 7: >- snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffffffllu); >+ snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) >0xffffffffffffffULL); > break; > case 8: > snprintf(buf, bufsz, "%016" PRIx64, *(uint64_t *)val); >@@ -166,7 +202,7 @@ > } > } I know this isn't part of this patch, but how about having mad_dump_hex call mad_dump_rhex? >-void >+MAD_EXPORT void > mad_dump_node_type(char *buf, int bufsz, void *val, int valsz) > { > int nodetype = *(int*)val; >@@ -603,7 +639,7 @@ > uint8_t res_vl; > uint8_t weight; > } vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; >-} __attribute__((packed)) ib_vl_arb_table_t; >+} ib_vl_arb_table_t; Packing doesn't look needed here, but complib provides an OS independent method for structure packing. - Sean From sean.hefty at intel.com Thu Dec 18 16:31:49 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 16:31:49 -0800 Subject: [ofa-general] PATCH[5/6] Windows port of libibmad - rpc.c, sa.c, serv.c, smp.c, vendor.c In-Reply-To: References: Message-ID: <000301c96171$2e579420$435a180a@amr.corp.intel.com> >-static pthread_mutex_t rpclock = PTHREAD_MUTEX_INITIALIZER; >+static cl_plock_t rpclock; There's a complib mutex implementation available. plock is a reader/writer lock. >- pthread_mutex_lock(&rpclock); >+ cl_plock_acquire(&rpclock); > } > > void > madrpc_unlock(void) > { >- pthread_mutex_unlock(&rpclock); >+ cl_plock_release(&rpclock); The lock is only acquired/released to serialize access. 
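To illustrate the serialization point: a minimal sketch, assuming plain POSIX threads as in the original libibmad code rather than any specific complib call (the complib mutex API is deliberately not shown here). The names `rpclock_demo` and `madrpc_demo_next_id` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <pthread.h>

/* When a lock only serializes access, a plain mutex is sufficient;
 * a reader/writer lock buys nothing if nobody ever takes it shared. */
static pthread_mutex_t rpclock_demo = PTHREAD_MUTEX_INITIALIZER;
static int demo_next_id;	/* shared state guarded by rpclock_demo */

/* Hypothetical helper: hands out increasing ids, one caller at a time. */
static int madrpc_demo_next_id(void)
{
	int id;

	pthread_mutex_lock(&rpclock_demo);	/* exclusive access only */
	id = ++demo_next_id;
	pthread_mutex_unlock(&rpclock_demo);
	return id;
}
```

Every caller takes the lock exclusively, which is the same pattern as madrpc_lock()/madrpc_unlock() in the quoted hunk.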
>-void >+MAD_EXPORT void > madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, int num_classes) > { > if (umad_init() < 0) >@@ -314,7 +321,7 @@ > IBPANIC("too many classes %d requested", num_classes); > > while (num_classes--) { >- int rmpp_version = 0; >+ uint8_t rmpp_version = 0; > int mgmt = *mgmt_classes++; > > if (mgmt == IB_SA_CLASS) >@@ -322,6 +329,7 @@ > if (mad_register_client(mgmt, rmpp_version) < 0) > IBPANIC("client_register for mgmt class %d failed", mgmt); > } >+ cl_plock_init(&rpclock); Most complib objects also have construct and destroy calls that should be invoked. Maybe you can avoid the construct call here, but I'm not sure about destroy. - Sean From weiny2 at llnl.gov Thu Dec 18 16:48:13 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 16:48:13 -0800 Subject: [ofa-general] [PATCH] OpenSM: update osmeventplugin example for the new TRAP event. Message-ID: <20081218164813.55696c45.weiny2@llnl.gov> It turns out that I was already using "OSM_EVENT_ID_TRAP" in the example plugin. This patch makes that use work. Ira From 7b744c38fc2aad67586ade81d65326a139a85681 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Thu, 18 Dec 2008 16:16:37 -0800 Subject: [PATCH] OpenSM: update osmeventplugin example for the new TRAP event.
Signed-off-by: Ira Weiny --- opensm/include/opensm/osm_event_plugin.h | 12 ------------ opensm/osmeventplugin/src/osmeventplugin.c | 28 ++++++++++++++++++++-------- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h index 0922c65..41a5810 100644 --- a/opensm/include/opensm/osm_event_plugin.h +++ b/opensm/include/opensm/osm_event_plugin.h @@ -131,18 +131,6 @@ typedef struct osm_api_ps_event { } osm_epi_ps_event_t; /** ========================================================================= - * Trap events - */ -typedef struct osm_epi_trap_event { - osm_epi_port_id_t port_id; - uint8_t type; - uint32_t prod_type; - uint16_t trap_num; - uint16_t issuer_lid; - time_t time; -} osm_epi_trap_event_t; - -/** ========================================================================= * Plugin creators should allocate an object of this type * (named OSM_EVENT_PLUGIN_IMPL_NAME) * The version should be set to OSM_EVENT_PLUGIN_INTERFACE_VER diff --git a/opensm/osmeventplugin/src/osmeventplugin.c b/opensm/osmeventplugin/src/osmeventplugin.c index f0781eb..b4d9ce9 100644 --- a/opensm/osmeventplugin/src/osmeventplugin.c +++ b/opensm/osmeventplugin/src/osmeventplugin.c @@ -137,13 +137,21 @@ static void handle_port_select(_log_events_t * log, osm_epi_ps_event_t * ps) /** ========================================================================= */ -static void handle_trap_event(_log_events_t * log, osm_epi_trap_event_t * trap) +static void handle_trap_event(_log_events_t *log, ib_mad_notice_attr_t *p_ntc) { - fprintf(log->log_file, - "Trap event %d from 0x%" PRIx64 " (%s) port %d\n", - trap->trap_num, - trap->port_id.node_guid, - trap->port_id.node_name, trap->port_id.port_num); + if (ib_notice_is_generic(p_ntc)) { + fprintf(log->log_file, + "Generic trap type %d; event %d; from LID 0x%x\n", + ib_notice_get_type(p_ntc), + cl_ntoh16(p_ntc->g_or_v.generic.trap_num), + 
cl_ntoh16(p_ntc->issuer_lid)); + } else { + fprintf(log->log_file, + "Vendor trap type %d; from LID 0x%x\n", + ib_notice_get_type(p_ntc), + cl_ntoh16(p_ntc->issuer_lid)); + } + } /** ========================================================================= @@ -163,13 +171,17 @@ static void report(void *_log, osm_epi_event_id_t event_id, void *event_data) handle_port_select(log, (osm_epi_ps_event_t *) event_data); break; case OSM_EVENT_ID_TRAP: - handle_trap_event(log, (osm_epi_trap_event_t *) event_data); + handle_trap_event(log, (ib_mad_notice_attr_t *) event_data); + break; + case OSM_EVENT_ID_SUBNET_UP: + fprintf(log->log_file, "Subnet up reported\n"); break; case OSM_EVENT_ID_MAX: default: osm_log(log->osmlog, OSM_LOG_ERROR, - "Unknown event reported to plugin\n"); + "Unknown event (%d) reported to plugin\n", event_id); } + fflush(log->log_file); } /** ========================================================================= -- 1.5.4.5 From weiny2 at llnl.gov Thu Dec 18 17:21:53 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 17:21:53 -0800 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: References: Message-ID: <20081218172153.37bb18ed.weiny2@llnl.gov> On Thu, 18 Dec 2008 15:37:57 -0800 "Davis, Arlin R" wrote: > > Port of libibmad to windows. Dependencies on libibumad port and complib (in lieu of libcommon). Removed dependency on libibcommon. > > Intent is to allow common mad code base for Windows and Linux to simplify maintainablity across OFED and WinOF. This patch set was built and tested on Windows and built on Linux (not tested yet). 
> > Patches separated as follow: > > 1/6 - mad.h > 2/6 - dump.c > 3/6 - fields.c > 4/6 - gs.c, mad.c, portid.c, register.c, resolve.c > 5/6 - rpc.c, sa.c, serv.c, smp.c, vendor.c > 6/6 - new files for windows: dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp > > Signed-off by: Arlin Davis > > diff -aur libibmad-1.2.2/include/infiniband/mad.h libibmad/include/infiniband/mad.h > --- libibmad-1.2.2/include/infiniband/mad.h 2008-08-31 07:15:05.000000000 -0700 > +++ libibmad/include/infiniband/mad.h 2008-12-17 17:02:54.873046600 -0800 > @@ -33,8 +33,10 @@ > #ifndef _MAD_H_ > #define _MAD_H_ > > -#include > -#include > +/* use complib for portability */ > +#include > +#include > +#include > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > @@ -46,8 +48,14 @@ > > BEGIN_C_DECLS > > +#if defined(_WIN32) || defined(_WIN64) > +#define MAD_EXPORT __declspec(dllexport) > +#else > +#define MAD_EXPORT extern > +#endif > + Could this be put in the cl_ headers somewhere and be called CL_EXPORT or CL_EXTERN (my preference if this is going to cause functions on the Linux side to be declared extern)? 
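One way this suggestion could look: a minimal sketch, assuming a hypothetical `CL_EXTERN` macro living in a shared header (it is not an existing complib definition, and `demo_is_vendor_range2` is an illustrative stand-in for a libibmad function). It mirrors the `MAD_EXPORT` logic from the quoted hunk under the name proposed above.

```c
#include <assert.h>

/* Hypothetical macro; same expansion as MAD_EXPORT in the patch. */
#if defined(_WIN32) || defined(_WIN64)
#define CL_EXTERN __declspec(dllexport)
#else
#define CL_EXTERN extern
#endif

/* Declaration as it might appear in a public header. */
CL_EXTERN int demo_is_vendor_range2(int mgmt);

/* Definition; 'extern' on a function definition is redundant but legal C. */
CL_EXTERN int demo_is_vendor_range2(int mgmt)
{
	return mgmt >= 0x30 && mgmt <= 0x4f;
}
```

On Linux the macro expands to `extern`, which does not change linkage for functions, so the name `CL_EXTERN` describes what callers actually get there.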
> #define IB_SUBNET_PATH_HOPS_MAX 64 > -#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu > +#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL > #define IB_DEFAULT_QP1_QKEY 0x80010000 > > #define IB_MAD_SIZE 256 > @@ -620,10 +628,10 @@ > /******************************************************************************/ > > /* portid.c */ > -char * portid2str(ib_portid_t *portid); > -int portid2portnum(ib_portid_t *portid); > -int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); > -char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); > +MAD_EXPORT char * portid2str(ib_portid_t *portid); > +MAD_EXPORT int portid2portnum(ib_portid_t *portid); > +MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); > +MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); > > static inline int > ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) > @@ -639,77 +647,49 @@ > /* fields.c */ > extern ib_field_t ib_mad_f[]; > > -void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > +void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); > -void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); > -void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > -void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > +void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); > +void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > +void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); > > /* mad.c */ > -static inline uint32_t > -mad_get_field(void *buf, int base_offs, int field) > -{ > - return _get_field(buf, base_offs, ib_mad_f + field); > -} > - > -static inline void > -mad_set_field(void *buf, int base_offs, int field, 
uint32_t val) > -{ > - _set_field(buf, base_offs, ib_mad_f + field, val); > -} > - > +MAD_EXPORT uint32_t mad_get_field(void *buf, int base_offs, int field); > +MAD_EXPORT void mad_set_field(void *buf, int base_offs, int field, uint32_t val); > /* field must be byte aligned */ > -static inline uint64_t > -mad_get_field64(void *buf, int base_offs, int field) > -{ > - return _get_field64(buf, base_offs, ib_mad_f + field); > -} > - > -static inline void > -mad_set_field64(void *buf, int base_offs, int field, uint64_t val) > -{ > - _set_field64(buf, base_offs, ib_mad_f + field, val); > -} > - > -static inline void > -mad_set_array(void *buf, int base_offs, int field, void *val) > -{ > - _set_array(buf, base_offs, ib_mad_f + field, val); > -} > - > -static inline void > -mad_get_array(void *buf, int base_offs, int field, void *val) > -{ > - _get_array(buf, base_offs, ib_mad_f + field, val); > -} > - > -void mad_decode_field(uint8_t *buf, int field, void *val); > -void mad_encode_field(uint8_t *buf, int field, void *val); > -void * mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); > -uint64_t mad_trid(void); > -int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); > +MAD_EXPORT uint64_t mad_get_field64(void *buf, int base_offs, int field); > +MAD_EXPORT void mad_set_field64(void *buf, int base_offs, int field, uint64_t val); > +MAD_EXPORT void mad_set_array(void *buf, int base_offs, int field, void *val); > +MAD_EXPORT void mad_get_array(void *buf, int base_offs, int field, void *val); > +MAD_EXPORT void mad_decode_field(uint8_t *buf, int field, void *val); > +MAD_EXPORT void mad_encode_field(uint8_t *buf, int field, void *val); > +MAD_EXPORT void *mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); > +MAD_EXPORT uint64_t mad_trid(void); > +MAD_EXPORT int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > > /* register.c */ > -int 
mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version); > -int mad_register_client(int mgmt, uint8_t rmpp_version); > -int mad_register_server(int mgmt, uint8_t rmpp_version, > - long method_mask[16/sizeof(long)], > - uint32_t class_oui); > -int mad_class_agent(int mgmt); > -int mad_agent_class(int agent); > +MAD_EXPORT int mad_register_port_client(int port_id, int mgmt, > + uint8_t rmpp_version); > +MAD_EXPORT int mad_register_client(int mgmt, uint8_t rmpp_version); > +MAD_EXPORT int mad_register_server(int mgmt, uint8_t rmpp_version, > + long method_mask[16/sizeof(long)], > + uint32_t class_oui); > +MAD_EXPORT int mad_class_agent(int mgmt); > +MAD_EXPORT int mad_agent_class(int agent); > > /* serv.c */ > -int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, > - void *data); > -void * mad_receive(void *umad, int timeout); > -int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); > -void * mad_alloc(void); > -void mad_free(void *umad); > +MAD_EXPORT int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > +MAD_EXPORT void * mad_receive(void *umad, int timeout); > +MAD_EXPORT int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); > +MAD_EXPORT void * mad_alloc(void); > +MAD_EXPORT void mad_free(void *umad); > > /* vendor.c */ > -uint8_t *ib_vendor_call(void *data, ib_portid_t *portid, > - ib_vendor_call_t *call); > +MAD_EXPORT uint8_t *ib_vendor_call(void *data, ib_portid_t *portid, > + ib_vendor_call_t *call); > > static inline int > mad_is_vendor_range1(int mgmt) > @@ -718,38 +698,41 @@ > } > > static inline int > -mad_is_vendor_range2(int mgmt) > +mad_is_vendor_range2(int mgmt) > { > return mgmt >= 0x30 && mgmt <= 0x4f; > } > > /* rpc.c */ > -int madrpc_portid(void); > -int madrpc_set_retries(int retries); > -int madrpc_set_timeout(int timeout); > -void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, void *payload, void *rcvdata); > -void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t 
*dport, ib_rmpp_hdr_t *rmpp, > +MAD_EXPORT int madrpc_portid(void); > +MAD_EXPORT int madrpc_set_retries(int retries); > +MAD_EXPORT int madrpc_set_timeout(int timeout); > +MAD_EXPORT void madrpc_init(char *dev_name, int dev_port, > + int *mgmt_classes, int num_classes); > +MAD_EXPORT void madrpc_show_errors(int set); > +void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, > + void *payload, void *rcvdata); > +void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > + > +void madrpc_save_mad(void *madbuf, int len); > +void madrpc_lock(void); > +void madrpc_unlock(void); > +void * mad_rpc_open_port(char *dev_name, int dev_port, > + int *mgmt_classes, int num_classes); > +void mad_rpc_close_port(void *ibmad_port); > +void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, > + ib_portid_t *dport, void *payload, > + void *rcvdata); > +void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, > + ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, > void *data); > -void madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, > - int num_classes); > -void madrpc_save_mad(void *madbuf, int len); > -void madrpc_lock(void); > -void madrpc_unlock(void); > -void madrpc_show_errors(int set); > - > -void * mad_rpc_open_port(char *dev_name, int dev_port, int *mgmt_classes, > - int num_classes); > -void mad_rpc_close_port(void *ibmad_port); > -void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, > - void *payload, void *rcvdata); > -void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, > - ib_rmpp_hdr_t *rmpp, void *data); > > /* smp.c */ > -uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > - unsigned timeout); > -uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > - unsigned timeout); > +MAD_EXPORT uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, > + unsigned mod, unsigned timeout); > +MAD_EXPORT uint8_t * smp_set(void *buf, 
ib_portid_t *id, unsigned attrid, > + unsigned mod, unsigned timeout); > uint8_t * smp_query_via(void *buf, ib_portid_t *id, unsigned attrid, > unsigned mod, unsigned timeout, const void *srcport); > uint8_t * smp_set_via(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > @@ -786,9 +769,9 @@ > unsigned timeout); > uint8_t * sa_rpc_call(const void *ibmad_port, void *rcvbuf, ib_portid_t *portid, > ib_sa_call_t *sa, unsigned timeout); > -int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, > - void *buf); /* returns lid */ > -int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, > +MAD_EXPORT int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, > + ib_portid_t *sm_id, void *buf); /* returns lid */ > +int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, > ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf); > > inline static uint8_t * > @@ -805,38 +788,38 @@ > } > > /* resolve.c */ > -int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); > -int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, > - ib_portid_t *sm_id, int timeout); > -int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, > - int dest_type, ib_portid_t *sm_id); > -int ib_resolve_self(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid); > - > -int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, > - const void *srcport); > -int ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, > - ib_portid_t *sm_id, int timeout, > - const void *srcport); > -int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, > - int dest_type, ib_portid_t *sm_id, > - const void *srcport); > -int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, > - const void *srcport); > +MAD_EXPORT int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); > +MAD_EXPORT int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, > + ib_portid_t *sm_id, int timeout); > +MAD_EXPORT int ib_resolve_portid_str(ib_portid_t *portid, char 
*addr_str, > + int dest_type, ib_portid_t *sm_id); > +MAD_EXPORT int ib_resolve_self(ib_portid_t *portid, int *portnum, > + ibmad_gid_t *gid); > +int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, > + const void *srcport); > +int ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, > + ib_portid_t *sm_id, int timeout, > + const void *srcport); > +int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, > + int dest_type, ib_portid_t *sm_id, > + const void *srcport); > +int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, > + const void *srcport); > > /* gs.c */ > -uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, > unsigned mask, unsigned timeout); > -uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, > unsigned mask, unsigned timeout); > -uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > > 
uint8_t *perf_classportinfo_query_via(void *rcvbuf, ib_portid_t *dest, int port, > @@ -855,7 +838,7 @@ > unsigned timeout, const void *srcport); > /* dump.c */ > ib_mad_dump_fn > - mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, > + MAD_EXPORT mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, > mad_dump_bitfield, mad_dump_array, mad_dump_string, > mad_dump_linkwidth, mad_dump_linkwidthsup, mad_dump_linkwidthen, > mad_dump_linkdowndefstate, Is this a mistake? Should it be: MAD_EXPORT ib_mad_dump_fn ... Ira > @@ -900,6 +883,34 @@ > > extern int ibdebug; > > +/* remove libibcommon dependencies, use complib */ > + > +/* dump.c */ > +MAD_EXPORT void xdump(FILE *file, char *msg, void *p, int size); > + > +/** printf style debugging MACRO's, map to cl_msg_out */ > +#if !defined(IBWARN) > +#define IBWARN(fmt, ...) cl_msg_out(fmt, ## __VA_ARGS__) > +#endif > +#if !defined(IBPANIC) > +#define IBPANIC(fmt, ...) \ > +{ \ > + cl_msg_out(fmt, ## __VA_ARGS__); \ > + CL_ASSERT(0); \ > +} > +#endif > + > +/** align value \a l to \a size (ceil) */ > +#if !defined(ALIGN) > +#define ALIGN(l, size) (((l) + ((size) - 1)) / (size) * (size)) > +#endif > + > +/** align value \a l to \a sizeof 32 bit int (ceil) */ > +#if !defined(ALIGN32) > +#define ALIGN32(l) (ALIGN((l), sizeof(uint32))) > +#endif > + > + > END_C_DECLS > > #endif /* _MAD_H_ */ > > From weiny2 at llnl.gov Thu Dec 18 17:22:01 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 17:22:01 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: Message-ID: <20081218172201.3a7bddae.weiny2@llnl.gov> On Thu, 18 Dec 2008 15:38:06 -0800 "Davis, Arlin R" wrote: > > Patch 2/6 - dump.c > > Signed-off by: Arlin
Davis > > diff -aur libibmad-1.2.2/src/dump.c libibmad/src/dump.c > --- libibmad-1.2.2/src/dump.c 2008-10-19 11:34:41.000000000 -0700 > +++ libibmad/src/dump.c 2008-12-17 17:02:40.947163656 -0800 > @@ -38,15 +38,51 @@ > > #include > #include > -#include > #include > + > +#if defined(_WIN32) || defined(_WIN64) > +#include > +#include > +#define snprintf _snprintf > +#else > +#include > +#include > #include > #include > +#endif > > #include > -#include > +#include > + > +MAD_EXPORT void > +xdump(FILE *file, char *msg, void *p, int size) > +{ > +#define HEX(x) ((x) < 10 ? '0' + (x) : 'a' + ((x) -10)) > + uint8_t *cp = p; > + int i; > + > + if (msg) > + fputs(msg, file); > + > + for (i = 0; i < size;) { > + fputc(HEX(*cp >> 4), file); > + fputc(HEX(*cp & 0xf), file); > + if (++i >= size) > + break; > + fputc(HEX(cp[1] >> 4), file); > + fputc(HEX(cp[1] & 0xf), file); > + if ((++i) % 16) > + fputc(' ', file); > + else > + fputc('\n', file); > + cp += 2; > + } > + if (i % 16) { > + fputc('\n', file); > + } > +} > Where is xdump used? 
Ira > -void > +MAD_EXPORT void > mad_dump_int(char *buf, int bufsz, void *val, int valsz) > { > switch (valsz) { > @@ -72,7 +108,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_uint(char *buf, int bufsz, void *val, int valsz) > { > switch (valsz) { > @@ -98,7 +134,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_hex(char *buf, int bufsz, void *val, int valsz) > { > switch (valsz) { > @@ -115,13 +151,13 @@ > snprintf(buf, bufsz, "0x%08x", *(uint32_t *)val); > break; > case 5: > - snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); > + snprintf(buf, bufsz, "0x%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffULL); > break; > case 6: > - snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); > + snprintf(buf, bufsz, "0x%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffULL); > break; > case 7: > - snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); > + snprintf(buf, bufsz, "0x%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffULL); > break; > case 8: > snprintf(buf, bufsz, "0x%016" PRIx64, *(uint64_t *)val); > @@ -132,7 +168,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_rhex(char *buf, int bufsz, void *val, int valsz) > { > switch (valsz) { > @@ -149,13 +185,13 @@ > snprintf(buf, bufsz, "%08x", *(uint32_t *)val); > break; > case 5: > - snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffllu); > + snprintf(buf, bufsz, "%010" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffULL); > break; > case 6: > - snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffllu); > + snprintf(buf, bufsz, "%012" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffULL); > break; > case 7: > - snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffllu); > + snprintf(buf, bufsz, "%014" PRIx64, *(uint64_t *)val & (uint64_t) 0xffffffffffffffULL); > break; > case 8: > 
snprintf(buf, bufsz, "%016" PRIx64, *(uint64_t *)val); > @@ -166,7 +202,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_linkwidth(char *buf, int bufsz, void *val, int valsz) > { > int width = *(int *)val; > @@ -212,7 +248,7 @@ > buf[n-4] = '\0'; > } > > -void > +MAD_EXPORT void > mad_dump_linkwidthsup(char *buf, int bufsz, void *val, int valsz) > { > int width = *(int *)val; > @@ -235,7 +271,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_linkwidthen(char *buf, int bufsz, void *val, int valsz) > { > int width = *(int *)val; > @@ -243,7 +279,7 @@ > dump_linkwidth(buf, bufsz, width); > } > > -void > +MAD_EXPORT void > mad_dump_linkspeed(char *buf, int bufsz, void *val, int valsz) > { > int speed = *(int *)val; > @@ -300,7 +336,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_linkspeedsup(char *buf, int bufsz, void *val, int valsz) > { > int speed = *(int *)val; > @@ -308,7 +344,7 @@ > dump_linkspeed(buf, bufsz, speed); > } > > -void > +MAD_EXPORT void > mad_dump_linkspeeden(char *buf, int bufsz, void *val, int valsz) > { > int speed = *(int *)val; > @@ -316,7 +352,7 @@ > dump_linkspeed(buf, bufsz, speed); > } > > -void > +MAD_EXPORT void > mad_dump_portstate(char *buf, int bufsz, void *val, int valsz) > { > int state = *(int *)val; > @@ -342,7 +378,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_linkdowndefstate(char *buf, int bufsz, void *val, int valsz) > { > int state = *(int *)val; > @@ -363,7 +399,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_physportstate(char *buf, int bufsz, void *val, int valsz) > { > int state = *(int *)val; > @@ -398,7 +434,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_mtu(char *buf, int bufsz, void *val, int valsz) > { > int mtu = *(int *)val; > @@ -425,7 +461,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_vlcap(char *buf, int bufsz, void *val, int valsz) > { > int vlcap = *(int *)val; > @@ -451,7 +487,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_opervls(char *buf, int bufsz, void *val, int 
valsz) > { > int opervls = *(int *)val; > @@ -480,7 +516,7 @@ > } > } > > -void > +MAD_EXPORT void > mad_dump_portcapmask(char *buf, int bufsz, void *val, int valsz) > { > unsigned mask = *(unsigned *)val; > @@ -534,13 +570,13 @@ > *(--s) = 0; > } > > -void > +MAD_EXPORT void > mad_dump_bitfield(char *buf, int bufsz, void *val, int valsz) > { > snprintf(buf, bufsz, "0x%x", *(uint32_t *)val); > } > > -void > +MAD_EXPORT void > mad_dump_array(char *buf, int bufsz, void *val, int valsz) > { > uint8_t *p = val, *e; > @@ -553,7 +589,7 @@ > sprintf(s, "%02x", *p); > } > > -void > +MAD_EXPORT void > mad_dump_string(char *buf, int bufsz, void *val, int valsz) > { > if (bufsz < valsz) > @@ -562,7 +598,7 @@ > snprintf(buf, valsz, "'%s'", (char *)val); > } > > -void > +MAD_EXPORT void > mad_dump_node_type(char *buf, int bufsz, void *val, int valsz) > { > int nodetype = *(int*)val; > @@ -603,7 +639,7 @@ > uint8_t res_vl; > uint8_t weight; > } vl_entry[IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK]; > -} __attribute__((packed)) ib_vl_arb_table_t; > +} ib_vl_arb_table_t; > > static inline void > ib_vl_arb_get_vl(uint8_t res_vl, uint8_t *const vl ) > @@ -611,7 +647,7 @@ > *vl = res_vl & 0x0F; > } > > -void > +MAD_EXPORT void > mad_dump_sltovl(char *buf, int bufsz, void *val, int valsz) > { > ib_slvl_table_t* p_slvl_tbl = val; > @@ -627,11 +663,11 @@ > snprintf(buf + n, bufsz - n, "\n"); > } > > -void > +MAD_EXPORT void > mad_dump_vlarbitration(char *buf, int bufsz, void *val, int num) > { > ib_vl_arb_table_t* p_vla_tbl = val; > - unsigned i, n; > + int i, n; > uint8_t vl; > > num /= sizeof(p_vla_tbl->vl_entry[0]); > @@ -678,10 +714,10 @@ > bufsz -= n; > } > > - return s - buf; > + return (int)(s - buf); > } > > -void > +MAD_EXPORT void > mad_dump_nodedesc(char *buf, int bufsz, void *val, int valsz) > { > strncpy(buf, val, bufsz); > @@ -690,37 +726,37 @@ > buf[valsz] = 0; > } > > -void > +MAD_EXPORT void > mad_dump_nodeinfo(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, 
bufsz, val, IB_NODE_FIRST_F, IB_NODE_LAST_F); > } > > -void > +MAD_EXPORT void > mad_dump_portinfo(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, bufsz, val, IB_PORT_FIRST_F, IB_PORT_LAST_F); > } > > -void > +MAD_EXPORT void > mad_dump_portstates(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, bufsz, val, IB_PORT_STATE_F, IB_PORT_LINK_DOWN_DEF_F); > } > > -void > +MAD_EXPORT void > mad_dump_switchinfo(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, bufsz, val, IB_SW_FIRST_F, IB_SW_LAST_F); > } > > -void > +MAD_EXPORT void > mad_dump_perfcounters(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, bufsz, val, IB_PC_FIRST_F, IB_PC_LAST_F); > } > > -void > +MAD_EXPORT void > mad_dump_perfcounters_ext(char *buf, int bufsz, void *val, int valsz) > { > _dump_fields(buf, bufsz, val, IB_PC_EXT_FIRST_F, IB_PC_EXT_LAST_F); > @@ -765,9 +801,13 @@ > int > _mad_dump(ib_mad_dump_fn *fn, char *name, void *val, int valsz) > { > - ib_field_t f = { .def_dump_fn = fn, .bitlen = valsz * 8}; > + ib_field_t f; > char buf[512]; > > + memset(&f, 0, sizeof(f)); > + f.def_dump_fn = fn; > + f.bitlen = valsz * 8; > + > return printf("%s\n", _mad_dump_field(&f, name, buf, sizeof buf, val)); > } > > @@ -776,3 +816,4 @@ > { > return _mad_dump(f->def_dump_fn, name ? name : f->name, val, valsz ? 
valsz : ALIGN(f->bitlen, 8) / 8); > } > + > > From weiny2 at llnl.gov Thu Dec 18 17:22:13 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 17:22:13 -0800 Subject: [ofa-general] PATCH[6/6] Windows port of libibmad - dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp In-Reply-To: References: Message-ID: <20081218172213.41c4eb1a.weiny2@llnl.gov> Would it be better to put these in a subdir "win" (or pick some name) instead of "src"? If you did that, could you also store a script that takes libibmad.map and generates your ibmad_exports.src file? That would make things easier for you if we change the map file. Ira On Thu, 18 Dec 2008 15:38:55 -0800 "Davis, Arlin R" wrote: > > 6/6 - new files for windows: dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp > > Signed-off by: Arlin Davis > > diff -Naur libibmad-1.2.2/dirs libibmad/dirs > --- libibmad-1.2.2/dirs 1969-12-31 16:00:00.000000000 -0800 > +++ libibmad/dirs 2008-07-08 10:28:28.000000000 -0700 > @@ -0,0 +1,2 @@ > +DIRS = \ > + src > \ No newline at end of file > diff -Naur libibmad-1.2.2/src/ibmad_export.def libibmad/src/ibmad_export.def > --- libibmad-1.2.2/src/ibmad_export.def 1969-12-31 16:00:00.000000000 -0800 > +++ libibmad/src/ibmad_export.def 2008-11-21 11:21:22.000000000 -0800 > @@ -0,0 +1,34 @@ > +/* > + * Copyright (c) 2008 Intel Corporation. All rights reserved.
> + * > + * This software is available to you under the OpenIB.org BSD license > + * below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AWV > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + */ > + > +LIBRARY LIBIBMAD.DLL > + > +EXPORTS > + DllCanUnloadNow PRIVATE > + DllGetClassObject PRIVATE > diff -Naur libibmad-1.2.2/src/ibmad_exports.src libibmad/src/ibmad_exports.src > --- libibmad-1.2.2/src/ibmad_exports.src 1969-12-31 16:00:00.000000000 -0800 > +++ libibmad/src/ibmad_exports.src 2008-12-12 16:53:52.000000000 -0800 > @@ -0,0 +1,54 @@ > +#if DBG > +LIBRARY libibmadd.dll > +#else > +LIBRARY libibmad.dll > +#endif > + > +#ifndef _WIN64 > +EXPORTS > + mad_set_field; > + mad_get_field; > + mad_set_array; > + mad_get_array; > + mad_set_field64; > + mad_get_field64; > + mad_decode_field; > + mad_encode_field; > + mad_encode; > + mad_trid; > + mad_build_pkt; > + mad_register_port_client; > + mad_register_client; > + mad_register_server; > + mad_class_agent; > + mad_agent_class; > + mad_send; > + mad_receive; > + mad_respond; > + mad_alloc; > + mad_free; > + madrpc_portid; > + madrpc_init; > + madrpc_set_retries; > + madrpc_set_timeout; > + madrpc_show_errors; > + smp_query; > + smp_set; > + ib_vendor_call; > + ib_path_query; > + ib_resolve_smlid; > + ib_resolve_guid; > + ib_resolve_portid_str; > + ib_resolve_self; > + perf_classportinfo_query; > + port_performance_query; > + port_performance_reset; > + port_performance_ext_query; > + port_performance_ext_reset; > + port_samples_control_query; > + port_samples_result_query; > + portid2str; > + portid2portnum; > + str2drpath; > + drpath2str; > +#endif > diff -Naur libibmad-1.2.2/src/ibmad_main.cpp libibmad/src/ibmad_main.cpp > --- libibmad-1.2.2/src/ibmad_main.cpp 1969-12-31 16:00:00.000000000 -0800 > +++ libibmad/src/ibmad_main.cpp 2008-07-08 10:28:28.000000000 -0700 > @@ -0,0 +1,39 @@ > +/* > + * Copyright (c) 2008 Intel Corporation. All rights reserved. 
> + * > + * This software is available to you under the OpenIB.org BSD license > + * below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AWV > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + */ > + > +#include > + > +BOOLEAN WINAPI DllMain(HINSTANCE hInstance, DWORD dwReason, LPVOID lpReserved) > +{ > + UNREFERENCED_PARAMETER(hInstance); > + UNREFERENCED_PARAMETER(dwReason); > + UNREFERENCED_PARAMETER(lpReserved); > + > + return TRUE; > +} > diff -Naur libibmad-1.2.2/src/Sources libibmad/src/Sources > --- libibmad-1.2.2/src/Sources 1969-12-31 16:00:00.000000000 -0800 > +++ libibmad/src/Sources 2008-12-03 09:59:42.000000000 -0800 > @@ -0,0 +1,47 @@ > +!if $(FREEBUILD) > +TARGETNAME = libibmad > +!else > +TARGETNAME = libibmadd > +!endif > + > +TARGETPATH = ..\..\..\bin\user\obj$(BUILD_ALT_DIR) > +TARGETTYPE = DYNLINK > + > +DLLDEF = $(OBJ_PATH)\$O\ibmad_exports.def > + > +DLLENTRY = DllMain > +USE_MSVCRT=1 > + > +SOURCES = \ > + ibmad_main.cpp \ > + dump.c \ > + fields.c \ > + gs.c \ > + mad.c \ > + portid.c \ > + register.c \ > + resolve.c \ > + rpc.c \ > + sa.c \ > + serv.c \ > + smp.c \ > + vendor.c > + > +INCLUDES = ..\include\infiniband;..\..\libibverbs\include;..\..\libibumad\include;..\..\..\inc;..\..\..\inc\user; > + > +USER_C_FLAGS = $(USER_C_FLAGS) -DEXPORT_IBMAD_SYMBOLS > + > +TARGETLIBS = \ > + $(SDK_LIB_PATH)\kernel32.lib \ > + $(SDK_LIB_PATH)\uuid.lib \ > + $(SDK_LIB_PATH)\ws2_32.lib \ > + $(SDK_LIB_PATH)\advapi32.lib \ > + $(SDK_LIB_PATH)\user32.lib \ > + $(SDK_LIB_PATH)\ole32.lib \ > +!if $(FREEBUILD) > + $(TARGETPATH)\*\complib.lib \ > + $(TARGETPATH)\*\libibumad.lib > +!else > + $(TARGETPATH)\*\complibd.lib \ > + $(TARGETPATH)\*\libibumadd.lib > +!endif > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http:// lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http:// openib.org/mailman/listinfo/openib-general > From sean.hefty at intel.com Thu Dec 18 17:26:50 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 17:26:50 -0800 Subject: [ofw] Re: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: 
<20081218172153.37bb18ed.weiny2@llnl.gov> References: <20081218172153.37bb18ed.weiny2@llnl.gov> Message-ID: <000401c96178$ddf99160$435a180a@amr.corp.intel.com> >Could this be put in the cl_ headers somewhere and be called CL_EXPORT >or CL_EXTERN (my preference if this is going to cause functions on the >Linux side to be declared extern)? That's a good point, there's already a CL_EXPORT that's defined. - Sean From sean.hefty at intel.com Thu Dec 18 17:29:53 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 17:29:53 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <20081218172201.3a7bddae.weiny2@llnl.gov> References: <20081218172201.3a7bddae.weiny2@llnl.gov> Message-ID: <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> >Where is xdump used? dump.c, rpc.c, and serv.c call it. It looks like it was a call implemented by libibcommon, and I think Arlin's patches remove libibcommon from being used by libibmad or the diags. - Sean From sean.hefty at intel.com Thu Dec 18 17:49:10 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 17:49:10 -0800 Subject: [ofw] Re: [ofa-general] PATCH[6/6] Windows port of libibmad - dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp In-Reply-To: <20081218172213.41c4eb1a.weiny2@llnl.gov> References: <20081218172213.41c4eb1a.weiny2@llnl.gov> Message-ID: <000601c9617b$fcc6abc0$435a180a@amr.corp.intel.com> >Would it be better to put these in a subdir "win" (or pick some name) instead >of "src"? I'm not sure what to do with the build files. What we've been doing to test is dropping the libibmad directory into the Windows source tree (under the 'ulp' directory). We then add 'libibmad' to the 'ulp/dirs' file. The libibmad/dirs includes the 'src' directory, which contains the 'sources' file that actually builds. 
It does not appear that dirs files are smart enough to handle paths, so the ulp/dirs file is unable to reference anything like libibmad/src/win. :( And unfortunately with the diags it gets a little worse. There can only be one sources file per directory, which builds a single library or executable. So, we end up with src/sminfo, src/perftest, etc., one directory for each executable that we want to build. But, at least the sources files understand paths, so the .c files do not have to be relocated. >If you did that could you also store a script which takes libibmad.map and >creates your ibmad_exports.src file? This would be easier for you if we change >the map file. Using scripts (should I call it a batch file?) to configure the build for Windows is worth looking into, and may be the best solution. - Sean From weiny2 at llnl.gov Thu Dec 18 18:25:04 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 18 Dec 2008 18:25:04 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> Message-ID: <20081218182504.47edc451.weiny2@llnl.gov> On Thu, 18 Dec 2008 17:29:53 -0800 "Sean Hefty" wrote: > >Where is xdump used? > > dump.c, rpc.c, and serv.c call it. > > It looks like it was a call implemented by libibcommon, and I think Arlin's > patches remove libibcommon from being used by libibmad or the diags. > Did he reimplement IBWARN, IBPANIC, and all the sys_read_* functions? I think those come from ibcommon and are used by the diags. 
I might have missed this, Ira From arlin.r.davis at intel.com Thu Dec 18 18:55:30 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 18:55:30 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <20081218182504.47edc451.weiny2@llnl.gov> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> Message-ID: >> >Where is xdump used? >> >> dump.c, rpc.c, and serv.c call it. >> >> It looks like it was a call implemented by libibcommon, and >I think Arlin's >> patches remove libibcommon from being used by libibmad or the diags. >> > >Did he reimplement IBWARN, IBPANIC, and all the sys_read_* >functions? I think >those come from ibcommon and are used by the diags. Yes, I did reimplement IBWARN and IBPANIC but the sys_read_* was not used by anything in libibmad or infiniband_diags so I am holding off on that. If we just need gid or guid I would rather just pick that up with verbs for portability. -arlin From sean.hefty at intel.com Thu Dec 18 19:00:00 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 18 Dec 2008 19:00:00 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <20081218182504.47edc451.weiny2@llnl.gov> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> Message-ID: <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> >Did he reimplement IBWARN, IBPANIC, and all the sys_read_* functions? I think >those come from ibcommon and are used by the diags. He added this to mad.h: #if !defined(IBWARN) #define IBWARN(fmt, ...) cl_msg_out(fmt, ## __VA_ARGS__) #endif #if !defined(IBPANIC) #define IBPANIC(fmt, ...) \ { \ cl_msg_out(fmt, ## __VA_ARGS__); \ CL_ASSERT(0); \ } #endif This allows libibcommon to exist on Linux, but eliminates it for Windows - for the diags. 
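As a rough, self-contained illustration of the guarded fallback macros quoted above: the sketch below keeps the same `#if !defined(...)` shape, but replaces complib's cl_msg_out and CL_ASSERT with hypothetical stand-ins (stub_msg_out, STUB_ASSERT) so that it compiles without any OFED or WinOF headers. It relies on the `## __VA_ARGS__` form used in the quoted code, so it assumes a compiler that accepts that extension. The IBPANIC body is wrapped in do { } while (0) instead of the bare braces of the quoted version; that is a substitution made here, not part of the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for complib's cl_msg_out(): formats into a buffer
 * so the result can be inspected afterwards. */
static char last_msg[256];

static void stub_msg_out(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_msg, sizeof last_msg, fmt, ap);
    va_end(ap);
}

/* Hypothetical stand-in for CL_ASSERT(): counts failures instead of stopping. */
static int panic_count;
#define STUB_ASSERT(expr) do { if (!(expr)) panic_count++; } while (0)

/* Guarded fallbacks, same shape as the quoted mad.h change: they are only
 * defined when no header seen earlier has already provided the macro. */
#if !defined(IBWARN)
#define IBWARN(fmt, ...) stub_msg_out(fmt, ## __VA_ARGS__)
#endif

#if !defined(IBPANIC)
#define IBPANIC(fmt, ...) \
    do { \
        stub_msg_out(fmt, ## __VA_ARGS__); \
        STUB_ASSERT(0); \
    } while (0)
#endif
```

Because of the guards, whichever definition of IBWARN or IBPANIC the preprocessor sees first is the one that sticks, so a header that defines them unconditionally has to come before this fragment rather than after it.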
(The diags don't use sys_read_*.) I think this implementation requires a specific include order of header files though. - Sean From arlin.r.davis at intel.com Thu Dec 18 19:01:50 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 18 Dec 2008 19:01:50 -0800 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: <20081218172153.37bb18ed.weiny2@llnl.gov> References: <20081218172153.37bb18ed.weiny2@llnl.gov> Message-ID: >> /* dump.c */ >> ib_mad_dump_fn >> - mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, >> + MAD_EXPORT mad_dump_int, mad_dump_uint, mad_dump_hex, >mad_dump_rhex, >> mad_dump_bitfield, mad_dump_array, mad_dump_string, >> mad_dump_linkwidth, mad_dump_linkwidthsup, mad_dump_linkwidthen, >> mad_dump_linkdowndefstate, > >Is this a mistake? Should it be: > >MAD_EXPORT ib_mad_dump_fn Yes, good catch. From nicolas.morey-chaisemartin at ext.bull.net Fri Dec 19 01:18:39 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Fri, 19 Dec 2008 10:18:39 +0100 Subject: [ofa-general] [PATCH][OpenSM] Added documentation for io_guid_file and max_reverse_hop feature In-Reply-To: <494A5339.9030304@ext.bull.net> References: <494A5339.9030304@ext.bull.net> Message-ID: <494B66EF.2050802@ext.bull.net> Added entries for io_guid_file and max_reverse_hop in opensm man page and in opensm/doc/current-routing.txt Signed-off-by: Nicolas Morey-Chaisemartin --- opensm/doc/current-routing.txt | 32 ++++++++++++++++++++++++++++++++ opensm/man/opensm.8.in | 27 +++++++++++++++++++++++++++ 2 files changed, 59 insertions(+), 0 deletions(-) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 07c5207945a86cb933484db0739380ac3bac5610.diff Type: text/x-patch Size: 4826 bytes Desc: not available URL: From vlad at lists.openfabrics.org Fri Dec 19 03:25:06 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 19 Dec 2008 03:25:06 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081219-0200 daily build status Message-ID: <20081219112506.541BCE60F11@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 
with linux-2.6.18 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From hal.rosenstock at gmail.com Fri Dec 19 10:13:49 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:13:49 -0500 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> Message-ID: On Thu, Dec 18, 2008 at 8:29 PM, Sean Hefty wrote: >>Where is xdump used? > > dump.c, rpc.c, and serv.c call it. > > It looks like it was a call implemented by libibcommon, and I think Arlin's > patches remove libibcommon from being used by libibmad or the diags. libibmad (rpc.c and serv.c) calls xdump. dump.c is where xdump is implemented for Windows. It's also used in some of the diags (ibroute and smpquery) but maybe these are not ported as yet. -- Hal > - Sean > > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > From alex.estrin at qlogic.com Fri Dec 19 10:14:49 2008 From: alex.estrin at qlogic.com (Alex Estrin) Date: Fri, 19 Dec 2008 12:14:49 -0600 Subject: [ofa-general] [ipoib]patch for ipoib failure during startup with non-default pkey set. Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> The proposed patch allows the ipoib interface to pick up the correct value from the first pkey-table entry and update the broadcast mgid before it starts joining multicast groups. Please review. 
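The broadcast-address part of this patch can be tried out in user space with a small sketch. The helper names (set_broadcast_pkey, broadcast_to_mgid) and the two length constants are hypothetical and exist only for this illustration; the OR with 0x8000, the writes to bytes 8 and 9, and the copy starting at offset 4 are taken from the ipoib_multicast.c hunk in the diff. It assumes the 20-byte IPoIB hardware address layout in which the last 16 bytes are the GID.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout for this sketch: 4 leading bytes followed by a 16-byte GID. */
#define TOY_HWADDR_LEN 20
#define TOY_GID_LEN    16

/* Hypothetical helper: fold the P_Key into the broadcast hardware address,
 * mirroring the three assignments in the patch. */
static void set_broadcast_pkey(uint8_t bcast[TOY_HWADDR_LEN], uint16_t *pkey)
{
    *pkey |= 0x8000;                    /* set the top (membership) bit */
    bcast[8] = (uint8_t)(*pkey >> 8);   /* high byte of the P_Key */
    bcast[9] = (uint8_t)(*pkey & 0xff); /* low byte of the P_Key */
}

/* Hypothetical helper: extract the MGID the same way the patch's memcpy
 * does, skipping the first 4 bytes of the hardware address. */
static void broadcast_to_mgid(const uint8_t bcast[TOY_HWADDR_LEN],
                              uint8_t mgid[TOY_GID_LEN])
{
    memcpy(mgid, bcast + 4, TOY_GID_LEN);
}
```

Since the MGID is copied from offset 4, bytes 8 and 9 of the hardware address end up as bytes 4 and 5 of the MGID, which is why the P_Key has to be refreshed before the copy and not after it.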
Signed-off-by: Alex Estrin diff --git a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 784c291..459e2b9 100644 --- a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -719,7 +719,25 @@ int ipoib_ib_dev_open(struct net_device *dev) static void ipoib_pkey_dev_check_presence(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_port_attr port_attr; u16 pkey_index = 0; + + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { + + clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); + if (ib_query_port(priv->ca, priv->port, &port_attr)) { + ipoib_warn(priv, "Query port attrs failed\n"); + return; + } + if (port_attr.state != IB_PORT_ACTIVE) { + return; + } + if (ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey)) { + ipoib_warn(priv, "Query P_Key table entry 0 failed\n"); + return; + } + set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); + } if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); diff --git a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 016a057..4d270e2 100644 --- a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -556,6 +556,13 @@ void ipoib_mcast_join_task(struct work_struct *work) } spin_lock_irq(&priv->lock); + + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { + /* fix broadcast gid in case if pkey was changed */ + priv->pkey |= 0x8000; + priv->dev->broadcast[8] = priv->pkey >> 8; + priv->dev->broadcast[9] = priv->pkey & 0xff; + } memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid)); priv->broadcast = broadcast; -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ipoib_pkey_bootup_race.patch Type: application/octet-stream Size: 1926 bytes Desc: ipoib_pkey_bootup_race.patch URL: From hal.rosenstock at gmail.com Fri Dec 19 10:16:21 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:16:21 -0500 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> Message-ID: On Thu, Dec 18, 2008 at 10:00 PM, Sean Hefty wrote: >>Did he reimplement IBWARN, IBPANIC, and all the sys_read_* functions? I think >>those come from ibcommon and are used by the diags. > > He added this to mad.h: > > #if !defined(IBWARN) > #define IBWARN(fmt, ...) cl_msg_out(fmt, ## __VA_ARGS__) > #endif > #if !defined(IBPANIC) > #define IBPANIC(fmt, ...) \ > { \ > cl_msg_out(fmt, ## __VA_ARGS__); \ > CL_ASSERT(0); \ > } > #endif > > This allows libibcommon to exist on Linux, Sasha has written some comments on the general list indicating that libibcommon may disappear in the future. It could be combined with libibumad (as libibmad (and diags) both use it. -- Hal > but eliminates it for Windows - for > the diags. (The diags don't use sys_read_*.) I think this implementation > requires a specific include order of header files though. 
> > - Sean > > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > From hal.rosenstock at gmail.com Fri Dec 19 10:18:23 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:18:23 -0500 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> Message-ID: On Fri, Dec 19, 2008 at 1:13 PM, Hal Rosenstock wrote: > On Thu, Dec 18, 2008 at 8:29 PM, Sean Hefty wrote: >>>Where is xdump used? >> >> dump.c, rpc.c, and serv.c call it. >> >> It looks like it was a call implemented by libibcommon, and I think Arlin's >> patches remove libibcommon from being used by libibmad or the diags. > > libibmad (rpc.c and serv.c) call xdump. dump.c is where xdump is > implemented for Windows. It's also used in some of the diags (ibroute > and smpquery) but maybe these are not ported as yet. It was arbitrary putting xdump in libibcommon as the only in tree uses are in libibmad and the diags. Where xdump lives might best be consistent with the direction taken in Linux (if/when libibcommon is finally eliminated). 
-- Hal > -- Hal > >> - Sean >> >> _______________________________________________ >> ofw mailing list >> ofw at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw >> > From sean.hefty at intel.com Fri Dec 19 10:24:16 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 19 Dec 2008 10:24:16 -0800 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> Message-ID: <000001c96207$00f55120$ae58180a@amr.corp.intel.com> >Sasha has written some comments on the general list indicating that >libibcommon may disappear in the future. It could be combined with >libibumad (as libibmad (and diags) both use it. I'd much rather see it integrated with libibmad than libibumad. libibumad is fairly self-contained as the interface into the kernel. - Sean From hal.rosenstock at gmail.com Fri Dec 19 10:26:31 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:26:31 -0500 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> Message-ID: On Thu, Dec 18, 2008 at 9:55 PM, Davis, Arlin R wrote: > >>> >Where is xdump used? >>> >>> dump.c, rpc.c, and serv.c call it. >>> >>> It looks like it was a call implemented by libibcommon, and >>I think Arlin's >>> patches remove libibcommon from being used by libibmad or the diags. >>> >> >>Did he reimplement IBWARN, IBPANIC, and all the sys_read_* >>functions? I think >>those come from ibcommon and are used by the diags. > > Yes, I did reimplement IBWARN and IBPANIC but the sys_read_* was > not used by anything in libibmad or infiniband_diags You mean directly, right ? 
> so I am holding off on that. If we just need gid or guid It's used for much more than just getting the port guid. > I would rather just pick that up with verbs for portability. In Linux, management does not (currently) require verbs but that was a design and history choice. Much of it was done prior to verbs and there have been discussions on and off over time about using verbs but that bridge was never crossed. I think that most of the information needed is available by verbs. There is also some automatic port selection when user doesn't specify this. This is all off the top of my head. I'd need to review things again to see what else is done. -- Hal > -arlin > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > From hal.rosenstock at gmail.com Fri Dec 19 10:28:49 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:28:49 -0500 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: <000001c96207$00f55120$ae58180a@amr.corp.intel.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> <000001c96207$00f55120$ae58180a@amr.corp.intel.com> Message-ID: On Fri, Dec 19, 2008 at 1:24 PM, Sean Hefty wrote: >>Sasha has written some comments on the general list indicating that >>libibcommon may disappear in the future. It could be combined with >>libibumad (as libibmad (and diags) both use it. > > I'd much rather see it integrated with libibmad than libibumad. libibumad is > fairly self-contained as the interface into the kernel. That's fine as long as things that use libibumad don't now require libibmad as an additional dependency for something that was in libibcommon. I don't know if there's anything like that. xdump wouldn't cause that. 
I haven't looked at the rest of libibcommon and the implications. -- Hal > - Sean > > From sean.hefty at intel.com Fri Dec 19 10:37:51 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 19 Dec 2008 10:37:51 -0800 Subject: [ofw] RE: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> Message-ID: <000101c96208$e63a2340$ae58180a@amr.corp.intel.com> >> Yes, I did reimplement IBWARN and IBPANIC but the sys_read_* was >> not used by anything in libibmad or infiniband_diags > >You mean directly, right ? Yes. Nothing in the libibmad or infiniband_diags source code calls the sys_read_* functions. It's only used by libibumad. (Hmm... I take back what I said about merging libibcommon with libibumad then.) >In Linux, management does not (currently) require verbs but that was a >design and history choice. Much of it was done prior to verbs and >there have been discussions on and off over time about using verbs but >that bridge was never crossed. On Windows, libibumad uses libibverbs to obtain any information that it needs. Because libibumad uses a Windows specific MAD layer interface, the libibumad implementation does not share any code between Windows and Linux. So, if the sys_read_* calls were merged directly into libibumad on Linux and kept internal, then all problems with sys_read_* go away. - Sean From hal.rosenstock at gmail.com Fri Dec 19 10:56:56 2008 From: hal.rosenstock at gmail.com (Hal Rosenstock) Date: Fri, 19 Dec 2008 13:56:56 -0500 Subject: [ofa-general] Re: [ofw] RE: PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> References: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> Message-ID: On Thu, Dec 18, 2008 at 7:10 PM, Sean Hefty wrote: >>Port of libibmad to windows. 
Dependencies on libibumad port and complib (in >>lieu of libibcommon). Removed dependency on libibcommon. >> >>Intent is to allow common mad code base for Windows and Linux to simplify >>maintainability across OFED and WinOF. This patch set was built and tested on >>Windows and built on Linux (not tested yet). > > Thanks for posting this. Seeing the actual changes helps understand the impact > better. > > Note that the libib_u_mad implementations are not shared. Only the interface is > maintained to simplify porting. > > Looking at the changes, do the management developers think that it makes sense > to share the libibmad implementation, or should separate implementations be > maintained, similar to libibumad? If the implementations are not shared, can > the Linux side treat the API as an external interface, rather than a private > interface? > >>+#if defined(_WIN32) || defined(_WIN64) >>+#define MAD_EXPORT __declspec(dllexport) >>+#else >>+#define MAD_EXPORT extern > > I don't know that 'extern' is appropriate here. 
> >>+#endif >>+ >> #define IB_SUBNET_PATH_HOPS_MAX 64 >>-#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu >>+#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL >> #define IB_DEFAULT_QP1_QKEY 0x80010000 >> >> #define IB_MAD_SIZE 256 >>@@ -620,10 +628,10 @@ >> >>/****************************************************************************** >>/ >> >> /* portid.c */ >>-char * portid2str(ib_portid_t *portid); >>-int portid2portnum(ib_portid_t *portid); >>-int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int >>drdlid); >>-char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); >>+MAD_EXPORT char * portid2str(ib_portid_t *portid); >>+MAD_EXPORT int portid2portnum(ib_portid_t *portid); >>+MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int >>drdlid); >>+MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t >>dstr_size); >> >> static inline int >> ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) >>@@ -639,77 +647,49 @@ >> /* fields.c */ >> extern ib_field_t ib_mad_f[]; >> >>-void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); >>+void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); >> uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); >>-void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); >>-void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); >>-void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); >>+void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); >>+void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); >>+void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); >> uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); > > Are these really the functions that should be exported from the library or in > the header file? (I'm probably missing some history here.) 
For one thing, mad.h currently inlines a number of functions used by (diag) applications which invoke these. I think it's largely historical; libibmad is the most non-standard of the management libraries in terms of conventions and a cleanup hasn't yet occurred. -- Hal > - Sean > > _______________________________________________ > ofw mailing list > ofw at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw > From sean.hefty at intel.com Fri Dec 19 11:00:04 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 19 Dec 2008 11:00:04 -0800 Subject: [ofa-general] RE: [ofw] RE: PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: References: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> Message-ID: <000201c9620c$00cdb2a0$ae58180a@amr.corp.intel.com> >>>-void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); >>>+void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); >>> uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); >>>-void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); >>>-void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); >>>-void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); >>>+void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); >>>+void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); >>>+void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); >>> uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); >> >> Are these really the functions that should be exported from the library or in >> the header file? (I'm probably missing some history here.) > >For one thing, mad.h currently inlines a number of functions used by >(diag) applications which invoke these. I did see that. I was thinking more of renaming _set_field to mad_set_field and removing the existing implementation of mad_set_field. 
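The header-inlining point can be made concrete with a toy model. Everything in the sketch below is hypothetical: toy_field_t is a cut-down stand-in for ib_field_t that only handles byte-aligned 8, 16 or 32 bit fields, raw_set_field and raw_get_field play the role of the underscore-prefixed library functions, and toy_mad_set_field and toy_mad_get_field play the role of the inline wrappers in the header. It is not the libibmad implementation.

```c
#include <stdint.h>

/* Toy field descriptor. The real ib_field_t carries more members and
 * supports arbitrary bit offsets; this sketch handles whole bytes only. */
typedef struct toy_field {
    int bitoffs;    /* offset within the MAD, a multiple of 8 here */
    int bitlen;     /* 8, 16 or 32 here */
} toy_field_t;

enum toy_field_id { TOY_LID_F, TOY_QKEY_F, TOY_LAST_F };

static const toy_field_t toy_mad_f[TOY_LAST_F] = {
    { 0, 16 },      /* TOY_LID_F */
    { 32, 32 },     /* TOY_QKEY_F */
};

/* "Library" side: low-level accessors taking a field descriptor.
 * Values are stored most significant byte first. */
void raw_set_field(void *buf, int base_offs, const toy_field_t *f, uint32_t val)
{
    uint8_t *p = (uint8_t *)buf + base_offs + f->bitoffs / 8;
    int nbytes = f->bitlen / 8;
    int i;

    for (i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(val >> (8 * (nbytes - 1 - i)));
}

uint32_t raw_get_field(const void *buf, int base_offs, const toy_field_t *f)
{
    const uint8_t *p = (const uint8_t *)buf + base_offs + f->bitoffs / 8;
    int nbytes = f->bitlen / 8;
    uint32_t val = 0;
    int i;

    for (i = 0; i < nbytes; i++)
        val = (val << 8) | p[i];
    return val;
}

/* "Header" side: inline wrappers that look the descriptor up by index.
 * These are compiled into every application that includes the header. */
static inline void toy_mad_set_field(void *buf, int base_offs, int field,
                                     uint32_t val)
{
    raw_set_field(buf, base_offs, &toy_mad_f[field], val);
}

static inline uint32_t toy_mad_get_field(const void *buf, int base_offs,
                                         int field)
{
    return raw_get_field(buf, base_offs, &toy_mad_f[field]);
}
```

With this split, an application that calls toy_mad_set_field ends up with a direct reference to raw_set_field in its own object code, so the low-level function has to be visible outside the library. If the index-based function were the one implemented and exported by the library, as the renaming idea suggests, the low-level accessors could presumably stay internal.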
- Sean From alex.estrin at qlogic.com Fri Dec 19 11:27:41 2008 From: alex.estrin at qlogic.com (Alex Estrin) Date: Fri, 19 Dec 2008 13:27:41 -0600 Subject: [ofa-general] [ipoib]patch for ipoib failure during startup with non-default pkey set. In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> References: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246C0@MNEXMB1.qlogic.org> This is the same patch, appropriately generated. Signed-off-by: Alex Estrin diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 784c291..459e2b9 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -719,7 +719,25 @@ int ipoib_ib_dev_open(struct net_device *dev) static void ipoib_pkey_dev_check_presence(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); + struct ib_port_attr port_attr; u16 pkey_index = 0; + + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { + + clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); + if (ib_query_port(priv->ca, priv->port, &port_attr)) { + ipoib_warn(priv, "Query port attrs failed\n"); + return; + } + if (port_attr.state != IB_PORT_ACTIVE) { + return; + } + if (ib_query_pkey(priv->ca, priv->port, 0, &priv->pkey)) { + ipoib_warn(priv, "Query P_Key table entry 0 failed\n"); + return; + } + set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); + } if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 016a057..4d270e2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -556,6 +556,13 @@ void ipoib_mcast_join_task(struct work_struct *work) } spin_lock_irq(&priv->lock); + + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { + /* fix 
broadcast gid in case if pkey was changed */ + priv->pkey |= 0x8000; + priv->dev->broadcast[8] = priv->pkey >> 8; + priv->dev->broadcast[9] = priv->pkey & 0xff; + } memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid)); priv->broadcast = broadcast; > Proposed patch allows ipoib interface to pickup correct value > from pkey-table first entry and update broadcast mgid before > it start joining multicast groups. > Please review. > > Signed-off-by: Alex Estrin > > diff --git > a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c > b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index 784c291..459e2b9 100644 > --- a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -719,7 +719,25 @@ int ipoib_ib_dev_open(struct net_device *dev) > static void ipoib_pkey_dev_check_presence(struct net_device *dev) > { > struct ipoib_dev_priv *priv = netdev_priv(dev); > + struct ib_port_attr port_attr; > u16 pkey_index = 0; > + > + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { > + > + clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > + if (ib_query_port(priv->ca, priv->port, &port_attr)) { > + ipoib_warn(priv, "Query port attrs failed\n"); > + return; > + } > + if (port_attr.state != IB_PORT_ACTIVE) { > + return; > + } > + if (ib_query_pkey(priv->ca, priv->port, 0, > &priv->pkey)) { > + ipoib_warn(priv, "Query P_Key table > entry 0 failed\n"); > + return; > + } > + set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > + } > > if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) > clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); > diff --git > a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast. 
> c b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > index 016a057..4d270e2 100644 > --- a/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > +++ b/ofa_kernel-1.4/drivers/infiniband/ulp/ipoib/ipoib_multicast.c > @@ -556,6 +556,13 @@ void ipoib_mcast_join_task(struct > work_struct *work) > } > > spin_lock_irq(&priv->lock); > + > + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { > + /* fix broadcast gid in case if pkey > was changed */ > + priv->pkey |= 0x8000; > + priv->dev->broadcast[8] = priv->pkey >> 8; > + priv->dev->broadcast[9] = priv->pkey & 0xff; > + } > memcpy(broadcast->mcmember.mgid.raw, > priv->dev->broadcast + 4, > sizeof (union ib_gid)); > priv->broadcast = broadcast; > -------------- next part -------------- A non-text attachment was scrubbed... Name: ipoib_pkey_bootup_race.patch Type: application/octet-stream Size: 1805 bytes Desc: ipoib_pkey_bootup_race.patch URL: From rdreier at cisco.com Fri Dec 19 15:50:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 19 Dec 2008 15:50:30 -0800 Subject: [ofa-general] [ipoib]patch for ipoib failure during startup with non-default pkey set. In-Reply-To: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246C0@MNEXMB1.qlogic.org> (Alex Estrin's message of "Fri, 19 Dec 2008 13:27:41 -0600") References: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E746246C0@MNEXMB1.qlogic.org> Message-ID: Can you provide some detail about what this patch is doing? What exactly is the bug, and how does this fix it? 
(Please always provide that with all patches -- it's too hard to reverse engineer every patch I see before I apply it) From vlad at lists.openfabrics.org Sat Dec 20 03:26:49 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 20 Dec 2008 03:26:49 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081220-0200 daily build status Message-ID: <20081220112649.E80F9E60C5C@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed 
on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From alex.estrin at qlogic.com Sat Dec 20 04:46:27 2008 From: alex.estrin at qlogic.com (Alex Estrin) Date: Sat, 20 Dec 2008 06:46:27 -0600 Subject: [ofa-general] [ipoib]patch for ipoib failure during startup with non-default pkey set. In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E746246C0@MNEXMB1.qlogic.org> Message-ID: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246F6@MNEXMB1.qlogic.org> Hello, Ipoib uses the first pkey in the pkey table for its ib0 interface. A race is possible during normal boot, when ipoib starts before the port is Active and the port still has the default pkey table with 0xffff as the only pkey. The SM can program the pkey table differently when it moves the port to Active, but at this point ipoib has already started using the default pkey. There is no race if ipoib starts after the port is Active: ipoib then finds the first pkey as the SM programmed it. The proposed patch delays ib0 initialization until the port is Active. Once the port is Active, the patch code picks up the correct pkey and fixes the broadcast mgid (generated earlier using the default pkey) before the interface starts joining multicast groups. Please note, the patch is not intended to touch sub-interfaces with locally programmed pkeys. Thanks, Alex. > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, December 19, 2008 6:51 PM > To: Alex Estrin > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] [ipoib]patch for ipoib failure > during startup with non-default pkey set.
> > Can you provide some detail about what this patch is doing? What > exactly is the bug, and how does this fix it? (Please always provide > that with all patches -- it's too hard to reverse engineer > every patch I > see before I apply it) > From sashak at voltaire.com Sat Dec 20 12:11:53 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:11:53 +0200 Subject: [ofa-general] Re: [PATCH] [2 of 10] [REVISED] mesh analysis - mesh_t data structure In-Reply-To: <006701c95af1$2b7cff00$8276fd00$@com> References: <006701c95af1$2b7cff00$8276fd00$@com> Message-ID: <20081220201153.GA25208@sashak.voltaire.com> Hi Bob, On 12:00 Wed 10 Dec , Robert Pearson wrote: > > This patch: > > - creates a data structure, mesh_t, that holds per mesh information > > - adds a pointer to this structure in lash_t > > - creates methods to create, cleanup and destroy the object. > > - adds calls in osm_ucast_lash.c to call these. osm_mesh_create() is called while lash->num_switches is still zero; as a result, no memory is actually allocated for the mesh buffers. I reworked this patch as below and rebased the rest. Sasha >From 304739705a93201ed6f0c95f279b57962b71c4f8 Mon Sep 17 00:00:00 2001 From: Robert Pearson Date: Wed, 10 Dec 2008 12:00:21 -0600 Subject: [PATCH] mesh analysis - mesh_t data structure Sasha, Here is a revised mesh patch #2 that incorporates changes based on your comments. The purpose of this patch is to create a per fabric data structure and methods. This patch: - creates a data structure, mesh_t, that holds per mesh information - adds a pointer to this structure in lash_t - creates methods to create, cleanup and destroy the object. - adds calls in osm_ucast_lash.c to call these.
Regards, Bob Pearson Signed-off-by: Bob Pearson Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_mesh.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 60 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 486e6b4..0786701 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -41,6 +41,7 @@ #endif /* HAVE_CONFIG_H */ #include +#include #include #include #include @@ -48,14 +49,73 @@ #include /* + * per fabric mesh info + */ +typedef struct _mesh { + int num_class; /* number of switch classes */ + int *class_type; /* index of first switch found for each class */ + int *class_count; /* population of each class */ + int dimension; /* mesh dimension */ + int *size; /* an array to hold size of mesh */ +} mesh_t; + +/* + * osm_mesh_delete - free per mesh resources + */ +static void mesh_delete(mesh_t *mesh) +{ + if (mesh) { + if (mesh->class_type) + free(mesh->class_type); + + if (mesh->class_count) + free(mesh->class_count); + + free(mesh); + } +} + +/* + * osm_mesh_create - allocate per mesh resources + */ +static mesh_t *mesh_create(lash_t *p_lash) +{ + osm_log_t *p_log = &p_lash->p_osm->log; + mesh_t *mesh; + + if(!(mesh = calloc(1, sizeof(mesh_t)))) + goto err; + + if (!(mesh->class_type = calloc(p_lash->num_switches, sizeof(int)))) + goto err; + + if (!(mesh->class_count = calloc(p_lash->num_switches, sizeof(int)))) + goto err; + + return mesh; + +err: + mesh_delete(mesh); + OSM_LOG(p_log, OSM_LOG_ERROR, "Failed allocating mesh - out of memory\n"); + return NULL; +} + +/* * osm_do_mesh_analysis */ int osm_do_mesh_analysis(lash_t *p_lash) { osm_log_t *p_log = &p_lash->p_osm->log; + mesh_t *mesh; OSM_LOG_ENTER(p_log); + mesh = mesh_create(p_lash); + if (!mesh) + return -1; + + mesh_delete(mesh); + OSM_LOG_EXIT(p_log); return 0; } -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 20 12:13:28 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 
22:13:28 +0200 Subject: [ofa-general] [PATCH] opensm/man/opensm.8.in: add description for --do_mesh_analysis option In-Reply-To: <005d01c95aef$aa4fcd50$feef67f0$@com> References: <005d01c95aef$aa4fcd50$feef67f0$@com> Message-ID: <20081220201328.GB25208@sashak.voltaire.com> Add description for --do_mesh_analysis option. Signed-off-by: Sasha Khapyorsky --- opensm/man/opensm.8.in | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 51d782f..eedd317 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -139,6 +139,11 @@ separated by commas so that specific ordering of routing algorithms will be tried if earlier routing engines fail. Supported engines: minhop, updn, file, ftree, lash, dor .TP +\fB\-\-do_mesh_analysis\fR +This option enables additional analysis for the lash routing engine to +precondition switch port assignments in regular cartesian meshes which +may reduce the number of SLs required to give a deadlock free routing. +.TP \fB\-A\fR, \fB\-\-ucast_cache\fR This option enables unicast routing cache and prevents routing recalculation (which is a heavy task in a large cluster) when -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 20 12:14:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:14:40 +0200 Subject: [ofa-general] [PATCH] opensm: add do_mesh_analysis configuration parameter In-Reply-To: <005d01c95aef$aa4fcd50$feef67f0$@com> References: <005d01c95aef$aa4fcd50$feef67f0$@com> Message-ID: <20081220201440.GC25208@sashak.voltaire.com> Add do_mesh_analysis as a configuration parameter loadable from the OpenSM config file.
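For illustration only, and assuming the key is parsed and written out exactly as in the osm_subnet.c hunks of this patch, a config file entry that turns the option on might look like the fragment below. The comment line is the one osm_subn_output_conf() emits; the value TRUE is an example setting, since the patch sets the default to FALSE.

```
# Do mesh topology analysis (for LASH algorithm)
do_mesh_analysis TRUE
```

Per the man page text in the previous patch, the analysis is meant for the lash routing engine.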
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 2 +- opensm/opensm/osm_subnet.c | 9 +++++++++ 2 files changed, 10 insertions(+), 1 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 56b957f..8863e47 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -193,6 +193,7 @@ typedef struct osm_subn_opt { char *ids_guid_file; char *guid_routing_order_file; char *sa_db_file; + boolean_t do_mesh_analysis; boolean_t exit_on_fatal; boolean_t honor_guid2lid_file; boolean_t daemon; @@ -216,7 +217,6 @@ typedef struct osm_subn_opt { char *node_name_map_name; char *prefix_routes_file; boolean_t consolidate_ipv6_snm_req; - boolean_t do_mesh_analysis; } osm_subn_opt_t; /* * FIELDS diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 909c29e..122d4dd 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -413,6 +413,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt) p_opt->ids_guid_file = NULL; p_opt->guid_routing_order_file = NULL; p_opt->sa_db_file = NULL; + p_opt->do_mesh_analysis = FALSE; p_opt->exit_on_fatal = TRUE; p_opt->enable_quirks = FALSE; p_opt->no_clients_rereg = FALSE; @@ -1168,6 +1169,9 @@ int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) opts_unpack_charp("sa_db_file", p_key, p_val, &p_opts->sa_db_file); + opts_unpack_boolean("do_mesh_analysis", + p_key, p_val, &p_opts->do_mesh_analysis); + opts_unpack_boolean("exit_on_fatal", p_key, p_val, &p_opts->exit_on_fatal); @@ -1472,6 +1476,11 @@ int osm_subn_output_conf(FILE *out, IN osm_subn_opt_t *const p_opts) p_opts->guid_routing_order_file ? p_opts->guid_routing_order_file : null_str); fprintf(out, + "# Do mesh topology analysis (for LASH algorithm)\n" + "do_mesh_analysis %s\n\n", + p_opts->do_mesh_analysis ? "TRUE" : "FALSE"); + + fprintf(out, "# SA database file name\nsa_db_file %s\n\n", p_opts->sa_db_file ? 
p_opts->sa_db_file : null_str); -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 20 12:18:28 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:18:28 +0200 Subject: [ofa-general] Re: [PATCH] [8 of 10] [REVISED] mesh analysis - reorder links In-Reply-To: <00b901c95af7$5cea7120$16bf5360$@com> References: <00b901c95af7$5cea7120$16bf5360$@com> Message-ID: <20081220201828.GD25208@sashak.voltaire.com> On 12:44 Wed 10 Dec , Robert Pearson wrote: > > This patch implements > > - routine to reorder links and measure the size of the mesh There are memory leaks - it allocates but doesn't free mesh->size and node->coord. So I'm also adding the patch below. Sasha >From ec9daf997f22a4d63121a11a8dcde0ca1171b877 Mon Sep 17 00:00:00 2001 From: Sasha Khapyorsky Date: Sat, 20 Dec 2008 21:26:58 +0200 Subject: [PATCH] opensm/mesh: fix memory leaks Fix memory leaks. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_mesh.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 4deb004..a0b6e1d 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -1172,6 +1172,9 @@ static void mesh_delete(mesh_t *mesh) if (mesh->class_count) free(mesh->class_count); + if (mesh->size) + free(mesh->size); + free(mesh); } } @@ -1239,6 +1242,9 @@ void osm_mesh_node_delete(lash_t *p_lash, switch_t *sw) if (node->axes) free(node->axes); + if (node->coord) + free(node->coord); + free(node); sw->node = NULL; -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 20 12:19:46 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:19:46 +0200 Subject: [ofa-general] [PATCH] opensm/lash: fix memory leaks Message-ID: <20081220201946.GE25208@sashak.voltaire.com> Fix memory leaks in LASH.
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_ucast_lash.c | 57 +++++++++++++++------------------------ 1 files changed, 22 insertions(+), 35 deletions(-) diff --git a/opensm/opensm/osm_ucast_lash.c b/opensm/opensm/osm_ucast_lash.c index df928c5..5381412 100644 --- a/opensm/opensm/osm_ucast_lash.c +++ b/opensm/opensm/osm_ucast_lash.c @@ -57,11 +57,24 @@ static cdg_vertex_t *create_cdg_vertex(unsigned num_switches) { - cdg_vertex_t *cdg_vertex = (cdg_vertex_t *) malloc(sizeof(cdg_vertex_t)); + cdg_vertex_t *v = (cdg_vertex_t *) malloc(sizeof(cdg_vertex_t)); - cdg_vertex->dependency = malloc((num_switches - 1) * sizeof(cdg_vertex_t *)); - cdg_vertex->num_using_this_depend = (int *)malloc((num_switches - 1) * sizeof(int)); - return cdg_vertex; + memset(v, 0, sizeof(*v)); + v->dependency = malloc((num_switches - 1) * sizeof(cdg_vertex_t *)); + v->num_using_this_depend = malloc((num_switches - 1) * sizeof(int)); + memset(v->dependency, 0, (num_switches - 1) * sizeof(cdg_vertex_t *)); + memset(v->num_using_this_depend, 0, (num_switches - 1) * sizeof(int)); + + return v; +} + +static void delete_cdg_vertex(cdg_vertex_t *v) +{ + if (v->dependency) + free(v->dependency); + if (v->num_using_this_depend) + free(v->num_using_this_depend); + free(v); } static void connect_switches(lash_t * p_lash, int sw1, int sw2, int phy_port_1) @@ -209,7 +222,7 @@ static void remove_semipermanent_depend_for_sp(lash_t * p_lash, int sw, cdg_vertex_matrix[lane][sw][i_next_switch] = NULL; - free(v); + delete_cdg_vertex(v); } else { v->num_using_vertex--; if (i_next_switch != dest_switch) { @@ -353,24 +366,10 @@ static void generate_cdg_for_sp(lash_t * p_lash, int sw, int dest_switch, while (sw != dest_switch) { if (cdg_vertex_matrix[lane][sw][next_switch] == NULL) { - unsigned i; v = create_cdg_vertex(num_switches); - - for (i = 0; i < num_switches - 1; i++) { - v->dependency[i] = NULL; - v->num_using_this_depend[i] = 0; - } - - v->num_using_vertex = 0; - v->num_dependencies = 0; v->from 
= sw; v->to = next_switch; - v->seen = 0; - v->visiting_number = 0; - v->next = NULL; v->temp = 1; - v->num_temp_depend = 0; - cdg_vertex_matrix[lane][sw][next_switch] = v; } else v = cdg_vertex_matrix[lane][sw][next_switch]; @@ -457,7 +456,7 @@ static void remove_temp_depend_for_sp(lash_t * p_lash, int sw, int dest_switch, if (v->temp == 1) { cdg_vertex_matrix[lane][sw][next_switch] = NULL; - free(v); + delete_cdg_vertex(v); } else { CL_ASSERT(v->num_temp_depend <= v->num_dependencies); v->num_dependencies = @@ -701,21 +700,9 @@ static void free_lash_structures(lash_t * p_lash) // free cdg_vertex_matrix for (i = 0; i < p_lash->vl_min; i++) { for (j = 0; j < num_switches; j++) { - for (k = 0; k < num_switches; k++) { - if (p_lash->cdg_vertex_matrix[i][j][k]) { - - if (p_lash->cdg_vertex_matrix[i][j][k]->dependency) - free(p_lash->cdg_vertex_matrix[i][j][k]-> - dependency); - - if (p_lash->cdg_vertex_matrix[i][j][k]-> - num_using_this_depend) - free(p_lash->cdg_vertex_matrix[i][j][k]-> - num_using_this_depend); - - free(p_lash->cdg_vertex_matrix[i][j][k]); - } - } + for (k = 0; k < num_switches; k++) + if (p_lash->cdg_vertex_matrix[i][j][k]) + delete_cdg_vertex(p_lash->cdg_vertex_matrix[i][j][k]); if (p_lash->cdg_vertex_matrix[i][j]) free(p_lash->cdg_vertex_matrix[i][j]); } -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sat Dec 20 12:24:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:24:04 +0200 Subject: [ofa-general] Re: FW: [PATCH] [10 of 10] [REVISED] mesh analysis - integrate into lash core In-Reply-To: <022e01c95eff$e52fdb40$af8f91c0$@com> References: <022e01c95eff$e52fdb40$af8f91c0$@com> Message-ID: <20081220202344.GF25208@sashak.voltaire.com> On 15:55 Mon 15 Dec , Robert Pearson wrote: > > The p3 fix impacted this one as well. The rest are unchanged. I applied all ten mesh analysis patches with noted changes. Thanks. 
Sasha From sashak at voltaire.com Sat Dec 20 12:36:35 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sat, 20 Dec 2008 22:36:35 +0200 Subject: [ofa-general] Re: [PATCH] OpenSM: update osmeventplugin example for the new TRAP event. In-Reply-To: <20081218164813.55696c45.weiny2@llnl.gov> References: <20081218164813.55696c45.weiny2@llnl.gov> Message-ID: <20081220203624.GG25208@sashak.voltaire.com> On 16:48 Thu 18 Dec , Ira Weiny wrote: > It turns out that I already was using the "OSM_EVENT_ID_TRAP" in the example > plugin. > > This makes the use work, > Ira > > > From 7b744c38fc2aad67586ade81d65326a139a85681 Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Thu, 18 Dec 2008 16:16:37 -0800 > Subject: [PATCH] OpenSM: update osmeventplugin example for the new TRAP event. > > > Signed-off-by: Ira Weiny Applied. Thanks. Sasha From rdreier at cisco.com Sat Dec 20 14:31:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 20 Dec 2008 14:31:18 -0800 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: <4907348E.7060508@mellanox.co.il> (Yevgeny Petrilin's message of "Tue, 28 Oct 2008 17:49:34 +0200") References: <4907348E.7060508@mellanox.co.il> Message-ID: Thanks, applied with some stylistic changes as below. Let me know if I broke things in the process. commit 197bf2a025543f4c43c100ad10f1231ca52e6975 Author: Yevgeny Petrilin Date: Sat Dec 20 13:55:34 2008 -0800 mlx4_core: Add support for multiple completion event vectors When using MSI-X mode, create a completion event queue for each CPU. Report the number of completion EQs in a new struct mlx4_caps member, num_comp_vectors, and extend the mlx4_cq_alloc() interface with a vector parameter so that consumers can specify which completion EQ should be used to report events for the CQ being created. 
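To make the vector parameter described above concrete, here is a minimal user-space sketch of how a consumer might pick a completion vector per ring. It mirrors the `ring % num_comp_vectors` choice for RX rings in the en_cq.c hunk and the range check added to mlx4_cq_alloc(). The names `pick_comp_vector` and `comp_vector_is_valid` are hypothetical and exist only for this illustration; they are not part of the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the mlx4 driver): choose the completion
 * vector for a ring. RX rings are spread round-robin over the available
 * completion EQs; TX rings all use vector 0, as in the en_cq.c hunk.
 */
static unsigned pick_comp_vector(int is_rx, unsigned ring,
                                 unsigned num_comp_vectors)
{
    /* Assumption: the device reports at least one completion vector. */
    assert(num_comp_vectors > 0);

    return is_rx ? ring % num_comp_vectors : 0;
}

/*
 * Hypothetical helper mirroring the check added to mlx4_cq_alloc():
 * a vector is usable only if it is below num_comp_vectors.
 */
static int comp_vector_is_valid(unsigned vector, unsigned num_comp_vectors)
{
    return vector < num_comp_vectors;
}
```

Under these assumptions, a device with four completion vectors would map RX rings 0 through 7 to vectors 0, 1, 2, 3, 0, 1, 2, 3, and a request for vector 4 would fail the validity check.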
Signed-off-by: Yevgeny Petrilin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/main.c | 2 +- drivers/net/mlx4/cq.c | 11 +++- drivers/net/mlx4/en_cq.c | 9 ++- drivers/net/mlx4/en_main.c | 4 +- drivers/net/mlx4/eq.c | 117 ++++++++++++++++++++++++++++--------- drivers/net/mlx4/main.c | 50 +++++++++++----- drivers/net/mlx4/mlx4.h | 14 ++--- drivers/net/mlx4/profile.c | 4 +- include/linux/mlx4/device.h | 4 +- 10 files changed, 154 insertions(+), 63 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 1830849..2198753 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -222,7 +222,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 2e80f8f..dcefe1f 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -578,7 +578,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) ibdev->num_ports++; ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; - ibdev->ib_dev.num_comp_vectors = 1; + ibdev->ib_dev.num_comp_vectors = dev->caps.num_comp_vectors; ibdev->ib_dev.dma_device = &dev->pdev->dev; ibdev->ib_dev.uverbs_abi_ver = MLX4_IB_UVERBS_ABI_VERSION; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index b7ad282..ac57b6a 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ 
-198,6 +198,11 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, u64 mtt_addr; int err; + if (vector >= dev->caps.num_comp_vectors) + return -EINVAL; + + cq->vector = vector; + cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap); if (cq->cqn == -1) return -ENOMEM; @@ -227,7 +232,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + cq_context->comp_eqn = priv->eq_table.eq[vector].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -276,7 +281,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->vector].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c index 1368a80..674f836 100644 --- a/drivers/net/mlx4/en_cq.c +++ b/drivers/net/mlx4/en_cq.c @@ -51,10 +51,13 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, int err; cq->size = entries; - if (mode == RX) + if (mode == RX) { cq->buf_size = cq->size * sizeof(struct mlx4_cqe); - else + cq->vector = ring % mdev->dev->caps.num_comp_vectors; + } else { cq->buf_size = sizeof(struct mlx4_cqe); + cq->vector = 0; + } cq->ring = ring; cq->is_tx = mode; @@ -86,7 +89,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) memset(cq->buf, 0, cq->buf_size); err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, &mdev->priv_uar, - cq->wqres.db.dma, &cq->mcq, cq->is_tx); + cq->wqres.db.dma, &cq->mcq, cq->vector, cq->is_tx); if (err) return err; diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c index 4b9794e..e44e018 
100644 --- a/drivers/net/mlx4/en_main.c +++ b/drivers/net/mlx4/en_main.c @@ -170,9 +170,9 @@ static void *mlx4_en_add(struct mlx4_dev *dev) mlx4_info(mdev, "Using %d tx rings for port:%d\n", mdev->profile.prof[i].tx_ring_num, i); if (!mdev->profile.prof[i].rx_ring_num) { - mdev->profile.prof[i].rx_ring_num = 1; + mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors;; mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", - 1, i); + mdev->profile.prof[i].rx_ring_num, i); } else mlx4_info(mdev, "Using %d rx rings for port:%d\n", mdev->profile.prof[i].rx_ring_num, i); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index de16933..5d867eb 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -266,7 +266,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -304,6 +304,17 @@ static int mlx4_HW2SW_EQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +static int mlx4_num_eq_uar(struct mlx4_dev *dev) +{ + /* + * Each UAR holds 4 EQ doorbells. To figure out how many UARs + * we need to map, take the difference of highest index and + * the lowest index we'll use and add 1. 
+ */ + return (dev->caps.num_comp_vectors + 1 + dev->caps.reserved_eqs) / 4 - + dev->caps.reserved_eqs / 4 + 1; +} + static void __iomem *mlx4_get_eq_uar(struct mlx4_dev *dev, struct mlx4_eq *eq) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -483,9 +494,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); + + kfree(eq_table->irq_names); } static int mlx4_map_clr_int(struct mlx4_dev *dev) @@ -551,57 +564,93 @@ void mlx4_unmap_eq_icm(struct mlx4_dev *dev) __free_page(priv->eq_table.icm_page); } +int mlx4_alloc_eq_table(struct mlx4_dev *dev) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + + priv->eq_table.eq = kcalloc(dev->caps.num_eqs - dev->caps.reserved_eqs, + sizeof *priv->eq_table.eq, GFP_KERNEL); + if (!priv->eq_table.eq) + return -ENOMEM; + + return 0; +} + +void mlx4_free_eq_table(struct mlx4_dev *dev) +{ + kfree(mlx4_priv(dev)->eq_table.eq); +} + int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; int i; + priv->eq_table.uar_map = kcalloc(sizeof *priv->eq_table.uar_map, + mlx4_num_eq_uar(dev), GFP_KERNEL); + if (!priv->eq_table.uar_map) { + err = -ENOMEM; + goto err_out_free; + } + err = mlx4_bitmap_init(&priv->eq_table.bitmap, dev->caps.num_eqs, dev->caps.num_eqs - 1, dev->caps.reserved_eqs, 0); if (err) - return err; + goto err_out_free; - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) priv->eq_table.uar_map[i] = NULL; err = mlx4_map_clr_int(dev); if (err) - goto err_out_free; + goto err_out_bitmap; priv->eq_table.clr_mask = swab32(1 << (priv->eq_table.inta_pin & 31)); priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 
4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.irq_names = kmalloc(16 * dev->caps.num_comp_vectors, GFP_KERNEL); + if (!priv->eq_table.irq_names) { + err = -ENOMEM; + goto err_out_bitmap; + } + + for (i = 0; i < dev->caps.num_comp_vectors; ++i) { + err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? i : 0, + &priv->eq_table.eq[i]); + if (err) + goto err_out_unmap; + } err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, - &priv->eq_table.eq[MLX4_EQ_ASYNC]); + (dev->flags & MLX4_FLAG_MSI_X) ? dev->caps.num_comp_vectors : 0, + &priv->eq_table.eq[dev->caps.num_comp_vectors]); if (err) goto err_out_comp; if (dev->flags & MLX4_FLAG_MSI_X) { - static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", - [MLX4_EQ_ASYNC] = DRV_NAME " (async)" - }; + static const char async_eq_name[] = "mlx4-async"; + const char *eq_name; + + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) { + if (i < dev->caps.num_comp_vectors) { + snprintf(priv->eq_table.irq_names + i * 16, 16, + "mlx4-comp-%d", i); + eq_name = priv->eq_table.irq_names + i * 16; + } else + eq_name = async_eq_name; - for (i = 0; i < MLX4_NUM_EQ; ++i) { err = request_irq(priv->eq_table.eq[i].irq, - mlx4_msi_x_interrupt, - 0, eq_name[i], priv->eq_table.eq + i); + mlx4_msi_x_interrupt, 0, eq_name, + priv->eq_table.eq + i); if (err) goto err_out_async; priv->eq_table.eq[i].have_irq = 1; } - } else { err = request_irq(dev->pdev->irq, mlx4_interrupt, IRQF_SHARED, DRV_NAME, dev); @@ -612,28 +661,36 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) } err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); if (err) mlx4_warn(dev, "MAP_EQ for async 
EQ %d failed (%d)\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; err_out_async: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); + mlx4_free_eq(dev, &priv->eq_table.eq[dev->caps.num_comp_vectors]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + i = dev->caps.num_comp_vectors - 1; err_out_unmap: + while (i >= 0) { + mlx4_free_eq(dev, &priv->eq_table.eq[i]); + --i; + } mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); -err_out_free: +err_out_bitmap: mlx4_bitmap_cleanup(&priv->eq_table.bitmap); + +err_out_free: + kfree(priv->eq_table.uar_map); + return err; } @@ -643,18 +700,20 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) int i; mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 1, - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) if (priv->eq_table.uar_map[i]) iounmap(priv->eq_table.uar_map[i]); mlx4_bitmap_cleanup(&priv->eq_table.bitmap); + + kfree(priv->eq_table.uar_map); } diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 90a0281..a69ed57 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -421,9 +421,7 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_EQ * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, - roundup_pow_of_two(MLX4_NUM_EQ + - dev->caps.reserved_eqs), - MLX4_NUM_EQ + dev->caps.reserved_eqs, 0, 0); + dev->caps.num_eqs, dev->caps.num_eqs, 0, 0); if (err) goto err_cq; @@ -810,12 +808,12 @@ static int mlx4_setup_hca(struct 
mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { mlx4_warn(dev, "NOP command failed to generate MSI-X " "interrupt IRQ %d).\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); mlx4_warn(dev, "Trying again without MSI-X.\n"); } else { mlx4_err(dev, "NOP command failed to generate interrupt " "(IRQ %d), aborting.\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); mlx4_err(dev, "BIOS or ACPI interrupt routing problem?\n"); } @@ -908,31 +906,47 @@ err_uar_table_free: static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); - struct msix_entry entries[MLX4_NUM_EQ]; + struct msix_entry *entries; + int nreq; int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + nreq = min(dev->caps.num_eqs - dev->caps.reserved_eqs, + num_possible_cpus() + 1); + entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL); + if (!entries) + goto no_msi; + + for (i = 0; i < nreq; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + retry: + err = pci_enable_msix(dev->pdev, entries, nreq); if (err) { - if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + if (err > 0) { + mlx4_info(dev, "Requested %d vectors, " + "but only %d MSI-X vectors available, " + "trying again\n", nreq, err); + goto retry; + } goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + dev->caps.num_comp_vectors = nreq - 1; + for (i = 0; i < nreq; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; + + kfree(entries); return; } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + dev->caps.num_comp_vectors = 1; + + for (i = 0; i < 2; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } @@ -1074,6 +1088,10 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) if (err) goto err_cmd; + err = mlx4_alloc_eq_table(dev); + if (err) + goto err_close; + 
mlx4_enable_msi_x(dev); err = mlx4_setup_hca(dev); @@ -1084,7 +1102,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) } if (err) - goto err_close; + goto err_free_eq; for (port = 1; port <= dev->caps.num_ports; port++) { err = mlx4_init_port_info(dev, port); @@ -1114,6 +1132,9 @@ err_port: mlx4_cleanup_pd_table(dev); mlx4_cleanup_uar_table(dev); +err_free_eq: + mlx4_free_eq_table(dev); + err_close: if (dev->flags & MLX4_FLAG_MSI_X) pci_disable_msix(pdev); @@ -1177,6 +1198,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) iounmap(priv->kar); mlx4_uar_free(dev, &priv->driver_uar); mlx4_cleanup_uar_table(dev); + mlx4_free_eq_table(dev); mlx4_close_hca(dev); mlx4_cmd_cleanup(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 34c909d..e0213ba 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -63,12 +63,6 @@ enum { }; enum { - MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ -}; - -enum { MLX4_NUM_PDS = 1 << 15 }; @@ -205,10 +199,11 @@ struct mlx4_cq_table { struct mlx4_eq_table { struct mlx4_bitmap bitmap; + char *irq_names; void __iomem *clr_int; - void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; + void __iomem **uar_map; u32 clr_mask; - struct mlx4_eq eq[MLX4_NUM_EQ]; + struct mlx4_eq *eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; @@ -328,6 +323,9 @@ void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); +int mlx4_alloc_eq_table(struct mlx4_dev *dev); +void mlx4_free_eq_table(struct mlx4_dev *dev); + int mlx4_init_pd_table(struct mlx4_dev *dev); int mlx4_init_uar_table(struct mlx4_dev *dev); int mlx4_init_mr_table(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/profile.c b/drivers/net/mlx4/profile.c index 9ca42b2..919fb9e 100644 --- a/drivers/net/mlx4/profile.c +++ b/drivers/net/mlx4/profile.c @@ -107,7 +107,9 @@ u64 mlx4_make_profile(struct mlx4_dev *dev, profile[MLX4_RES_AUXC].num = request->num_qp; profile[MLX4_RES_SRQ].num = 
request->num_srq;
 	profile[MLX4_RES_CQ].num = request->num_cq;
-	profile[MLX4_RES_EQ].num = MLX4_NUM_EQ + dev_cap->reserved_eqs;
+	profile[MLX4_RES_EQ].num = min(dev_cap->max_eqs,
+				       dev_cap->reserved_eqs +
+				       num_possible_cpus() + 1);
 	profile[MLX4_RES_DMPT].num = request->num_mpt;
 	profile[MLX4_RES_CMPT].num = MLX4_NUM_CMPTS;
 	profile[MLX4_RES_MTT].num = request->num_mtt;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 371086f..8f659cc 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -206,6 +206,7 @@ struct mlx4_caps {
 	int reserved_cqs;
 	int num_eqs;
 	int reserved_eqs;
+	int num_comp_vectors;
 	int num_mpts;
 	int num_mtt_segs;
 	int fmr_reserved_mtts;
@@ -328,6 +329,7 @@ struct mlx4_cq {
 	int arm_sn;
 
 	int cqn;
+	unsigned vector;
 
 	atomic_t refcount;
 	struct completion free;
@@ -437,7 +439,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres,
 
 int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
-		  int collapsed);
+		  unsigned vector, int collapsed);
 void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq);
 
 int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base);

From dorfman.eli at gmail.com Sun Dec 21 02:58:24 2008
From: dorfman.eli at gmail.com (Eli Dorfman)
Date: Sun, 21 Dec 2008 12:58:24 +0200
Subject: [ofa-general] [PATCH] opensm/osm_subnet.c Fix memory leak for QOS string parameters
Message-ID: <494E2150.1030000@gmail.com>

Fix memory leak for QOS string parameters.

Signed-off-by: Slava Strebkov
---
 opensm/opensm/osm_subnet.c | 21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 122d4dd..f8b29f8 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -331,6 +331,21 @@ static void subn_init_qos_options(IN osm_qos_options_t * opt)
 	opt->sl2vl = NULL;
 }
 
+static void subn_free_qos_options(IN osm_qos_options_t * opt)
+{
+	if ((opt->vlarb_high) && (opt->vlarb_high != OSM_DEFAULT_QOS_VLARB_HIGH)) {
+		free(opt->vlarb_high);
+	}
+
+	if ((opt->vlarb_low) && (opt->vlarb_low != OSM_DEFAULT_QOS_VLARB_LOW)) {
+		free(opt->vlarb_low);
+	}
+
+	if ((opt->sl2vl) && (opt->sl2vl != OSM_DEFAULT_QOS_SL2VL)) {
+		free(opt->sl2vl);
+	}
+}
+
 /**********************************************************************
  **********************************************************************/
 void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt)
@@ -1263,6 +1278,12 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn)
 		return -1;
 	}
 
+	subn_free_qos_options(&p_subn->opt.qos_options);
+	subn_free_qos_options(&p_subn->opt.qos_ca_options);
+	subn_free_qos_options(&p_subn->opt.qos_sw0_options);
+	subn_free_qos_options(&p_subn->opt.qos_swe_options);
+	subn_free_qos_options(&p_subn->opt.qos_rtr_options);
+
 	subn_init_qos_options(&p_subn->opt.qos_options);
 	subn_init_qos_options(&p_subn->opt.qos_ca_options);
 	subn_init_qos_options(&p_subn->opt.qos_sw0_options);
-- 
1.5.6

From vlad at lists.openfabrics.org Sun Dec 21 03:23:03 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Sun, 21 Dec 2008 03:23:03 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081221-0200 daily build status
Message-ID: <20081221112304.18F87E60C44@openfabrics.org>

This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed
on i686 with linux-2.6.16
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From dorfman.eli at gmail.com Sun Dec 21 03:54:02 2008
From: dorfman.eli at gmail.com (Eli Dorfman (Voltaire))
Date: Sun, 21 Dec 2008 13:54:02 +0200
Subject: [ofa-general] ***SPAM*** [PATCH] opensm/osm_subnet.c Fix memory leak
for QOS string parameters.
Message-ID: <494E2E5A.8050008@gmail.com>

Fix memory leak for QOS string parameters.

Signed-off-by: Slava Strebkov
---
 opensm/opensm/osm_subnet.c | 21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c
index 122d4dd..f8b29f8 100644
--- a/opensm/opensm/osm_subnet.c
+++ b/opensm/opensm/osm_subnet.c
@@ -331,6 +331,21 @@ static void subn_init_qos_options(IN osm_qos_options_t * opt)
 	opt->sl2vl = NULL;
 }
 
+static void subn_free_qos_options(IN osm_qos_options_t * opt)
+{
+	if ((opt->vlarb_high) && (opt->vlarb_high != OSM_DEFAULT_QOS_VLARB_HIGH)) {
+		free(opt->vlarb_high);
+	}
+
+	if ((opt->vlarb_low) && (opt->vlarb_low != OSM_DEFAULT_QOS_VLARB_LOW)) {
+		free(opt->vlarb_low);
+	}
+
+	if ((opt->sl2vl) && (opt->sl2vl != OSM_DEFAULT_QOS_SL2VL)) {
+		free(opt->sl2vl);
+	}
+}
+
 /**********************************************************************
  **********************************************************************/
 void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt)
@@ -1263,6 +1278,12 @@ int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn)
 		return -1;
 	}
 
+	subn_free_qos_options(&p_subn->opt.qos_options);
+	subn_free_qos_options(&p_subn->opt.qos_ca_options);
+	subn_free_qos_options(&p_subn->opt.qos_sw0_options);
+	subn_free_qos_options(&p_subn->opt.qos_swe_options);
+	subn_free_qos_options(&p_subn->opt.qos_rtr_options);
+
 	subn_init_qos_options(&p_subn->opt.qos_options);
 	subn_init_qos_options(&p_subn->opt.qos_ca_options);
 	subn_init_qos_options(&p_subn->opt.qos_sw0_options);
-- 
1.5.6

From ronli.voltaire at gmail.com Sun Dec 21 04:14:30 2008
From: ronli.voltaire at gmail.com (Ron Livne)
Date: Sun, 21 Dec 2008 14:14:30 +0200
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To:
References: <4907348E.7060508@mellanox.co.il>
Message-ID: <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com>

Roland,

> + retry:
>
> +		err = pci_enable_msix(dev->pdev, entries, nreq);
>  		if (err) {
> -			if (err > 0)
> -				mlx4_info(dev, "Only %d MSI-X vectors available, "
> -					  "not using MSI-X\n", err);
> +			if (err > 0) {
> +				mlx4_info(dev, "Requested %d vectors, "
> +					  "but only %d MSI-X vectors available, "
> +					  "trying again\n", nreq, err);
> +				goto retry;
> +			}
>  			goto no_msi;
>  		}

Wouldn't jumping to retry with the same nreq value, instead of the err
value, produce an infinite loop?

Ron

On Sun, Dec 21, 2008 at 12:31 AM, Roland Dreier wrote:
> Thanks, applied with some stylistic changes as below. Let me know if I
> broke things in the process.
>
> commit 197bf2a025543f4c43c100ad10f1231ca52e6975
> Author: Yevgeny Petrilin
> Date: Sat Dec 20 13:55:34 2008 -0800
>
>     mlx4_core: Add support for multiple completion event vectors
>
>     When using MSI-X mode, create a completion event queue for each CPU.
>     Report the number of completion EQs in a new struct mlx4_caps member,
>     num_comp_vectors, and extend the mlx4_cq_alloc() interface with a
>     vector parameter so that consumers can specify which completion EQ
>     should be used to report events for the CQ being created.
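A note on the retry question raised above: it can be checked with a small user-space sketch. This is only an illustration, not driver code. It assumes the return convention the patch relies on for pci_enable_msix() (0 on success, a positive count of available vectors when the request is too large, a negative value on failure). The names fake_enable_msix() and negotiate_vectors() are hypothetical, and the "at least two vectors" rule is an assumption drawn from num_comp_vectors = nreq - 1 in the patch.

```c
/*
 * Hypothetical stand-in for pci_enable_msix() as used in the patch:
 * returns 0 on success, the number of available vectors (> 0) when
 * nreq is too large, or a negative value on failure.
 */
static int fake_enable_msix(int nreq, int available)
{
	if (available <= 0)
		return -1;
	if (nreq > available)
		return available;
	return 0;
}

/*
 * Returns the number of vectors granted, or 0 when the caller should
 * fall back to a non-MSI-X interrupt (the no_msi path in the patch).
 */
static int negotiate_vectors(int nreq, int available)
{
	int err;

retry:
	err = fake_enable_msix(nreq, available);
	if (err) {
		/*
		 * Assumption: at least two vectors are needed, one for
		 * completions and one for async events.
		 */
		if (err > 1) {
			/* Shrink the request before retrying. */
			nreq = err;
			goto retry;
		}
		return 0;
	}
	return nreq;
}
```

Because nreq is replaced by the smaller reported count before each retry, the request shrinks on every pass and the loop ends. Retrying with an unchanged nreq would get the same positive answer each time.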
> > Signed-off-by: Yevgeny Petrilin > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/mlx4/cq.c | 2 +- > drivers/infiniband/hw/mlx4/main.c | 2 +- > drivers/net/mlx4/cq.c | 11 +++- > drivers/net/mlx4/en_cq.c | 9 ++- > drivers/net/mlx4/en_main.c | 4 +- > drivers/net/mlx4/eq.c | 117 ++++++++++++++++++++++++++++--------- > drivers/net/mlx4/main.c | 50 +++++++++++----- > drivers/net/mlx4/mlx4.h | 14 ++--- > drivers/net/mlx4/profile.c | 4 +- > include/linux/mlx4/device.h | 4 +- > 10 files changed, 154 insertions(+), 63 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c > index 1830849..2198753 100644 > --- a/drivers/infiniband/hw/mlx4/cq.c > +++ b/drivers/infiniband/hw/mlx4/cq.c > @@ -222,7 +222,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector > } > > err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, > - cq->db.dma, &cq->mcq, 0); > + cq->db.dma, &cq->mcq, vector, 0); > if (err) > goto err_dbmap; > > diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c > index 2e80f8f..dcefe1f 100644 > --- a/drivers/infiniband/hw/mlx4/main.c > +++ b/drivers/infiniband/hw/mlx4/main.c > @@ -578,7 +578,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) > mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) > ibdev->num_ports++; > ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; > - ibdev->ib_dev.num_comp_vectors = 1; > + ibdev->ib_dev.num_comp_vectors = dev->caps.num_comp_vectors; > ibdev->ib_dev.dma_device = &dev->pdev->dev; > > ibdev->ib_dev.uverbs_abi_ver = MLX4_IB_UVERBS_ABI_VERSION; > diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c > index b7ad282..ac57b6a 100644 > --- a/drivers/net/mlx4/cq.c > +++ b/drivers/net/mlx4/cq.c > @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); > > int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, > struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, > - int collapsed) > + unsigned vector, int 
collapsed) > { > struct mlx4_priv *priv = mlx4_priv(dev); > struct mlx4_cq_table *cq_table = &priv->cq_table; > @@ -198,6 +198,11 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, > u64 mtt_addr; > int err; > > + if (vector >= dev->caps.num_comp_vectors) > + return -EINVAL; > + > + cq->vector = vector; > + > cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap); > if (cq->cqn == -1) > return -ENOMEM; > @@ -227,7 +232,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, > > cq_context->flags = cpu_to_be32(!!collapsed << 18); > cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); > - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; > + cq_context->comp_eqn = priv->eq_table.eq[vector].eqn; > cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; > > mtt_addr = mlx4_mtt_addr(dev, mtt); > @@ -276,7 +281,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) > if (err) > mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); > > - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); > + synchronize_irq(priv->eq_table.eq[cq->vector].irq); > > spin_lock_irq(&cq_table->lock); > radix_tree_delete(&cq_table->tree, cq->cqn); > diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c > index 1368a80..674f836 100644 > --- a/drivers/net/mlx4/en_cq.c > +++ b/drivers/net/mlx4/en_cq.c > @@ -51,10 +51,13 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, > int err; > > cq->size = entries; > - if (mode == RX) > + if (mode == RX) { > cq->buf_size = cq->size * sizeof(struct mlx4_cqe); > - else > + cq->vector = ring % mdev->dev->caps.num_comp_vectors; > + } else { > cq->buf_size = sizeof(struct mlx4_cqe); > + cq->vector = 0; > + } > > cq->ring = ring; > cq->is_tx = mode; > @@ -86,7 +89,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) > memset(cq->buf, 0, cq->buf_size); > > err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, 
&mdev->priv_uar, > - cq->wqres.db.dma, &cq->mcq, cq->is_tx); > + cq->wqres.db.dma, &cq->mcq, cq->vector, cq->is_tx); > if (err) > return err; > > diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c > index 4b9794e..e44e018 100644 > --- a/drivers/net/mlx4/en_main.c > +++ b/drivers/net/mlx4/en_main.c > @@ -170,9 +170,9 @@ static void *mlx4_en_add(struct mlx4_dev *dev) > mlx4_info(mdev, "Using %d tx rings for port:%d\n", > mdev->profile.prof[i].tx_ring_num, i); > if (!mdev->profile.prof[i].rx_ring_num) { > - mdev->profile.prof[i].rx_ring_num = 1; > + mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors;; > mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", > - 1, i); > + mdev->profile.prof[i].rx_ring_num, i); > } else > mlx4_info(mdev, "Using %d rx rings for port:%d\n", > mdev->profile.prof[i].rx_ring_num, i); > diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c > index de16933..5d867eb 100644 > --- a/drivers/net/mlx4/eq.c > +++ b/drivers/net/mlx4/eq.c > @@ -266,7 +266,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) > > writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); > > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) > work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); > > return IRQ_RETVAL(work); > @@ -304,6 +304,17 @@ static int mlx4_HW2SW_EQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, > MLX4_CMD_TIME_CLASS_A); > } > > +static int mlx4_num_eq_uar(struct mlx4_dev *dev) > +{ > + /* > + * Each UAR holds 4 EQ doorbells. To figure out how many UARs > + * we need to map, take the difference of highest index and > + * the lowest index we'll use and add 1. 
> + */ > + return (dev->caps.num_comp_vectors + 1 + dev->caps.reserved_eqs) / 4 - > + dev->caps.reserved_eqs / 4 + 1; > +} > + > static void __iomem *mlx4_get_eq_uar(struct mlx4_dev *dev, struct mlx4_eq *eq) > { > struct mlx4_priv *priv = mlx4_priv(dev); > @@ -483,9 +494,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) > > if (eq_table->have_irq) > free_irq(dev->pdev->irq, dev); > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) > if (eq_table->eq[i].have_irq) > free_irq(eq_table->eq[i].irq, eq_table->eq + i); > + > + kfree(eq_table->irq_names); > } > > static int mlx4_map_clr_int(struct mlx4_dev *dev) > @@ -551,57 +564,93 @@ void mlx4_unmap_eq_icm(struct mlx4_dev *dev) > __free_page(priv->eq_table.icm_page); > } > > +int mlx4_alloc_eq_table(struct mlx4_dev *dev) > +{ > + struct mlx4_priv *priv = mlx4_priv(dev); > + > + priv->eq_table.eq = kcalloc(dev->caps.num_eqs - dev->caps.reserved_eqs, > + sizeof *priv->eq_table.eq, GFP_KERNEL); > + if (!priv->eq_table.eq) > + return -ENOMEM; > + > + return 0; > +} > + > +void mlx4_free_eq_table(struct mlx4_dev *dev) > +{ > + kfree(mlx4_priv(dev)->eq_table.eq); > +} > + > int mlx4_init_eq_table(struct mlx4_dev *dev) > { > struct mlx4_priv *priv = mlx4_priv(dev); > int err; > int i; > > + priv->eq_table.uar_map = kcalloc(sizeof *priv->eq_table.uar_map, > + mlx4_num_eq_uar(dev), GFP_KERNEL); > + if (!priv->eq_table.uar_map) { > + err = -ENOMEM; > + goto err_out_free; > + } > + > err = mlx4_bitmap_init(&priv->eq_table.bitmap, dev->caps.num_eqs, > dev->caps.num_eqs - 1, dev->caps.reserved_eqs, 0); > if (err) > - return err; > + goto err_out_free; > > - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) > + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) > priv->eq_table.uar_map[i] = NULL; > > err = mlx4_map_clr_int(dev); > if (err) > - goto err_out_free; > + goto err_out_bitmap; > > priv->eq_table.clr_mask = > swab32(1 << (priv->eq_table.inta_pin & 31)); > priv->eq_table.clr_int = 
priv->clr_base + > (priv->eq_table.inta_pin < 32 ? 4 : 0); > > - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, > - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, > - &priv->eq_table.eq[MLX4_EQ_COMP]); > - if (err) > - goto err_out_unmap; > + priv->eq_table.irq_names = kmalloc(16 * dev->caps.num_comp_vectors, GFP_KERNEL); > + if (!priv->eq_table.irq_names) { > + err = -ENOMEM; > + goto err_out_bitmap; > + } > + > + for (i = 0; i < dev->caps.num_comp_vectors; ++i) { > + err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, > + (dev->flags & MLX4_FLAG_MSI_X) ? i : 0, > + &priv->eq_table.eq[i]); > + if (err) > + goto err_out_unmap; > + } > > err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, > - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, > - &priv->eq_table.eq[MLX4_EQ_ASYNC]); > + (dev->flags & MLX4_FLAG_MSI_X) ? dev->caps.num_comp_vectors : 0, > + &priv->eq_table.eq[dev->caps.num_comp_vectors]); > if (err) > goto err_out_comp; > > if (dev->flags & MLX4_FLAG_MSI_X) { > - static const char *eq_name[] = { > - [MLX4_EQ_COMP] = DRV_NAME " (comp)", > - [MLX4_EQ_ASYNC] = DRV_NAME " (async)" > - }; > + static const char async_eq_name[] = "mlx4-async"; > + const char *eq_name; > + > + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) { > + if (i < dev->caps.num_comp_vectors) { > + snprintf(priv->eq_table.irq_names + i * 16, 16, > + "mlx4-comp-%d", i); > + eq_name = priv->eq_table.irq_names + i * 16; > + } else > + eq_name = async_eq_name; > > - for (i = 0; i < MLX4_NUM_EQ; ++i) { > err = request_irq(priv->eq_table.eq[i].irq, > - mlx4_msi_x_interrupt, > - 0, eq_name[i], priv->eq_table.eq + i); > + mlx4_msi_x_interrupt, 0, eq_name, > + priv->eq_table.eq + i); > if (err) > goto err_out_async; > > priv->eq_table.eq[i].have_irq = 1; > } > - > } else { > err = request_irq(dev->pdev->irq, mlx4_interrupt, > IRQF_SHARED, DRV_NAME, dev); > @@ -612,28 +661,36 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) > } > > err = 
mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, > - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); > + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); > if (err) > mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", > - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); > + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn, err); > > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) > eq_set_ci(&priv->eq_table.eq[i], 1); > > return 0; > > err_out_async: > - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); > + mlx4_free_eq(dev, &priv->eq_table.eq[dev->caps.num_comp_vectors]); > > err_out_comp: > - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); > + i = dev->caps.num_comp_vectors - 1; > > err_out_unmap: > + while (i >= 0) { > + mlx4_free_eq(dev, &priv->eq_table.eq[i]); > + --i; > + } > mlx4_unmap_clr_int(dev); > mlx4_free_irqs(dev); > > -err_out_free: > +err_out_bitmap: > mlx4_bitmap_cleanup(&priv->eq_table.bitmap); > + > +err_out_free: > + kfree(priv->eq_table.uar_map); > + > return err; > } > > @@ -643,18 +700,20 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) > int i; > > mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 1, > - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); > + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); > > mlx4_free_irqs(dev); > > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) > mlx4_free_eq(dev, &priv->eq_table.eq[i]); > > mlx4_unmap_clr_int(dev); > > - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) > + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) > if (priv->eq_table.uar_map[i]) > iounmap(priv->eq_table.uar_map[i]); > > mlx4_bitmap_cleanup(&priv->eq_table.bitmap); > + > + kfree(priv->eq_table.uar_map); > } > diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c > index 90a0281..a69ed57 100644 > --- a/drivers/net/mlx4/main.c > +++ b/drivers/net/mlx4/main.c > @@ -421,9 +421,7 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, > 
((u64) (MLX4_CMPT_TYPE_EQ * > cmpt_entry_sz) << MLX4_CMPT_SHIFT), > cmpt_entry_sz, > - roundup_pow_of_two(MLX4_NUM_EQ + > - dev->caps.reserved_eqs), > - MLX4_NUM_EQ + dev->caps.reserved_eqs, 0, 0); > + dev->caps.num_eqs, dev->caps.num_eqs, 0, 0); > if (err) > goto err_cq; > > @@ -810,12 +808,12 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) > if (dev->flags & MLX4_FLAG_MSI_X) { > mlx4_warn(dev, "NOP command failed to generate MSI-X " > "interrupt IRQ %d).\n", > - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); > + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); > mlx4_warn(dev, "Trying again without MSI-X.\n"); > } else { > mlx4_err(dev, "NOP command failed to generate interrupt " > "(IRQ %d), aborting.\n", > - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); > + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); > mlx4_err(dev, "BIOS or ACPI interrupt routing problem?\n"); > } > > @@ -908,31 +906,47 @@ err_uar_table_free: > static void mlx4_enable_msi_x(struct mlx4_dev *dev) > { > struct mlx4_priv *priv = mlx4_priv(dev); > - struct msix_entry entries[MLX4_NUM_EQ]; > + struct msix_entry *entries; > + int nreq; > int err; > int i; > > if (msi_x) { > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + nreq = min(dev->caps.num_eqs - dev->caps.reserved_eqs, > + num_possible_cpus() + 1); > + entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL); > + if (!entries) > + goto no_msi; > + > + for (i = 0; i < nreq; ++i) > entries[i].entry = i; > > - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); > + retry: > + err = pci_enable_msix(dev->pdev, entries, nreq); > if (err) { > - if (err > 0) > - mlx4_info(dev, "Only %d MSI-X vectors available, " > - "not using MSI-X\n", err); > + if (err > 0) { > + mlx4_info(dev, "Requested %d vectors, " > + "but only %d MSI-X vectors available, " > + "trying again\n", nreq, err); > + goto retry; > + } > goto no_msi; > } > > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + dev->caps.num_comp_vectors = nreq - 1; > + for (i = 0; i < nreq; ++i) > 
priv->eq_table.eq[i].irq = entries[i].vector; > > dev->flags |= MLX4_FLAG_MSI_X; > + > + kfree(entries); > return; > } > > no_msi: > - for (i = 0; i < MLX4_NUM_EQ; ++i) > + dev->caps.num_comp_vectors = 1; > + > + for (i = 0; i < 2; ++i) > priv->eq_table.eq[i].irq = dev->pdev->irq; > } > > @@ -1074,6 +1088,10 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) > if (err) > goto err_cmd; > > + err = mlx4_alloc_eq_table(dev); > + if (err) > + goto err_close; > + > mlx4_enable_msi_x(dev); > > err = mlx4_setup_hca(dev); > @@ -1084,7 +1102,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) > } > > if (err) > - goto err_close; > + goto err_free_eq; > > for (port = 1; port <= dev->caps.num_ports; port++) { > err = mlx4_init_port_info(dev, port); > @@ -1114,6 +1132,9 @@ err_port: > mlx4_cleanup_pd_table(dev); > mlx4_cleanup_uar_table(dev); > > +err_free_eq: > + mlx4_free_eq_table(dev); > + > err_close: > if (dev->flags & MLX4_FLAG_MSI_X) > pci_disable_msix(pdev); > @@ -1177,6 +1198,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) > iounmap(priv->kar); > mlx4_uar_free(dev, &priv->driver_uar); > mlx4_cleanup_uar_table(dev); > + mlx4_free_eq_table(dev); > mlx4_close_hca(dev); > mlx4_cmd_cleanup(dev); > > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h > index 34c909d..e0213ba 100644 > --- a/drivers/net/mlx4/mlx4.h > +++ b/drivers/net/mlx4/mlx4.h > @@ -63,12 +63,6 @@ enum { > }; > > enum { > - MLX4_EQ_ASYNC, > - MLX4_EQ_COMP, > - MLX4_NUM_EQ > -}; > - > -enum { > MLX4_NUM_PDS = 1 << 15 > }; > > @@ -205,10 +199,11 @@ struct mlx4_cq_table { > > struct mlx4_eq_table { > struct mlx4_bitmap bitmap; > + char *irq_names; > void __iomem *clr_int; > - void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; > + void __iomem **uar_map; > u32 clr_mask; > - struct mlx4_eq eq[MLX4_NUM_EQ]; > + struct mlx4_eq *eq; > u64 icm_virt; > struct page *icm_page; > dma_addr_t icm_dma; > @@ -328,6 +323,9 @@ void 
mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); > > int mlx4_reset(struct mlx4_dev *dev); > > +int mlx4_alloc_eq_table(struct mlx4_dev *dev); > +void mlx4_free_eq_table(struct mlx4_dev *dev); > + > int mlx4_init_pd_table(struct mlx4_dev *dev); > int mlx4_init_uar_table(struct mlx4_dev *dev); > int mlx4_init_mr_table(struct mlx4_dev *dev); > diff --git a/drivers/net/mlx4/profile.c b/drivers/net/mlx4/profile.c > index 9ca42b2..919fb9e 100644 > --- a/drivers/net/mlx4/profile.c > +++ b/drivers/net/mlx4/profile.c > @@ -107,7 +107,9 @@ u64 mlx4_make_profile(struct mlx4_dev *dev, > profile[MLX4_RES_AUXC].num = request->num_qp; > profile[MLX4_RES_SRQ].num = request->num_srq; > profile[MLX4_RES_CQ].num = request->num_cq; > - profile[MLX4_RES_EQ].num = MLX4_NUM_EQ + dev_cap->reserved_eqs; > + profile[MLX4_RES_EQ].num = min(dev_cap->max_eqs, > + dev_cap->reserved_eqs + > + num_possible_cpus() + 1); > profile[MLX4_RES_DMPT].num = request->num_mpt; > profile[MLX4_RES_CMPT].num = MLX4_NUM_CMPTS; > profile[MLX4_RES_MTT].num = request->num_mtt; > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h > index 371086f..8f659cc 100644 > --- a/include/linux/mlx4/device.h > +++ b/include/linux/mlx4/device.h > @@ -206,6 +206,7 @@ struct mlx4_caps { > int reserved_cqs; > int num_eqs; > int reserved_eqs; > + int num_comp_vectors; > int num_mpts; > int num_mtt_segs; > int fmr_reserved_mtts; > @@ -328,6 +329,7 @@ struct mlx4_cq { > int arm_sn; > > int cqn; > + unsigned vector; > > atomic_t refcount; > struct completion free; > @@ -437,7 +439,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, > > int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, > struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, > - int collapsed); > + unsigned vector, int collapsed); > void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); > > int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); > 
From rdreier at cisco.com Sun Dec 21 07:09:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 21 Dec 2008 07:09:53 -0800 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> (Ron Livne's message of "Sun, 21 Dec 2008 14:14:30 +0200") References: <4907348E.7060508@mellanox.co.il> <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> Message-ID: > Wouldn't going to retry with the same nreq num instead of the err > value might produce an infinite loop? yep, I never exercised that code. Will fix, thanks. - R. From sashak at voltaire.com Sun Dec 21 07:21:00 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 17:21:00 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081211162031.0c591f54.weiny2@llnl.gov> References: <20081211162031.0c591f54.weiny2@llnl.gov> Message-ID: <20081221152100.GN25208@sashak.voltaire.com> Hi Ira, Some initial comments... On 16:20 Thu 11 Dec , Ira Weiny wrote: > From d615162e547f3a2b2d1acd8c79c24ee691c96c95 Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Wed, 26 Nov 2008 12:54:47 -0800 > Subject: [PATCH] Create a new library libibnetdisc > > This encompasses the functionality of ibnetdiscover in a C library. It returns > a single "ibnd_fabric_t" object which represents the data found during the > scan. The NodeInfo, PortInfo, and SwitchInfo are preserved from the queries > made on the fabric to be used by the calling function as they see fit. > > This greatly benefits some diags like iblinkinfo.pl. 
This diag in particular > was re-written using this library in C and has shown an 85% speed up on a ~1000 > node cluster. > > Previous iblinkinfo.pl > real 3m35.876s > user 0m13.210s > sys 1m1.046s > > New iblinkinfotest > real 0m32.869s > user 0m0.067s > sys 0m0.140s > > Signed-off-by: Ira Weiny > --- > infiniband-diags/Makefile.am | 1 + > infiniband-diags/configure.in | 31 +- > infiniband-diags/libibnetdisc/Makefile.am | 66 ++ > .../libibnetdisc/include/infiniband/ibnetdisc.h | 276 ++++++ > infiniband-diags/libibnetdisc/libibnetdisc.ver | 9 + > infiniband-diags/libibnetdisc/man/ibnd_debug.3 | 2 + > .../libibnetdisc/man/ibnd_destroy_fabric.3 | 2 + > .../libibnetdisc/man/ibnd_discover_fabric.3 | 49 ++ > .../libibnetdisc/man/ibnd_find_node_dr.3 | 2 + > .../libibnetdisc/man/ibnd_find_node_guid.3 | 25 + > .../libibnetdisc/man/ibnd_iter_nodes.3 | 24 + > .../libibnetdisc/man/ibnd_iter_nodes_type.3 | 2 + > .../libibnetdisc/man/ibnd_linkspeed_str.3 | 2 + > .../libibnetdisc/man/ibnd_linkstate_str.3 | 2 + > .../libibnetdisc/man/ibnd_linkwidth_str.3 | 26 + > .../libibnetdisc/man/ibnd_node_type_str.3 | 2 + > .../libibnetdisc/man/ibnd_node_type_str_short.3 | 2 + > .../libibnetdisc/man/ibnd_physstate_str.3 | 2 + > .../libibnetdisc/man/ibnd_show_progress.3 | 2 + > .../libibnetdisc/man/ibnd_update_node.3 | 21 + > infiniband-diags/libibnetdisc/src/chassis.c | 818 ++++++++++++++++++ > infiniband-diags/libibnetdisc/src/chassis.h | 85 ++ > infiniband-diags/libibnetdisc/src/ibnetdisc.c | 872 ++++++++++++++++++++ > infiniband-diags/libibnetdisc/src/internal.h | 82 ++ > infiniband-diags/libibnetdisc/src/libibnetdisc.map | 27 + > .../libibnetdisc/test/iblinkinfotest.c | 395 +++++++++ > infiniband-diags/libibnetdisc/test/ibnetdisctest.c | 675 +++++++++++++++ > infiniband-diags/libibnetdisc/test/testleaks.c | 268 ++++++ > 28 files changed, 3769 insertions(+), 1 deletions(-) > create mode 100644 infiniband-diags/libibnetdisc/Makefile.am > create mode 100644 
infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > create mode 100644 infiniband-diags/libibnetdisc/libibnetdisc.ver > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_debug.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.c > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.h > create mode 100644 infiniband-diags/libibnetdisc/src/ibnetdisc.c > create mode 100644 infiniband-diags/libibnetdisc/src/internal.h > create mode 100644 infiniband-diags/libibnetdisc/src/libibnetdisc.map > create mode 100644 infiniband-diags/libibnetdisc/test/iblinkinfotest.c > create mode 100644 infiniband-diags/libibnetdisc/test/ibnetdisctest.c > create mode 100644 infiniband-diags/libibnetdisc/test/testleaks.c > > diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am > index c22ba5e..8e8c3c1 100644 > --- a/infiniband-diags/Makefile.am > +++ b/infiniband-diags/Makefile.am > 
@@ -1,3 +1,4 @@ > +SUBDIRS = libibnetdisc > > INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > > diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in > index 5509fec..7c346e2 100644 > --- a/infiniband-diags/configure.in > +++ b/infiniband-diags/configure.in > @@ -145,6 +145,34 @@ IBSCRIPTPATH_TMP2="`echo $IBSCRIPTPATH_TMP1 | sed 's/^NONE/$ac_default_prefix/'` > IBSCRIPTPATH="`eval echo $IBSCRIPTPATH_TMP2`" > AC_SUBST(IBSCRIPTPATH) > > +dnl Begin libibnetdisc stuff > +AC_CHECK_HEADERS([stdint.h stdlib.h string.h syslog.h unistd.h]) I cannot find where syslog.h is actually used in infiniband-diags. > +AC_CHECK_FUNCS([strrchr strtoul strtoull]) > + > +ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc/libibnetdisc.ver | sed 's/LIBVERSION=//'` > +if test -z $ibnetdisc_api_version; then > + echo "FAILED to find $srcdir/libibnetdisc/libibnetdisc.ver" > + exit 1 > +fi > +AC_SUBST(ibnetdisc_api_version) > +AC_DEFINE_UNQUOTED(API_VERSION, > + ["$ibnetdisc_api_version"], > + [The API version of this library]) > + > +AC_MSG_CHECKING(for --enable-test-utils) > +AC_ARG_ENABLE(test-utils, > +[ --enable-test-utils build additional test utilities (default=no)], > +[case "${enableval}" in > + yes) tutils=yes ;; > + no) tutils=no ;; > + *) AC_MSG_ERROR(bad value ${enableval} for --enable-test-utils) ;; > +esac],[tutils=no]) > +AM_CONDITIONAL(ENABLE_TEST_UTILS, test x$tutils = xyes) > +AC_MSG_RESULT(${tutils=no}) > + > +dnl End libibnetdisc stuff > + > + > AC_CONFIG_FILES([\ > Makefile \ > infiniband-diags.spec \ > @@ -165,6 +193,7 @@ AC_CONFIG_FILES([\ > scripts/ibhosts \ > scripts/ibnodes \ > scripts/ibswitches \ > - scripts/ibrouters > + scripts/ibrouters \ > + libibnetdisc/Makefile > ]) > AC_OUTPUT > diff --git a/infiniband-diags/libibnetdisc/Makefile.am b/infiniband-diags/libibnetdisc/Makefile.am > new file mode 100644 > index 0000000..7b478b1 > --- /dev/null > +++ 
b/infiniband-diags/libibnetdisc/Makefile.am > @@ -0,0 +1,66 @@ > + > +#SUBDIRS = . > + > +INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > + > +lib_LTLIBRARIES = libibnetdisc.la > +sbin_PROGRAMS = > + > +if ENABLE_TEST_UTILS > +sbin_PROGRAMS += test/ibnetdisctest \ > + test/iblinkinfotest \ > + test/testleaks > +endif > + > +DBGFLAGS = -g > + > +if HAVE_LD_VERSION_SCRIPT > +libibnetdisc_version_script = -Wl,--version-script=$(srcdir)/src/libibnetdisc.map > +else > +libibnetdisc_version_script = > +endif > + > +libibnetdisc_la_SOURCES = src/ibnetdisc.c src/chassis.c src/chassis.h > +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) > +libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ > + -export-dynamic $(libibnetdisc_version_script) \ > + -losmcomp -libmad > +libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map > + > +libibnetdiscincludedir = $(includedir)/infiniband > + > +test_ibnetdisctest_SOURCES = test/ibnetdisctest.c > +test_ibnetdisctest_CFLAGS = -Wall $(DBGFLAGS) > +test_ibnetdisctest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc > + > +test_iblinkinfotest_SOURCES = test/iblinkinfotest.c > +test_iblinkinfotest_CFLAGS = -Wall $(DBGFLAGS) > +test_iblinkinfotest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc > + > +test_testleaks_SOURCES = test/testleaks.c > +test_testleaks_CFLAGS = -Wall $(DBGFLAGS) > +test_testleaks_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc Having --rpath in Makefiles is not permitted by the FC and RH package review processes. I know this is not something introduced by this patch, and infiniband-diags/Makefile.am has it already, but I intend to clean this up some day. 
> + > +libibnetdiscinclude_HEADERS = $(srcdir)/include/infiniband/ibnetdisc.h > + > +man_MANS = man/ibnd_debug.3 \ > + man/ibnd_destroy_fabric.3 \ > + man/ibnd_discover_fabric.3 \ > + man/ibnd_find_node_dr.3 \ > + man/ibnd_find_node_guid.3 \ > + man/ibnd_iter_nodes.3 \ > + man/ibnd_iter_nodes_type.3 \ > + man/ibnd_linkspeed_str.3 \ > + man/ibnd_linkstate_str.3 \ > + man/ibnd_linkwidth_str.3 \ > + man/ibnd_node_type_str.3 \ > + man/ibnd_physstate_str.3 \ > + man/ibnd_update_node.3 \ > + man/ibnd_show_progress.3 > + > +EXTRA_DIST = libibnetdisc.spec.in libibnetdisc.spec \ Files *.spec.in and *.spec don't exist anymore and 'make dist' fails. > + $(srcdir)/src/libibnetdisc.map libibnetdisc.ver autogen.sh > + > diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > new file mode 100644 > index 0000000..cdee2bd > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > @@ -0,0 +1,276 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. 
> + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#ifndef _IBNETDISC_H_ > +#define _IBNETDISC_H_ > + > +#include > +#include > + > +#define MAXHOPS 63 > + > +/* HASH table defines */ > +#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) > +#define HTSZ 137 > + > +#define IBND_DEBUG(str, args...) \ > + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) > +#define IBND_ERROR(str, args...) \ > + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) > + > +/** ========================================================================= > + * ENUM definitions > + */ > +typedef enum { > + IBND_CA_NODE = 1, > + IBND_SWITCH_NODE = 2, > + IBND_ROUTER_NODE = 3 > +} ibnd_node_type_t; > + > +typedef enum { > + IBND_LINK_DOWN = 1, > + IBND_LINK_INIT = 2, > + IBND_LINK_ARMED = 3, > + IBND_LINK_ACTIVE = 4 > +} ibnd_link_state_t; > + > +/** ========================================================================= > + * Node > + */ > +typedef struct switch_info { > + int smaenhsp0; > +} ibnd_switch_info_t; > + > +typedef struct node_info { > + int base_ver; > + int class_ver; > + int type; > + int numports; > + uint64_t sysimgguid; > + uint64_t nodeguid; > + uint64_t nodeportguid; > + uint16_t partition_cap; > + uint32_t devid; > + uint32_t revision; > + int localport; > + uint32_t vendid; > +} ibnd_node_info_t; > + > +struct ib_fabric; /* forward declare */ > +struct chassis; /* forward declare */ > +struct port; /* forward declare */ > + > 
+typedef struct node { > + struct node *next; /* all node list in fabric */ > + struct ib_fabric *fabric; /* the fabric node belongs to */ > + > + ib_portid_t path_portid; /* path from "from_node" */ > + int dist; /* num of hops from "from_node" */ > + int smalid; > + int smalmc; > + ibnd_switch_info_t sw_info; > + ibnd_node_info_t info; > + char nodedesc[64]; > + struct port **ports; /* in order array of port pointers */ > + /* the size of this array is info.numports + 1 */ > + /* items MAY BE NULL! (ie 0 == switches only) */ > + > + /* chassis info */ > + struct node *next_chassis_node; /* next node in ibnd_chassis_t->nodes */ > + struct chassis *chassis; /* if != NULL the chassis this node belongs to */ > + unsigned char ch_type; > + unsigned char ch_anafanum; > + unsigned char ch_slotnum; > + unsigned char ch_slot; > +} ibnd_node_t; > + > +/** ========================================================================= > + * Port > + */ > +typedef struct port_info { > + int lid; > + int smlid; > + int link_speed_supported; > + int link_speed_enabled; > + int link_speed_active; > + int link_state; > + int phys_state; > + int link_down_def_state; > + int mkey_prot_bits; > + int lmc; > + int neighbor_mtu; > + int smsl; > + int init_type; > + int vl_capability; > + int vl_high_limit; > + int vl_arb_high_cap; > + int vl_arb_low_cap; > + int init_reply; > + int mtu_cap; > + int vl_stall_count; > + int hoq_lifetime; > + int oper_vls; > + int partition_enforce_in; > + int partition_enforce_out; > + int filter_raw_in; > + int filter_raw_out; > + int mkey_violations; > + int pkey_violations; > + int qkey_violations; > + int guid_capabilities; > + int client_rereg; > + int subnet_timeout; > + int response_time_val; > + int local_phys_error; > + int overrun_error; > + int max_credit_hint; > + uint32_t link_round_trip; > + int local_port; > + int link_width_supported; > + int link_width_enabled; > + int link_width_active; > + int diag_code; > + int mkey_lease; > + uint32_t 
capability_mask; > + uint64_t mkey; > + uint64_t gid_prefix; > +} ibnd_port_info_t; What is the reason to redeclare custom NodeInfo and PortInfo structures? The originals are defined by IBA and there are a lot of utilities to work with them. Wouldn't it be better to use them as is? > + > +typedef struct port { > + uint64_t guid; > + int portnum; > + int ext_portnum; /* optional if != 0 external port num */ > + ibnd_node_t *node; /* node this port belongs to */ > + ibnd_port_info_t info; > + struct port *remoteport; /* null if SMA, or does not exist */ > +} ibnd_port_t; > + > + > +/** ========================================================================= > + * Chassis data > + */ > +typedef struct chassis { > + struct chassis *next; > + uint64_t chassisguid; > + int chassisnum; > + > + /* generic grouping by SystemImageGUID */ > + int nodecount; > + ibnd_node_t *nodes; > + > + /* specific to voltaire type nodes */ > +#define SPINES_MAX_NUM 12 > +#define LINES_MAX_NUM 36 > + ibnd_node_t *spinenode[SPINES_MAX_NUM + 1]; > + ibnd_node_t *linenode[LINES_MAX_NUM + 1]; > +} ibnd_chassis_t; > + > +/** ========================================================================= > + * Fabric > + * Main fabric object which is returned and represents the data discovered > + */ > +typedef struct ib_fabric { > + /* the node the discover was initiated from > + * "from" parameter in ibnd_discover_fabric > + * or by default the node you ar running on > + */ > + ibnd_node_t *from_node; > + /* NULL term list of all nodes in the fabric */ > + ibnd_node_t *nodes; > + /* NULL terminated list of all chassis found in the fabric */ > + ibnd_chassis_t *chassis; > + int maxhops_discovered; > +} ibnd_fabric_t; > + > + > +/** ========================================================================= > + * Initialization (fabric operations) > + */ > +void ibnd_debug(int i); > +void ibnd_show_progress(int i); > + > +ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, > + int timeout_ms, 
ib_portid_t *from, int hops); > + /** > + * dev_name: (required) local device name to use to access the fabric > + * dev_port: (required) local device port to use to access the fabric > + * timeout_ms: (required) gives the timeout for a _SINGLE_ query on > + * the fabric. So if there are mutiple nodes not > + * responding this may result in a lengthy delay. > + * from: (optional) specify the node to start scanning from. > + * If NULL start from the node we are running on. > + * hops: (optional) Specify how much of the fabric to traverse. > + * negative value == scan entire fabric > + */ > +void ibnd_destroy_fabric(ibnd_fabric_t *fabric); > + > +/** ========================================================================= > + * Node operations > + */ > +ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid); > +ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str); > +ibnd_node_t *ibnd_update_node(ibnd_node_t *node); > + > +typedef void (*ibnd_iter_node_func_t)(ibnd_node_t *node, void *user_data); > +void ibnd_iter_nodes(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + void *user_data); > +void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + ibnd_node_type_t node_type, > + void *user_data); > + > +/** ========================================================================= > + * Str convert functions > + */ > +char *ibnd_linkwidth_str(int link_width); > +char *ibnd_linkstate_str(int link_state); > +char *ibnd_physstate_str(int phys_state); > +const char *ibnd_node_type_str(ibnd_node_t *node); > +const char *ibnd_node_type_str_short(ibnd_node_t *node); > +char *ibnd_linkspeed_str(int link_speed, int data_rate); > + /* if data_rate == 0 use "SDR", "DDR", etc. */ > + /* if data_rate == 1 use "2.5 Gbps", "5.0 Gbps", etc. */ Similar functions exist in libibmad. Why do we need another set? 
Sasha > + > +/** ========================================================================= > + * Chassis queries > + */ > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum); > +char *ibnd_get_chassis_type(ibnd_node_t *node); > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size); > + > +int ibnd_is_xsigo_guid(uint64_t guid); > +int ibnd_is_xsigo_tca(uint64_t guid); > +int ibnd_is_xsigo_hca(uint64_t guid); > + > +#endif /* _IBNETDISC_H_ */ > diff --git a/infiniband-diags/libibnetdisc/libibnetdisc.ver b/infiniband-diags/libibnetdisc/libibnetdisc.ver > new file mode 100644 > index 0000000..a0a5f3c > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/libibnetdisc.ver > @@ -0,0 +1,9 @@ > +# In this file we track the current API version > +# of the IB net discover interface (and libraries) > +# The version is built of the following > +# tree numbers: > +# API_REV:RUNNING_REV:AGE > +# API_REV - advance on any added API > +# RUNNING_REV - advance any change to the vendor files > +# AGE - number of backward versions the API still supports > +LIBVERSION=1:0:0 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_debug.3 b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > new file mode 100644 > index 0000000..a4076fc > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_DEBUG 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > new file mode 100644 > index 0000000..8fe20ae > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_DESTROY_FABRIC 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 
b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > new file mode 100644 > index 0000000..44d8c65 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > @@ -0,0 +1,49 @@ > +.TH IBND_DISCOVER_FABRIC 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_discover_fabric, ibnd_destroy_fabric, ibnd_debug ibnd_show_progress \- initialize ibnetdiscover library. > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, ib_portid_t *from, int hops)" > +.BI "void ibnd_destroy_fabric(ibnd_fabric_t *fabric)" > +.BI "void ibnd_debug(int i)" > +.BI "void ibnd_show_progress(int i)" > + > + > +.SH "DESCRIPTION" > +.B ibnd_discover_fabric() > +Discover the fabric connected to the port specified by dev_name and dev_port, using a timeout specified. The "from" and "hops" parameters are optional and allow one to scan part of a fabric by specifying a node "from" and a number of hops away from that node to scan, "hops". This gives the user a "sub-fabric" which is "centered" anywhere they chose. > + > +.B ibnd_destroy_fabric() > +free all memory and resources associated with the fabric. > + > +.B ibnd_debug() > +Set the debug level to be printed as library operations take place. > + > +.B ibnd_debug() > +Indicate that the library should print debug output which shows it's progress > +through the fabric. > + > +.SH "RETURN VALUE" > +.B ibnd_discover_fabric() > +return NULL on failure, otherwise a valid ibnd_fabric_t object. > + > +.B ibnd_destory_fabric(), ibnd_debug() > +NONE > + > +.SH "EXAMPLES" > + > +.B Discover the entire fabric connected to device "mthca0", port 1. > + > + ibnd_discover_fabric("mthca0", 1, 100, NULL, 0); > + > +.B Discover only a single node and those nodes connected to it. 
> + > + str2drpath(&(port_id.drpath), from, 0, 0); > + > + ibnd_discover_fabric("mthca0", 1, 100, &port_id, 1); > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > new file mode 100644 > index 0000000..612e501 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_FIND_NODE_DR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_find_node_guid.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > new file mode 100644 > index 0000000..676b528 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > @@ -0,0 +1,25 @@ > +.TH IBND_FIND_NODE_GUID 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_find_node_guid, ibnd_find_node_dr \- given a fabric object find the node object within it which matches the guid or directed route specified. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid)" > +.BI "ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str)" > + > +.SH "DESCRIPTION" > +.B ibnd_find_node_guid() > +Given a fabric object and a guid, return the ibnd_node_t object with that node guid. > +.B ibnd_find_node_dr() > +Given a fabric object and a directed route, return the ibnd_node_t object with > +that directed route. > + > +.SH "RETURN VALUE" > +.B ibnd_find_node_guid(), ibnd_find_node_dr() > +return NULL on failure, otherwise a valid ibnd_node_t object. 
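The lookup code itself has not been shown up to this point in the thread, so the following is only a sketch of how a GUID lookup could be built on the HASHGUID and HTSZ macros from the ibnetdisc.h hunk above. It assumes a chained hash table indexed by HASHGUID(guid) % HTSZ; the toy_* types and functions are hypothetical and are not the library's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both macros are copied from the ibnetdisc.h hunk in the patch. */
#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103)))
#define HTSZ 137

/* Hypothetical node with just enough state for the lookup. */
struct toy_node {
	uint64_t guid;
	struct toy_node *htnext;	/* next node in the same hash bucket */
};

/* Hypothetical fabric holding one bucket list per hash value. */
struct toy_fabric {
	struct toy_node *nodestbl[HTSZ];
};

/* Push a node onto the head of its bucket. */
void toy_add_node(struct toy_fabric *fabric, struct toy_node *node)
{
	uint32_t hash;

	assert(fabric != NULL && node != NULL);
	hash = HASHGUID(node->guid) % HTSZ;
	node->htnext = fabric->nodestbl[hash];
	fabric->nodestbl[hash] = node;
}

/* Return the node with the given GUID, or NULL if it is not in the table. */
struct toy_node *toy_find_node_guid(struct toy_fabric *fabric, uint64_t guid)
{
	struct toy_node *node;

	assert(fabric != NULL);
	for (node = fabric->nodestbl[HASHGUID(guid) % HTSZ]; node; node = node->htnext)
		if (node->guid == guid)
			return node;

	return NULL;
}
```

The NULL return on a miss matches the RETURN VALUE text of the man page; whether the real function uses the same table layout is an assumption.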
> + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > new file mode 100644 > index 0000000..7199dfb > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > @@ -0,0 +1,24 @@ > +.TH IBND_ITER_NODES 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_iter_nodes, ibnd_iter_nodes_type \- given a fabric object and a function itterate over the nodes in the fabric. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "void ibnd_iter_nodes(ibnd_fabric_t *fabric, ibnd_iter_func_t func, void *user_data)" > +.BI "void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, ibnd_iter_func_t func, ibnd_node_type_t type, void *user_data)" > + > +.SH "DESCRIPTION" > +.B ibnd_iter_nodes() > +Itterate through all the nodes in the fabric and call "func" on them. > +.B ibnd_iter_nodes_type() > +The same as ibnd_iter_nodes except to limit the iteration to the nodes with the specified type. 
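The callback-style iteration described above can be sketched in a few lines of C. The callback signature mirrors ibnd_iter_node_func_t from the header hunk (the man page synopsis spells it ibnd_iter_func_t). The demo_* types and functions are hypothetical stand-ins, and the sketch assumes the fabric keeps its nodes on a NULL-terminated singly linked list, as the comments in ibnd_fabric_t suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the ibnd_node_type_t enum in the header hunk. */
enum demo_node_type {
	DEMO_CA_NODE = 1,
	DEMO_SWITCH_NODE = 2,
	DEMO_ROUTER_NODE = 3
};

/* Hypothetical, heavily reduced node and fabric objects. */
struct demo_node {
	struct demo_node *next;	/* all-node list in the fabric */
	int type;
};

struct demo_fabric {
	struct demo_node *nodes;	/* NULL-terminated list */
};

typedef void (*demo_iter_node_func_t)(struct demo_node *node, void *user_data);

/* Call func on every node whose type matches node_type. */
void demo_iter_nodes_type(struct demo_fabric *fabric, demo_iter_node_func_t func,
			  int node_type, void *user_data)
{
	struct demo_node *cur;

	assert(fabric != NULL && func != NULL);
	for (cur = fabric->nodes; cur; cur = cur->next)
		if (cur->type == node_type)
			func(cur, user_data);
}

/* Example callback: user_data points at an int counter. */
void demo_count_node(struct demo_node *node, void *user_data)
{
	(void)node;
	(*(int *)user_data)++;
}
```

Passing state through user_data keeps the callback free of globals, which is presumably why the API carries that parameter.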
> + > +.SH "RETURN VALUE" > +.B ibnd_iter_nodes(), ibnd_iter_nodes_type() > +NONE > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > new file mode 100644 > index 0000000..878547c > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_FIND_NODES_TYPE 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_find_nodes.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > new file mode 100644 > index 0000000..128cd3e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_LINKSPEED_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > new file mode 100644 > index 0000000..2fa9189 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_LINKSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > new file mode 100644 > index 0000000..2cd4f0a > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > @@ -0,0 +1,26 @@ > +.TH IBND_LINKWIDTH_STR 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_linkwidth_str, ibnd_linkspeed_str, ibnd_linkstate_str, ibnd_physstate_str, ibnd_node_type_str \- prety string functions. 
> + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI > +.BI "char *ibnd_linkwidth_str(int link_width)" > +.BI "char *ibnd_linkspeed_str(int link_speed)" > +.BI "char *ibnd_linkstate_str(int link_state)" > +.BI "char *ibnd_physstate_str(int phys_state)" > +.BI "const char *ibnd_node_type_str(ibnd_node_t *node)" > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > + > +.SH "DESCRIPTION" > +Return user readable strings for the values given. > + > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > +Returns a shorter abbreviated version of the string. > + > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > new file mode 100644 > index 0000000..77dbf07 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_NODE_TYPE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > new file mode 100644 > index 0000000..62feb6e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_NODE_TYPE_STR_SHORT 3 "Aug 05, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > new file mode 100644 > index 0000000..aeeaeb7 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_PHYSSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_physstate_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > new file mode 100644 > index 0000000..280af31 > 
--- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_SHOW_PROGRESS 3 "Nov 26, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > new file mode 100644 > index 0000000..d3aa206 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > @@ -0,0 +1,21 @@ > +.TH IBND_UPDATE_NODE 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_update_node \- Update the node specified with new data from the fabric. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_node_t *ibnd_update_node(ibnd_node_t *node)" > + > +.SH "DESCRIPTION" > +.B ibnd_update_node() > +Update the node info, port info, and node description of the node specified. > + > +.SH "RETURN VALUE" > +.B ibnd_update_node() > +Return NULL on failure, otherwise a valid ibnd_node_t object which is part of the fabric object. > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/src/chassis.c b/infiniband-diags/libibnetdisc/src/chassis.c > new file mode 100644 > index 0000000..41f325e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/chassis.c > @@ -0,0 +1,818 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +/*========================================================*/ > +/* FABRIC SCANNER SPECIFIC DATA */ > +/*========================================================*/ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > + > +#include > +#include > + > +#include "internal.h" > +#include "chassis.h" > + > +static char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; > +static char *ChassisSlotTypeStr[4] = { "", "Line", "Spine", "SRBD" }; > + > +char *ibnd_get_chassis_type(ibnd_node_t *node) > +{ > + /* Currently, only if Voltaire chassis */ > + if (node->info.vendid != VTR_VENDOR_ID) > + return (NULL); > + if (!node->chassis) > + return (NULL); > + if (node->ch_type == UNRESOLVED_CT > + || node->ch_type > ISR2004_CT) > + return (NULL); > + return ChassisTypeStr[node->ch_type]; > +} > + > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size) > +{ > + /* Currently, only if Voltaire chassis */ > + if (node->info.vendid != VTR_VENDOR_ID) > + return (NULL); > + if (!node->chassis) > + return (NULL); > + if (node->ch_slot == UNRESOLVED_CS > + || node->ch_slot > SRBD_CS) > + return (NULL); > + if (!str) > + return (NULL); > + snprintf(str, size, "%s %d Chip %d", > + ChassisSlotTypeStr[node->ch_slot], > + node->ch_slotnum, > + node->ch_anafanum); > + return (str); > +} > + > +static ibnd_chassis_t *find_chassisnum(struct ibnd_fabric *fabric, unsigned char chassisnum) > +{ > + ibnd_chassis_t *current; > + > + for (current = fabric->first_chassis; current; current = current->next) { > + if (current->chassisnum == chassisnum) > + return current; > + } > + > + return NULL; > +} > + > +static uint64_t topspin_chassisguid(uint64_t guid) > +{ > + /* Byte 3 in system image GUID is chassis type, and */ > + /* Byte 4 is location ID (slot) so just mask off byte 4 */ > + return guid & 0xffffffff00ffffffULL; > +} > + > +int ibnd_is_xsigo_guid(uint64_t guid) > +{ > + if 
((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_leafone(uint64_t guid) > +{ > + if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) > + return 1; > + else > + return 0; > +} > + > +int ibnd_is_xsigo_hca(uint64_t guid) > +{ > + /* NodeType 2 is HCA */ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) > + return 1; > + else > + return 0; > +} > + > +int ibnd_is_xsigo_tca(uint64_t guid) > +{ > + /* NodeType 3 is TCA */ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_ca(uint64_t guid) > +{ > + if (ibnd_is_xsigo_hca(guid) || ibnd_is_xsigo_tca(guid)) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_switch(uint64_t guid) > +{ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) > + return 1; > + else > + return 0; > +} > + > +static uint64_t xsigo_chassisguid(ibnd_node_t *node) > +{ > + if (!is_xsigo_ca(node->info.sysimgguid)) { > + /* Byte 3 is NodeType and byte 4 is PortType */ > + /* If NodeType is 1 (switch), PortType is masked */ > + if (is_xsigo_switch(node->info.sysimgguid)) > + return node->info.sysimgguid & 0xffffffff00ffffffULL; > + else > + return node->info.sysimgguid; > + } else { > + if (!node->ports || !node->ports[1]) > + return (0); > + > + /* Is there a peer port ? 
*/ > + if (!node->ports[1]->remoteport) > + return node->info.sysimgguid; > + > + /* If peer port is Leaf 1, use its chassis GUID */ > + if (is_xsigo_leafone(node->ports[1]->remoteport->node->info.sysimgguid)) > + return node->ports[1]->remoteport->node->info.sysimgguid & > + 0xffffffff00ffffffULL; > + else > + return node->info.sysimgguid; > + } > +} > + > +static uint64_t get_chassisguid(ibnd_node_t *node) > +{ > + if (node->info.vendid == TS_VENDOR_ID || node->info.vendid == SS_VENDOR_ID) > + return topspin_chassisguid(node->info.sysimgguid); > + else if (node->info.vendid == XS_VENDOR_ID || ibnd_is_xsigo_guid(node->info.sysimgguid)) > + return xsigo_chassisguid(node); > + else > + return node->info.sysimgguid; > +} > + > +static ibnd_chassis_t *find_chassisguid(ibnd_node_t *node) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > + ibnd_chassis_t *current; > + uint64_t chguid; > + > + chguid = get_chassisguid(node); > + for (current = f->first_chassis; current; current = current->next) { > + if (current->chassisguid == chguid) > + return current; > + } > + > + return NULL; > +} > + > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + ibnd_chassis_t *chassis; > + > + chassis = find_chassisnum(f, chassisnum); > + if (chassis) > + return chassis->chassisguid; > + else > + return 0; > +} > + > +static int is_router(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_IB_FC_ROUTER || > + n->node.info.devid == VTR_DEVID_IB_IP_ROUTER); > +} > + > +static int is_spine_9096(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB4 || > + n->node.info.devid == VTR_DEVID_SFB4_DDR); > +} > + > +static int is_spine_9288(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB12 || > + n->node.info.devid == VTR_DEVID_SFB12_DDR); > +} > + > +static int is_spine_2004(struct ibnd_node *n) > +{ > + return 
(n->node.info.devid == VTR_DEVID_SFB2004); > +} > + > +static int is_spine_2012(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB2012); > +} > + > +static int is_spine(struct ibnd_node *n) > +{ > + return (is_spine_9096(n) || is_spine_9288(n) || > + is_spine_2004(n) || is_spine_2012(n)); > +} > + > +static int is_line_24(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB24 || > + n->node.info.devid == VTR_DEVID_SLB24_DDR || > + n->node.info.devid == VTR_DEVID_SRB2004); > +} > + > +static int is_line_8(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB8); > +} > + > +static int is_line_2024(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB2024); > +} > + > +static int is_line(struct ibnd_node *n) > +{ > + return (is_line_24(n) || is_line_8(n) || is_line_2024(n)); > +} > + > +int is_chassis_switch(struct ibnd_node *n) > +{ > + return (is_spine(n) || is_line(n)); > +} > + > +/* these structs help find Line (Anafa) slot number while using spine portnum */ > +int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; > +int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; > +int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; > +int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; > + > +/* IPR FCR modules connectivity while using sFB4 port as reference */ > +int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; > + > +/* these structs help find Spine (Anafa) slot number while using spine portnum */ > +int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0 }; > +int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > + > +static void get_sfb_slot(struct ibnd_node *node, ibnd_port_t *lineport) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + > + n->ch_slot = SPINE_CS; > + if (is_spine_9096(node)) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > + } else if (is_spine_9288(node)) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > + } else if (is_spine_2012(node)) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > + } else if (is_spine_2004(node)) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + node->node.info.nodeguid); > + } > +} > + > +static void get_router_slot(struct ibnd_node *node, ibnd_port_t *spineport) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + int guessnum = 0; > + > + node->ch_found = 1; > + > + n->ch_slot = SRBD_CS; > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + /* this is a smart guess 
based on nodeguids order on sFB-12 module */ > + guessnum = spineport->node->info.nodeguid % 4; > + /* module 1 <--> remote anafa 3 */ > + /* module 2 <--> remote anafa 2 */ > + /* module 3 <--> remote anafa 1 */ > + n->ch_anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 3 : 2)); > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + /* this is a smart guess based on nodeguids order on sFB-12 module */ > + guessnum = spineport->node->info.nodeguid % 4; > + // module 1 <--> remote anafa 3 > + // module 2 <--> remote anafa 2 > + // module 3 <--> remote anafa 1 > + n->ch_anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 3 : 2)); > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + spineport->node->info.nodeguid); > + } > +} > + > +static void get_slb_slot(ibnd_node_t *n, ibnd_port_t *spineport) > +{ > + n->ch_slot = LINE_CS; > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + 
n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + spineport->node->info.nodeguid); > + } > +} > + > +/* forward declare this */ > +static void voltaire_portmap(ibnd_port_t *port); > +/* > + This function is called for every Voltaire node in the fabric. > + It could be optimized, but the time overhead is very small > + and it is only a diag utility. > +*/ > +static void fill_voltaire_chassis_record(struct ibnd_node *node) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + int p = 0; > + ibnd_port_t *port; > + struct ibnd_node *remnode = 0; > + > + if (node->ch_found) /* somehow this node has already been passed */ > + return; > + node->ch_found = 1; > + > + /* node is router only in case of using unique lid */ > + /* (which is lid of chassis router port) */ > + /* in such case node->ports is actually a requested port... */ > + if (is_router(node)) { > + /* find the remote node; skip ports that have no peer */ > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (port && port->remoteport && > + is_spine(CONV_NODE_INTERNAL(port->remoteport->node))) > + get_router_slot(node, port->remoteport); > + } > + } else if (is_spine(node)) { > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > + if (remnode->node.info.type != IBND_SWITCH_NODE) { > + if (!remnode->ch_found) > + get_router_slot(remnode, port); > + continue; > + } > + if (!n->ch_type) > + /* we assume here that remoteport belongs to line */ > + get_sfb_slot(node, port->remoteport); > + > + /* we could break here, but need to find if more routers connected */ > + } > + > + } else if (is_line(node)) { > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port || port->portnum > 12 || !port->remoteport) > + continue; > + /* we assume here that remoteport belongs to spine */ > +
get_slb_slot(n, port->remoteport); > + break; > + } > + } > + > + /* for each port of this node, map external ports */ > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port) > + continue; > + voltaire_portmap(port); > + } > + > + return; > +} > + > +static int get_line_index(ibnd_node_t *node) > +{ > + int retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > + > + if (retval > LINES_MAX_NUM || retval < 1) > + IBPANIC("Internal error"); > + return retval; > +} > + > +static int get_spine_index(ibnd_node_t *node) > +{ > + int retval; > + > + if (is_spine_9288(CONV_NODE_INTERNAL(node)) || is_spine_2012(CONV_NODE_INTERNAL(node))) > + retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > + else > + retval = node->ch_slotnum; > + > + if (retval > SPINES_MAX_NUM || retval < 1) > + IBPANIC("Internal error"); > + return retval; > +} > + > +static void insert_line_router(ibnd_node_t *node, ibnd_chassis_t *chassis) > +{ > + int i = get_line_index(node); > + > + if (chassis->linenode[i]) > + return; /* already filled slot */ > + > + chassis->linenode[i] = node; > + node->chassis = chassis; > +} > + > +static void insert_spine(ibnd_node_t *node, ibnd_chassis_t *chassis) > +{ > + int i = get_spine_index(node); > + > + if (chassis->spinenode[i]) > + return; /* already filled slot */ > + > + chassis->spinenode[i] = node; > + node->chassis = chassis; > +} > + > +static void pass_on_lines_catch_spines(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node, *remnode; > + ibnd_port_t *port; > + int i, p; > + > + for (i = 1; i <= LINES_MAX_NUM; i++) { > + node = chassis->linenode[i]; > + > + if (!(node && is_line(CONV_NODE_INTERNAL(node)))) > + continue; /* empty slot or router */ > + > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (!port || port->portnum > 12 || !port->remoteport) > + continue; > + > + remnode = port->remoteport->node; > + > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > + 
continue; /* some error - spine not initialized ? FIXME */ > + insert_spine(remnode, chassis); > + } > + } > +} > + > +static void pass_on_spines_catch_lines(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node, *remnode; > + ibnd_port_t *port; > + int i, p; > + > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > + node = chassis->spinenode[i]; > + if (!node) > + continue; /* empty slot */ > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = port->remoteport->node; > + > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > + continue; /* some error - line/router not initialized ? FIXME */ > + insert_line_router(remnode, chassis); > + } > + } > +} > + > +/* > + Stupid interpolation algorithm... > + But nothing to do - have to be compliant with VoltaireSM/NMS > +*/ > +static void pass_on_spines_interpolate_chguid(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node; > + int i; > + > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > + node = chassis->spinenode[i]; > + if (!node) > + continue; /* skip the empty slots */ > + > + /* take first guid minus one to be consistent with SM */ > + chassis->chassisguid = node->info.nodeguid - 1; > + break; > + } > +} > + > +/* > + This function fills chassis structure with all nodes > + in that chassis > + chassis structure = structure of one standalone chassis > +*/ > +static void build_chassis(struct ibnd_node *node, ibnd_chassis_t *chassis) > +{ > + int p = 0; > + struct ibnd_node *remnode = 0; > + ibnd_port_t *port = 0; > + > + /* we get here with node = chassis_spine */ > + insert_spine((ibnd_node_t *)node, chassis); > + > + /* loop: pass on all ports of node */ > + for (p = 1; p <= node->node.info.numports; p++ ) { > + port = node->node.ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > + > + if (!remnode->ch_found) > + continue; /* some error - line or router not initialized ? 
FIXME */ > + > + insert_line_router(&(remnode->node), chassis); > + } > + > + pass_on_lines_catch_spines(chassis); > + /* this pass is needed to catch routers, since routers are connected only */ > + /* to spines in slot 1 or 4 and we could miss them the first time */ > + pass_on_spines_catch_lines(chassis); > + > + /* 2 additional passes are needed to overcome a problem of pure "in-chassis" */ > + /* connectivity - extra passes to ensure that all related chips/modules */ > + /* are inserted into the chassis */ > + pass_on_lines_catch_spines(chassis); > + pass_on_spines_catch_lines(chassis); > + pass_on_spines_interpolate_chguid(chassis); > +} > + > +/*========================================================*/ > +/* INTERNAL TO EXTERNAL PORT MAPPING */ > +/*========================================================*/ > + > +/* > +Description : On ISR9288/9096 the external port indexing > + does not match the internal (anafa) port > + indexes. Use this MAP to translate the data you get from > + the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.)
> + > + > +Module : sLB-24 > + anafa 1 anafa 2 > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > +int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > +int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 > +------------------------------------------------ > + > +Module : sLB-8 > + anafa 1 anafa 2 > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > + > +-----------> > + anafa 1 anafa 2 > +ext port | - - 5 - - 6 | - - 7 - - 8 > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > +ext port | - - 1 - - 2 | - - 3 - - 4 > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > +------------------------------------------------ > + > +Module : sLB-2024 > + > +ext port | 13 14 15 16 17 18 19 20 21 22 23 24 > +A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > +ext port | 1 2 3 4 5 6 7 8 9 10 11 12 > +A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > +--------------------------------------------------- > + > +*/ > + > +int int2ext_map_slb24[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } > + }; > +int int2ext_map_slb8[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } > + }; > +int int2ext_map_slb2024[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } > + }; > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > + > +/* map internal ports to external ports if appropriate */ > +static void > +voltaire_portmap(ibnd_port_t *port) > +{ 
> + struct ibnd_node *n = CONV_NODE_INTERNAL(port->node); > + int portnum = port->portnum; > + int chipnum = 0; > + ibnd_node_t *node = port->node; > + > + if (!n->ch_found || !is_line(CONV_NODE_INTERNAL(node)) || (portnum < 13 || portnum > 24)) { > + port->ext_portnum = 0; > + return; > + } > + > + if (port->node->ch_anafanum < 1 || port->node->ch_anafanum > 2) { > + port->ext_portnum = 0; > + return; > + } > + > + chipnum = port->node->ch_anafanum - 1; > + > + if (is_line_24(CONV_NODE_INTERNAL(node))) > + port->ext_portnum = int2ext_map_slb24[chipnum][portnum]; > + else if (is_line_2024(CONV_NODE_INTERNAL(node))) > + port->ext_portnum = int2ext_map_slb2024[chipnum][portnum]; > + else > + port->ext_portnum = int2ext_map_slb8[chipnum][portnum]; > +} > + > +static void add_chassis(struct ibnd_fabric *fabric) > +{ > + if (!(fabric->current_chassis = calloc(1, sizeof(ibnd_chassis_t)))) > + IBPANIC("out of mem"); > + > + if (fabric->first_chassis == NULL) { > + fabric->first_chassis = fabric->current_chassis; > + fabric->last_chassis = fabric->current_chassis; > + } else { > + fabric->last_chassis->next = fabric->current_chassis; > + fabric->last_chassis = fabric->current_chassis; > + } > +} > + > +static void > +add_node_to_chassis(ibnd_chassis_t *chassis, ibnd_node_t *node) > +{ > + node->chassis = chassis; > + node->next_chassis_node = chassis->nodes; > + chassis->nodes = node; > +} > + > +/* > + Main grouping function > + Algorithm: > + 1. pass on every Voltaire node > + 2. catch spine chip for every Voltaire node > + 2.1 build/interpolate chassis around this chip > + 2.2 go to 1. > + 3. pass on non Voltaire nodes (SystemImageGUID based grouping) > + 4. now group non Voltaire nodes by SystemImageGUID > + Returns: > + Pointer to the first chassis in a NULL terminated list of chassis in > + the fabric specified. 
> +*/ > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric) > +{ > + struct ibnd_node *node; > + int dist; > + int chassisnum = 0; > + ibnd_chassis_t *chassis; > + > + fabric->first_chassis = NULL; > + fabric->current_chassis = NULL; > + > + /* first pass on switches and build for every Voltaire node */ > + /* an appropriate chassis record (slotnum and position) */ > + /* according to internal connectivity */ > + /* not very efficient but clear code so... */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + fill_voltaire_chassis_record(node); > + } > + } > + > + /* separate every Voltaire chassis from each other and build linked list of them */ > + /* algorithm: catch spine and find all surrounding nodes */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid != VTR_VENDOR_ID) > + continue; > + //if (!node->node.chrecord || node->node.chrecord->chassisnum || !is_spine(node)) > + if (!node->ch_found > + || (node->node.chassis && node->node.chassis->chassisnum) > + || !is_spine(node)) > + continue; > + add_chassis(fabric); > + fabric->current_chassis->chassisnum = ++chassisnum; > + build_chassis(node, fabric->current_chassis); > + } > + } > + > + /* now make pass on nodes for chassis which are not Voltaire */ > + /* grouped by common SystemImageGUID */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + continue; > + if (node->node.info.sysimgguid) { > + chassis = find_chassisguid((ibnd_node_t *)node); > + if (chassis) > + chassis->nodecount++; > + else { > + /* Possible new chassis */ > + add_chassis(fabric); > + fabric->current_chassis->chassisguid = > + 
get_chassisguid((ibnd_node_t *)node); > + fabric->current_chassis->nodecount = 1; > + } > + } > + } > + } > + > + /* now, make another pass to see which nodes are part of chassis */ > + /* (defined as chassis->nodecount > 1) */ > + for (dist = 0; dist <= MAXHOPS; ) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + continue; > + if (node->node.info.sysimgguid) { > + chassis = find_chassisguid((ibnd_node_t *)node); > + if (chassis && chassis->nodecount > 1) { > + if (!chassis->chassisnum) > + chassis->chassisnum = ++chassisnum; > + if (!node->ch_found) { > + node->ch_found = 1; > + add_node_to_chassis(chassis, (ibnd_node_t *)node); > + } > + } > + } > + } > + if (dist == fabric->fabric.maxhops_discovered) > + dist = MAXHOPS; /* skip to CAs */ > + else > + dist++; > + } > + > + return (fabric->first_chassis); > +} > diff --git a/infiniband-diags/libibnetdisc/src/chassis.h b/infiniband-diags/libibnetdisc/src/chassis.h > new file mode 100644 > index 0000000..16dad49 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/chassis.h > @@ -0,0 +1,85 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. 
> + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#ifndef _CHASSIS_H_ > +#define _CHASSIS_H_ > + > +#include > + > +#include "internal.h" > + > +/*========================================================*/ > +/* CHASSIS RECOGNITION SPECIFIC DATA */ > +/*========================================================*/ > + > +/* Device IDs */ > +#define VTR_DEVID_IB_FC_ROUTER 0x5a00 > +#define VTR_DEVID_IB_IP_ROUTER 0x5a01 > +#define VTR_DEVID_ISR9600_SPINE 0x5a02 > +#define VTR_DEVID_ISR9600_LEAF 0x5a03 > +#define VTR_DEVID_HCA1 0x5a04 > +#define VTR_DEVID_HCA2 0x5a44 > +#define VTR_DEVID_HCA3 0x6278 > +#define VTR_DEVID_SW_6IB4 0x5a05 > +#define VTR_DEVID_ISR9024 0x5a06 > +#define VTR_DEVID_ISR9288 0x5a07 > +#define VTR_DEVID_SLB24 0x5a09 > +#define VTR_DEVID_SFB12 0x5a08 > +#define VTR_DEVID_SFB4 0x5a0b > +#define VTR_DEVID_ISR9024_12 0x5a0c > +#define VTR_DEVID_SLB8 0x5a0d > +#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 > +#define VTR_DEVID_ISR9024_DDR 0x5a31 > +#define VTR_DEVID_SFB12_DDR 0x5a32 > +#define VTR_DEVID_SFB4_DDR 0x5a33 > +#define VTR_DEVID_SLB24_DDR 0x5a34 > +#define VTR_DEVID_SFB2012 0x5a37 > +#define VTR_DEVID_SLB2024 0x5a38 > +#define VTR_DEVID_ISR2012 0x5a39 > +#define VTR_DEVID_SFB2004 0x5a40 > +#define VTR_DEVID_ISR2004 0x5a41 > +#define 
VTR_DEVID_SRB2004 0x5a42 > + > +/* Vendor IDs (for chassis based systems) */ > +#define VTR_VENDOR_ID 0x8f1 /* Voltaire */ > +#define TS_VENDOR_ID 0x5ad /* Cisco */ > +#define SS_VENDOR_ID 0x66a /* InfiniCon */ > +#define XS_VENDOR_ID 0x1397 /* Xsigo */ > + > +enum ibnd_chassis_type { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; > +enum ibnd_chassis_slot_type { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; > + > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric); > + > +#endif /* _CHASSIS_H_ */ > diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > new file mode 100644 > index 0000000..64e4ece > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > @@ -0,0 +1,872 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. 
> + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include > +#include > + > +#include "internal.h" > +#include "chassis.h" > + > +static int timeout_ms = 2000; > +static int show_progress = 0; > + > +static char *linkwidth_str[] = { > + "??", > + "1x", > + "4x", > + "??", > + "8x", > + "??", > + "??", > + "??", > + "12x" > +}; > + > +static char *linkspeed_str[] = { > + "???", > + "SDR", > + "DDR", > + "???", > + "QDR" > +}; > + > +static char *linkspeed_datarate_str[] = { > + "???", > + "2.5 Gbps", > + "5.0 Gbps", > + "???", > + "10.0 Gbps" > +}; > + > +static char *linkstate_str[] = { > + "No State", > + "Down", > + "Init", > + "Armed", > + "Active" > +}; > + > +static char *physstate_str[] = { > + "No State", > + "Sleep", > + "Polling", > + "Disabled", > + "PortConfigTraining", > + "LinkUp", > + "LinkErrorRecovery", > + "Phy Test" > +}; > + > +char * > +ibnd_linkwidth_str(int link_width) > +{ > + if (link_width > 8) > + return linkwidth_str[0]; > + else > + return linkwidth_str[link_width]; > +} > + > +char * > +ibnd_linkspeed_str(int link_speed, int data_rate) > +{ > + if (link_speed > 4) > + return linkspeed_str[0]; > + else if (data_rate) > + return linkspeed_datarate_str[link_speed]; > + else > + return linkspeed_str[link_speed]; > +} > +char 
* > +ibnd_linkstate_str(int link_state) > +{ > + if (link_state > 4) > + return linkstate_str[0]; > + else > + return linkstate_str[link_state]; > +} > + > +char * > +ibnd_physstate_str(int phys_state) > +{ > + if (phys_state > 7) > + return physstate_str[0]; > + else > + return physstate_str[phys_state]; > +} > + > +void > +decode_port_info(void * rcv_buf, ibnd_port_info_t *pi) > +{ > + mad_decode_field(rcv_buf, IB_PORT_LID_F, &pi->lid); > + mad_decode_field(rcv_buf, IB_PORT_SMLID_F, &pi->smlid); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_SUPPORTED_F, &pi->link_speed_supported); > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ENABLED_F, &pi->link_speed_enabled); > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ACTIVE_F, &pi->link_speed_active); > + > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PORT_F, &pi->local_port); > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_SUPPORTED_F, &pi->link_width_supported); > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ENABLED_F, &pi->link_width_enabled); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ACTIVE_F, &pi->link_width_active); > + > + mad_decode_field(rcv_buf, IB_PORT_DIAG_F, &pi->diag_code); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_LEASE_F, &pi->mkey_lease); > + mad_decode_field(rcv_buf, IB_PORT_CAPMASK_F, &pi->capability_mask); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_F, &pi->mkey); > + mad_decode_field(rcv_buf, IB_PORT_GID_PREFIX_F, &pi->gid_prefix); > + > + mad_decode_field(rcv_buf, IB_PORT_STATE_F, &pi->link_state); > + mad_decode_field(rcv_buf, IB_PORT_PHYS_STATE_F, &pi->phys_state); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_DOWN_DEF_F, &pi->link_down_def_state); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_PROT_BITS_F, &pi->mkey_prot_bits); > + > + mad_decode_field(rcv_buf, IB_PORT_LMC_F, &pi->lmc); > + mad_decode_field(rcv_buf, IB_PORT_NEIGHBOR_MTU_F, &pi->neighbor_mtu); > + mad_decode_field(rcv_buf, IB_PORT_SMSL_F, &pi->smsl); > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_F, 
&pi->init_type); > + > + mad_decode_field(rcv_buf, IB_PORT_VL_CAP_F, &pi->vl_capability); > + mad_decode_field(rcv_buf, IB_PORT_VL_HIGH_LIMIT_F, &pi->vl_high_limit); > + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_HIGH_CAP_F, &pi->vl_arb_high_cap); > + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_LOW_CAP_F, &pi->vl_arb_low_cap); > + > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_REPLY_F, &pi->init_reply); > + mad_decode_field(rcv_buf, IB_PORT_MTU_CAP_F, &pi->mtu_cap); > + mad_decode_field(rcv_buf, IB_PORT_VL_STALL_COUNT_F, &pi->vl_stall_count); > + mad_decode_field(rcv_buf, IB_PORT_HOQ_LIFE_F, &pi->hoq_lifetime); > + mad_decode_field(rcv_buf, IB_PORT_OPER_VLS_F, &pi->oper_vls); > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_INB_F, &pi->partition_enforce_in); > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_OUTB_F, &pi->partition_enforce_out); > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_INB_F, &pi->filter_raw_in); > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_OUTB_F, &pi->filter_raw_out); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_VIOL_F, &pi->mkey_violations); > + mad_decode_field(rcv_buf, IB_PORT_PKEY_VIOL_F, &pi->pkey_violations); > + mad_decode_field(rcv_buf, IB_PORT_QKEY_VIOL_F, &pi->qkey_violations); > + > + mad_decode_field(rcv_buf, IB_PORT_GUID_CAP_F, &pi->guid_capabilities); > + > + mad_decode_field(rcv_buf, IB_PORT_CLIENT_REREG_F, &pi->client_rereg); > + mad_decode_field(rcv_buf, IB_PORT_SUBN_TIMEOUT_F, &pi->subnet_timeout); > + mad_decode_field(rcv_buf, IB_PORT_RESP_TIME_VAL_F, &pi->response_time_val); > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PHYS_ERR_F, &pi->local_phys_error); > + mad_decode_field(rcv_buf, IB_PORT_OVERRUN_ERR_F, &pi->overrun_error); > + mad_decode_field(rcv_buf, IB_PORT_MAX_CREDIT_HINT_F, &pi->max_credit_hint); > + mad_decode_field(rcv_buf, IB_PORT_LINK_ROUND_TRIP_F, &pi->link_round_trip); > +} > + > +static int > +get_port_info(struct ibnd_fabric *fabric, struct ibnd_port *port, > + int portnum, ib_portid_t *portid) > 
+{ > + char portinfo[64]; > + void *pi = portinfo; > + > + port->port.portnum = portnum; > + > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + > + decode_port_info(pi, &port->port.info); > + > + IBND_DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s\n", > + portid2str(portid), portnum, port->port.info.lid, port->port.info.link_state, > + port->port.info.phys_state, ibnd_linkwidth_str(port->port.info.link_width_active), > + ibnd_linkspeed_str(port->port.info.link_speed_active, 0)); > + return 1; > +} > + > +static void > +decode_node_info(void * rcv_buf, ibnd_node_info_t *ni) > +{ > + mad_decode_field(rcv_buf, IB_NODE_BASE_VERS_F, &ni->base_ver); > + mad_decode_field(rcv_buf, IB_NODE_CLASS_VERS_F, &ni->class_ver); > + mad_decode_field(rcv_buf, IB_NODE_TYPE_F, &ni->type); > + mad_decode_field(rcv_buf, IB_NODE_NPORTS_F, &ni->numports); > + mad_decode_field(rcv_buf, IB_NODE_SYSTEM_GUID_F, &ni->sysimgguid); > + mad_decode_field(rcv_buf, IB_NODE_GUID_F, &ni->nodeguid); > + mad_decode_field(rcv_buf, IB_NODE_PORT_GUID_F, &ni->nodeportguid); > + mad_decode_field(rcv_buf, IB_NODE_PARTITION_CAP_F, &ni->partition_cap); > + mad_decode_field(rcv_buf, IB_NODE_DEVID_F, &ni->devid); > + mad_decode_field(rcv_buf, IB_NODE_REVISION_F, &ni->revision); > + mad_decode_field(rcv_buf, IB_NODE_LOCAL_PORT_F, &ni->localport); > + mad_decode_field(rcv_buf, IB_NODE_VENDORID_F, &ni->vendid); > +} > + > +/* > + * Returns -1 if error. > + */ > +static int > +query_node_info(struct ibnd_fabric *fabric, struct ibnd_node *node, ib_portid_t *portid) > +{ > + char nodeinfo[64]; > + void *ni = nodeinfo; > + if (!smp_query_via(ni, portid, IB_ATTR_NODE_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_node_info(ni, &(node->node.info)); > + return (0); > +} > + > +/* > + * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. 
> + */ > +static int > +query_node(struct ibnd_fabric *fabric, struct ibnd_node *inode, > + struct ibnd_port *iport, ib_portid_t *portid) > +{ > + char portinfo[64]; > + void *pi = portinfo; > + char switchinfo[64]; > + void *si = switchinfo; > + ibnd_node_t *node = &(inode->node); > + ibnd_port_t *port = &(iport->port); > + void *nd = inode->node.nodedesc; > + > + if (query_node_info(fabric, inode, portid)) > + return -1; > + > + port->portnum = node->info.localport; > + port->guid = node->info.nodeportguid; > + > + if (!smp_query_via(nd, portid, IB_ATTR_NODE_DESC, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_port_info(pi, &port->info); > + > + if (node->info.type != IBND_SWITCH_NODE) > + return 0; > + > + node->smalid = port->info.lid; > + node->smalmc = port->info.lmc; > + > + /* after we have the sma information find out the real PortInfo for this port */ > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, node->info.localport, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_port_info(pi, &port->info); > + > + if (!smp_query_via(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > + else > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); > + > + IBND_DEBUG("portid %s: got switch node %" PRIx64 " '%s'\n", > + portid2str(portid), node->info.nodeguid, node->nodedesc); > + return 1; > +} > + > +static int > +add_port_to_dpath(ib_dr_path_t *path, int nextport) > +{ > + if (path->cnt+2 >= sizeof(path->p)) > + return -1; > + ++path->cnt; > + path->p[path->cnt] = nextport; > + return path->cnt; > +} > + > +static int > +extend_dpath(struct ibnd_fabric *f, ib_dr_path_t *path, int nextport) > +{ > + int rc = add_port_to_dpath(path, nextport); > + if ((rc != -1) && (path->cnt > f->fabric.maxhops_discovered)) > + 
f->fabric.maxhops_discovered = path->cnt; > + return (rc); > +} > + > +static void > +dump_endnode(ib_portid_t *path, char *prompt, > + struct ibnd_node *node, struct ibnd_port *port) > +{ > + if (!show_progress) > + return; > + > + printf("%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", > + portid2str(path), prompt, > + ibnd_node_type_str((ibnd_node_t *)node), > + node->node.info.nodeguid, > + node->node.info.type == IBND_SWITCH_NODE ? 0 : port->port.portnum, > + port->port.info.lid, port->port.info.lid + (1 << port->port.info.lmc) - 1, > + node->node.nodedesc); > +} > + > +static struct ibnd_node * > +find_existing_node(struct ibnd_fabric *fabric, struct ibnd_node *new) > +{ > + int hash = HASHGUID(new->node.info.nodeguid) % HTSZ; > + struct ibnd_node *node; > + > + for (node = fabric->nodestbl[hash]; node; node = node->htnext) > + if (node->node.info.nodeguid == new->node.info.nodeguid) > + return node; > + > + return NULL; > +} > + > +ibnd_node_t * > +ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int hash = HASHGUID(guid) % HTSZ; > + struct ibnd_node *node; > + > + for (node = f->nodestbl[hash]; node; node = node->htnext) > + if (node->node.info.nodeguid == guid) > + return (ibnd_node_t *)node; > + > + return NULL; > +} > + > +ibnd_node_t * > +ibnd_update_node(ibnd_node_t *node) > +{ > + char portinfo[64]; > + void *pi = portinfo; > + ibnd_port_info_t port0_info; > + char switchinfo[64]; > + void *si = switchinfo; > + void *nd = node->nodedesc; > + int p = 0; > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > + struct ibnd_node *n = CONV_NODE_INTERNAL(node); > + > + if (query_node_info(f, n, &(n->node.path_portid))) > + return (NULL); > + > + if (!smp_query_via(nd, &(n->node.path_portid), IB_ATTR_NODE_DESC, 0, timeout_ms, > + f->ibmad_port)) > + return (NULL); > + > + /* update all the port info's */ > + for (p = 1; p <= n->node.info.numports; p++) { > + 
get_port_info(f, CONV_PORT_INTERNAL(n->node.ports[p]), p, &(n->node.path_portid)); > + } > + > + if (n->node.info.type != IBND_SWITCH_NODE) > + goto done; > + > + if (!smp_query_via(pi, &(n->node.path_portid), IB_ATTR_PORT_INFO, 0, timeout_ms, > + f->ibmad_port)) > + return (NULL); > + decode_port_info(pi, &port0_info); > + > + n->node.smalid = port0_info.lid; > + n->node.smalmc = port0_info.lmc; > + > + if (!smp_query_via(si, &(n->node.path_portid), IB_ATTR_SWITCH_INFO, 0, timeout_ms, > + f->ibmad_port)) > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > + else > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &n->node.sw_info.smaenhsp0); > + > +done: > + return (node); > +} > + > +ibnd_node_t * > +ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int i = 0; > + ibnd_node_t *rc = f->fabric.from_node; > + ib_dr_path_t path; > + > + if (str2drpath(&path, dr_str, 0, 0) == -1) { > + return (NULL); > + } > + > + for (i = 0; i <= path.cnt; i++) { > + ibnd_port_t *remote_port = NULL; > + if (path.p[i] == 0) > + continue; > + if (!rc->ports) > + return (NULL); > + > + remote_port = rc->ports[path.p[i]]->remoteport; > + if (!remote_port) > + return (NULL); > + > + rc = remote_port->node; > + } > + > + return (rc); > +} > + > +static void > +add_to_nodeguid_hash(struct ibnd_node *node, struct ibnd_node *hash[]) > +{ > + int hash_idx = HASHGUID(node->node.info.nodeguid) % HTSZ; > + > + node->htnext = hash[hash_idx]; > + hash[hash_idx] = node; > +} > + > +static void > +add_to_portguid_hash(struct ibnd_port *port, struct ibnd_port *hash[]) > +{ > + int hash_idx = HASHGUID(port->port.guid) % HTSZ; > + > + port->htnext = hash[hash_idx]; > + hash[hash_idx] = port; > +} > + > +static void > +add_to_type_list(struct ibnd_node*node, struct ibnd_fabric *fabric) > +{ > + switch (node->node.info.type) { > + case IBND_CA_NODE: > + node->type_next = fabric->ch_adapters; > + fabric->ch_adapters = node; > + 
break; > + case IBND_SWITCH_NODE: > + node->type_next = fabric->switches; > + fabric->switches = node; > + break; > + case IBND_ROUTER_NODE: > + node->type_next = fabric->routers; > + fabric->routers = node; > + break; > + } > +} > + > +static void > +add_to_nodedist(struct ibnd_node *node, struct ibnd_fabric *fabric) > +{ > + int dist = node->node.dist; > + if (node->node.info.type != IBND_SWITCH_NODE) > + dist = MAXHOPS; /* special Ca list */ > + > + node->dnext = fabric->nodesdist[dist]; > + fabric->nodesdist[dist] = node; > +} > + > + > +static struct ibnd_node * > +create_node(struct ibnd_fabric *fabric, struct ibnd_node *temp, ib_portid_t *path, int dist) > +{ > + struct ibnd_node *node; > + > + node = malloc(sizeof(*node)); > + if (!node) { > + IBPANIC("OOM: node creation failed\n"); > + return NULL; > + } > + > + memcpy(node, temp, sizeof(*node)); > + node->node.dist = dist; > + node->node.path_portid = *path; > + node->node.fabric = (ibnd_fabric_t *)fabric; > + > + add_to_nodeguid_hash(node, fabric->nodestbl); > + > + /* add this to the all nodes list */ > + node->node.next = fabric->fabric.nodes; > + fabric->fabric.nodes = (ibnd_node_t *)node; > + > + add_to_type_list(node, fabric); > + add_to_nodedist(node, fabric); > + > + return node; > +} > + > +static struct ibnd_port * > +find_existing_port_node(struct ibnd_node *node, struct ibnd_port *port) > +{ > + if (port->port.portnum > node->node.info.numports || node->node.ports == NULL ) > + return (NULL); > + > + return (CONV_PORT_INTERNAL(node->node.ports[port->port.portnum])); > +} > + > +static struct ibnd_port * > +add_port_to_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *temp) > +{ > + struct ibnd_port *port; > + > + port = malloc(sizeof(*port)); > + if (!port) > + return NULL; > + > + memcpy(port, temp, sizeof(*port)); > + port->port.node = (ibnd_node_t *)node; > + port->port.ext_portnum = 0; > + > + if (node->node.ports == NULL) { > + node->node.ports = 
calloc(sizeof(*node->node.ports), node->node.info.numports + 1); > + if (!node->node.ports) { > + IBND_ERROR("Failed to allocate the ports array\n"); > + return (NULL); > + } > + } > + > + node->node.ports[temp->port.portnum] = (ibnd_port_t *)port; > + > + add_to_portguid_hash(port, fabric->portstbl); > + return port; > +} > + > +static void > +link_ports(struct ibnd_node *node, struct ibnd_port *port, > + struct ibnd_node *remotenode, struct ibnd_port *remoteport) > +{ > + IBND_DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u\n", > + node->node.info.nodeguid, node, port, port->port.portnum, > + remotenode->node.info.nodeguid, remotenode, > + remoteport, remoteport->port.portnum); > + if (port->port.remoteport) > + port->port.remoteport->remoteport = NULL; > + if (remoteport->port.remoteport) > + remoteport->port.remoteport->remoteport = NULL; > + port->port.remoteport = (ibnd_port_t *)remoteport; > + remoteport->port.remoteport = (ibnd_port_t *)port; > +} > + > +static int > +get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *port, ib_portid_t *path, > + int portnum, int dist) > +{ > + struct ibnd_node node_buf; > + struct ibnd_port port_buf; > + struct ibnd_node *remotenode, *oldnode; > + struct ibnd_port *remoteport, *oldport; > + > + memset(&node_buf, 0, sizeof(node_buf)); > + memset(&port_buf, 0, sizeof(port_buf)); > + > + IBND_DEBUG("handle node %p port %p:%d dist %d\n", node, port, portnum, dist); > + if (port->port.info.phys_state != 5) /* LinkUp */ > + return -1; > + > + if (extend_dpath(fabric, &path->drpath, portnum) < 0) > + return -1; > + > + if (query_node(fabric, &node_buf, &port_buf, path) < 0) { > + IBWARN("NodeInfo on %s failed, skipping port", > + portid2str(path)); > + path->drpath.cnt--; /* restore path */ > + return -1; > + } > + > + oldnode = find_existing_node(fabric, &node_buf); > + if (oldnode) > + remotenode = oldnode; > + else if (!(remotenode = create_node(fabric, &node_buf, path, 
dist + 1))) > + IBPANIC("no memory"); > + > + oldport = find_existing_port_node(remotenode, &port_buf); > + if (oldport) { > + remoteport = oldport; > + } else if (!(remoteport = add_port_to_node(fabric, remotenode, &port_buf))) > + IBPANIC("no memory"); > + > + dump_endnode(path, oldnode ? "known remote" : "new remote", > + remotenode, remoteport); > + > + link_ports(node, port, remotenode, remoteport); > + > + path->drpath.cnt--; /* restore path */ > + return 0; > +} > + > +static void * > +ibnd_init_port(char *dev_name, int dev_port) > +{ > + int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; > + > + /* Crank up the mad lib */ > + return (mad_rpc_open_port(dev_name, dev_port, mgmt_classes, 2)); > +} > + > +ibnd_fabric_t * > +ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, > + ib_portid_t *from, int hops) > +{ > + struct ibnd_fabric *fabric = NULL; > + ib_portid_t my_portid = {0}; > + struct ibnd_node node_buf; > + struct ibnd_port port_buf; > + struct ibnd_node *node; > + struct ibnd_port *port; > + int i; > + int dist = 0; > + ib_portid_t *path; > + int max_hops = MAXHOPS-1; /* default find everything */ > + > + /* if not everything how much? 
*/ > + if (hops >= 0) { > + max_hops = hops; > + } > + > + /* If not specified start from "my" port */ > + if (!from) { > + from = &my_portid; > + } > + > + fabric = malloc(sizeof(*fabric)); > + > + if (!fabric) { > + IBPANIC("OOM: failed to malloc ibnd_fabric_t\n"); > + return (NULL); > + } > + > + memset(fabric, 0, sizeof(*fabric)); > + > + fabric->ibmad_port = ibnd_init_port(dev_name, dev_port); > + if (!fabric->ibmad_port) { > + IBPANIC("OOM: failed to open \"%s\" port %d\n", > + dev_name, dev_port); > + goto error; > + } > + > + IBND_DEBUG("from %s\n", portid2str(from)); > + > + memset(&node_buf, 0, sizeof(node_buf)); > + memset(&port_buf, 0, sizeof(port_buf)); > + > + if (query_node(fabric, &node_buf, &port_buf, from) < 0) { > + IBWARN("can't reach node %s\n", portid2str(from)); > + goto error; > + } > + > + node = create_node(fabric, &node_buf, from, 0); > + if (!node) > + goto error; > + > + fabric->fabric.from_node = (ibnd_node_t *)node; > + > + port = add_port_to_node(fabric, node, &port_buf); > + if (!port) > + IBPANIC("out of memory"); > + > + if (node->node.info.type != IBND_SWITCH_NODE && > + get_remote_node(fabric, node, port, from, node->node.info.localport, 0) < 0) > + return ((ibnd_fabric_t *)fabric); > + > + for (dist = 0; dist <= max_hops; dist++) { > + > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + > + path = &node->node.path_portid; > + > + IBND_DEBUG("dist %d node %p\n", dist, node); > + dump_endnode(path, "processing", node, port); > + > + for (i = 1; i <= node->node.info.numports; i++) { > + if (i == node->node.info.localport) > + continue; > + > + if (get_port_info(fabric, &port_buf, i, path) < 0) { > + IBWARN("can't reach node %s port %d", portid2str(path), i); > + continue; > + } > + > + port = find_existing_port_node(node, &port_buf); > + if (port) > + continue; > + > + port = add_port_to_node(fabric, node, &port_buf); > + if (!port) > + IBPANIC("out of memory"); > + > + /* If switch, set port GUID to node port 
GUID */ > + if (node->node.info.type == IBND_SWITCH_NODE) > + port->port.guid = node->node.info.nodeportguid; > + > + get_remote_node(fabric, node, port, path, i, dist); > + } > + } > + } > + > + fabric->fabric.chassis = group_nodes(fabric); > + > + return ((ibnd_fabric_t *)fabric); > +error: > + free(fabric); > + return (NULL); > +} > + > +static void > +destroy_node(struct ibnd_node *node) > +{ > + int p = 0; > + > + for (p = 0; p <= node->node.info.numports; p++) { > + free(node->node.ports[p]); > + } > + free(node->node.ports); > + free(node); > +} > + > +void > +ibnd_destroy_fabric(ibnd_fabric_t *fabric) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int dist = 0; > + struct ibnd_node *node = NULL; > + struct ibnd_node *next = NULL; > + ibnd_chassis_t *ch, *ch_next; > + > + ch = f->first_chassis; > + while (ch) { > + ch_next = ch->next; > + free(ch); > + ch = ch_next; > + } > + for (dist = 0; dist <= MAXHOPS; dist++) { > + node = f->nodesdist[dist]; > + while (node) { > + next = node->dnext; > + destroy_node(node); > + node = next; > + } > + } > + if (f->ibmad_port) > + mad_rpc_close_port(f->ibmad_port); > + free(f); > +} > + > +void > +ibnd_debug(int i) > +{ > + if (i) { > + ibdebug++; > + madrpc_show_errors(1); > + umad_debug(i); > + } else { > + ibdebug = 0; > + madrpc_show_errors(0); > + umad_debug(0); > + } > +} > + > +void > +ibnd_show_progress(int i) > +{ > + show_progress = i; > +} > + > +const char* > +ibnd_node_type_str(ibnd_node_t *node) > +{ > + switch(node->info.type) { > + case IBND_CA_NODE: return "Ca"; > + case IBND_SWITCH_NODE: return "Switch"; > + case IBND_ROUTER_NODE: return "Router"; > + } > + return "??"; > +} > + > +const char* > +ibnd_node_type_str_short(ibnd_node_t *node) > +{ > + switch(node->info.type) { > + case IBND_SWITCH_NODE: return "SW"; > + case IBND_CA_NODE: return "CA"; > + case IBND_ROUTER_NODE: return "RT"; > + } > + return "??"; > +} > + > + > +void > +ibnd_iter_nodes(ibnd_fabric_t *fabric, > + 
ibnd_iter_node_func_t func, > + void *user_data) > +{ > + ibnd_node_t *cur = NULL; > + > + for (cur = fabric->nodes; cur; cur = cur->next) { > + func(cur, user_data); > + } > +} > + > + > +void > +ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + ibnd_node_type_t node_type, > + void *user_data) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + struct ibnd_node *list = NULL; > + struct ibnd_node *cur = NULL; > + > + switch (node_type) { > + case IBND_SWITCH_NODE: > + list = f->switches; > + break; > + case IBND_CA_NODE: > + list = f->ch_adapters; > + break; > + case IBND_ROUTER_NODE: > + list = f->routers; > + break; > + default: > + IBND_DEBUG("Invalid node_type specified %d\n", node_type); > + break; > + } > + > + for (cur = list; cur; cur = cur->type_next) { > + func((ibnd_node_t *)cur, user_data); > + } > +} > + > diff --git a/infiniband-diags/libibnetdisc/src/internal.h b/infiniband-diags/libibnetdisc/src/internal.h > new file mode 100644 > index 0000000..89f238f > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/internal.h > @@ -0,0 +1,82 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. 
> + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +/** ========================================================================= > + * Define the internal data structures. > + */ > + > +#ifndef _INTERNAL_H_ > +#define _INTERNAL_H_ > + > +#include > + > +struct ibnd_node { > + /* This member MUST BE FIRST */ > + ibnd_node_t node; > + > + /* internal use only */ > + unsigned char ch_found; > + struct ibnd_node *htnext; /* hash table list */ > + struct ibnd_node *dnext; /* nodesdist next */ > + struct ibnd_node *type_next; /* next based on type */ > +}; > +#define CONV_NODE_INTERNAL(node) ((struct ibnd_node *)node) > + > +struct ibnd_port { > + /* This member MUST BE FIRST */ > + ibnd_port_t port; > + > + /* internal use only */ > + struct ibnd_port *htnext; > +}; > +#define CONV_PORT_INTERNAL(port) ((struct ibnd_port *)port) > + > +struct ibnd_fabric { > + /* This member MUST BE FIRST */ > + ibnd_fabric_t fabric; > + > + /* internal use only */ > + void *ibmad_port; > + struct ibnd_node *nodestbl[HTSZ]; > + struct ibnd_port *portstbl[HTSZ]; > + struct ibnd_node *nodesdist[MAXHOPS+1]; > + ibnd_chassis_t *first_chassis; > + ibnd_chassis_t *current_chassis; > + ibnd_chassis_t *last_chassis; > + struct ibnd_node *switches; > + struct ibnd_node *ch_adapters; > + 
struct ibnd_node *routers; > +}; > +#define CONV_FABRIC_INTERNAL(fabric) ((struct ibnd_fabric *)fabric) > + > +#endif /* _INTERNAL_H_ */ > diff --git a/infiniband-diags/libibnetdisc/src/libibnetdisc.map b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > new file mode 100644 > index 0000000..5e8c315 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > @@ -0,0 +1,27 @@ > +IBNETDISC_1.0 { > + global: > + ibnd_debug; > + ibnd_show_progress; > + ibnd_discover_fabric; > + ibnd_cache_fabric; > + ibnd_read_fabric; > + ibnd_destroy_fabric; > + ibnd_find_node_guid; > + ibnd_update_node; > + ibnd_find_node_dr; > + ibnd_linkwidth_str; > + ibnd_linkspeed_str; > + ibnd_node_type_str; > + ibnd_node_type_str_short; > + ibnd_is_xsigo_guid; > + ibnd_is_xsigo_tca; > + ibnd_is_xsigo_hca; > + ibnd_get_chassis_guid; > + ibnd_get_chassis_type; > + ibnd_get_chassis_slot_str; > + ibnd_linkstate_str; > + ibnd_physstate_str; > + ibnd_iter_nodes; > + ibnd_iter_nodes_type; > + local: *; > +}; > diff --git a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > new file mode 100644 > index 0000000..6e63f4a > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > @@ -0,0 +1,395 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +char *argv0 = "iblinkinfotest"; > +static FILE *f; > + > +static char *node_name_map_file = NULL; > +static nn_map_t *node_name_map = NULL; > + > +static int timeout_ms = 500; > + > +static int debug = 0; > +#define DEBUG(str, args...) 
\ > + if (debug) fprintf(stderr, str, ##args) > + > +static int down_links_only = 0; > +static int line_mode = 0; > +static int add_sw_settings = 0; > +static int print_port_guids = 0; > + > +static unsigned int > +get_max(unsigned int num) > +{ > + unsigned int v = num; // 32-bit word to find the log base 2 of > + unsigned r = 0; // r will be lg(v) > + > + while (v >>= 1) // unroll for more speed... > + { > + r++; > + } > + > + return (1 << r); > +} > + > +void > +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) > +{ > + int max_speed = 0; > + > + int max_width = get_max(port->info.link_width_supported > + & port->remoteport->info.link_width_supported); > + if ((max_width & port->info.link_width_active) == 0) { > + // we are not at the max supported width > + // print what we could be at. > + snprintf(width_msg, msg_size, "Could be %s", > + ibnd_linkwidth_str(max_width)); > + } > + > + max_speed = get_max(port->info.link_speed_supported > + & port->remoteport->info.link_speed_supported); > + if ((max_speed & port->info.link_speed_active) == 0) { > + // we are not at the max supported speed > + // print what we could be at. 
> + snprintf(speed_msg, msg_size, "Could be %s", > + ibnd_linkspeed_str(max_speed, 1)); > + } > +} > + > +void > +print_port(ibnd_node_t *node, ibnd_port_t *port) > +{ > + char remote_guid_str[256]; > + char remote_str[256]; > + char link_str[256]; > + char width_msg[256]; > + char speed_msg[256]; > + char ext_port_str[256]; > + > + if (!port) > + return; > + > + remote_guid_str[0] = '\0'; > + remote_str[0] = '\0'; > + link_str[0] = '\0'; > + width_msg[0] = '\0'; > + speed_msg[0] = '\0'; > + > + if (port->remoteport) { > + char remote_name_buf[256]; > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > + > + if (port->remoteport->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + get_msg(width_msg, speed_msg, 256, port); > + if (line_mode) { > + if (print_port_guids) { > + snprintf(remote_guid_str, 256, > + "0x%016lx ", > + port->remoteport->guid); > + } else { > + snprintf(remote_guid_str, 256, > + "0x%016lx ", > + port->remoteport->node->info.nodeguid); > + } > + } > + > + snprintf(remote_str, 256, > + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", > + remote_guid_str, > + port->remoteport->info.lid ? 
> + port->remoteport->info.lid : > + port->remoteport->node->smalid, > + port->remoteport->portnum, > + ext_port_str, > + remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + remote_name_buf), > + width_msg, > + speed_msg > + ); > + } else { > + snprintf(remote_str, 256, > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > + } > + > + if (add_sw_settings) { > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state), > + port->info.hoq_lifetime, > + port->info.vl_stall_count > + ); > + } else { > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state) > + ); > + } > + > + if (port->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + if (line_mode) { > + char name_buf[256]; > + strncpy(name_buf, node->nodedesc, 256); > + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", > + node->info.nodeguid, > + remap_node_name(node_name_map, > + node->info.nodeguid, > + name_buf), > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > + } else { > + printf(" %6d %4d[%2s] ==%s==> %s", > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > + } > +} > + > +void > +print_switch(ibnd_node_t *node, void *user_data) > +{ > + int i = 0; > + > + if (!line_mode) { > + char name_buf[256]; > + strncpy(name_buf, node->nodedesc, 256); > + printf("Switch 0x%016lx %s:\n", > + node->info.nodeguid, > + remap_node_name(node_name_map, > + node->info.nodeguid, > + name_buf)); > + } > + > + for (i = 1; i <= node->info.numports; i++) { > + 
ibnd_port_t *port = node->ports[i]; > + if (!port) > + continue; > + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { > + print_port(node, port); > + } > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, > + "Usage: %s [-hclp -S -D -C -P ]\n" > + " Report link speed and connection for each port of each switch which is active\n" > + " -h This help message\n" > + " -S output only the node specified by guid\n" > + " -D print only node specified by \n" > + " -f specify node to start \"from\"\n" > + " -n Number of hops to include away from specified node\n" > + " -d print only down links\n" > + " -l (line mode) print all information for each link on each line\n" > + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" > + > + > + " -t timeout for any single fabric query\n" > + " -s show errors\n" > + " --node-name-map use specified node name map\n" > + > + " -C use selected Channel Adaptor name for queries\n" > + " -P use selected channel adaptor port for queries\n" > + " -g print port guids instead of node guids\n" > + " --debug print debug messages\n" > + , > + argv0); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + char *ca = 0; > + int ca_port = 0; > + ibnd_fabric_t *fabric = NULL; > + uint64_t guid = 0; > + char *dr_path = NULL; > + char *from = NULL; > + int hops = 0; > + ib_portid_t port_id; > + > + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; > + static const struct option long_opts[] = { > + { "S", 1, 0, 'S'}, > + { "D", 1, 0, 'D'}, > + { "num-hops", 1, 0, 'n'}, > + { "down-links-only", 0, 0, 'd'}, > + { "line-mode", 0, 0, 'l'}, > + { "ca-name", 1, 0, 'C'}, > + { "ca-port", 1, 0, 'P'}, > + { "timeout", 1, 0, 't'}, > + { "show", 0, 0, 's'}, > + { "print-port-guids", 0, 0, 'g'}, > + { "print-additional", 0, 0, 'p'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { "node-name-map", 1, 0, 1}, > + { "debug", 0, 0, 2}, > + { "from", 1, 0, 'f'}, > + { } > + }; > + > + f = 
stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 1: > + node_name_map_file = strdup(optarg); > + break; > + case 2: > + debug = 1; > + ibnd_debug(1); > + break; > + case 'f': > + from = strdup(optarg); > + break; > + case 'C': > + ca = strdup(optarg); > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'D': > + dr_path = strdup(optarg); > + break; > + case 'n': > + hops = (int)strtol(optarg, NULL, 0); > + break; > + case 'd': > + down_links_only = 1; > + break; > + case 'l': > + line_mode = 1; > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'g': > + print_port_guids = 1; > + break; > + case 'S': > + guid = (uint64_t)strtoull(optarg, 0, 0); > + break; > + case 'p': > + add_sw_settings = 1; > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + if (argc && !(f = fopen(argv[0], "w"))) > + fprintf(stderr, "can't open file %s for writing", argv[0]); > + > + node_name_map = open_node_name_map(node_name_map_file); > + > + if (from) { > + /* only scan part of the fabric */ > + str2drpath(&(port_id.drpath), from, 0, 0); > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + guid = 0; > + } else { > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + } > + > + if (guid) { > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > + print_switch(sw, NULL); > + } else if (dr_path) { > + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); > + print_switch(sw, NULL); > + } else { > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > + } > + > + ibnd_destroy_fabric(fabric); > + > + close_node_name_map(node_name_map); > + 
exit(0); > +} > diff --git a/infiniband-diags/libibnetdisc/test/ibnetdisctest.c b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > new file mode 100644 > index 0000000..fc6e234 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > @@ -0,0 +1,675 @@ > +/* > + * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +static int verbose; > +#define LIST_CA_NODE (1 << IBND_CA_NODE) > +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) > +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) > + > +char *argv0 = "ibnetdiscover"; > +static FILE *f; > + > +static char *node_name_map_file = NULL; > +static nn_map_t *node_name_map = NULL; > + > +static int timeout_ms = 2000; > + > +static int debug = 0; > +#define DEBUG(str, args...) \ > + if (debug) fprintf(stderr, str, ##args) > + > + > +char * > +node_name(ibnd_node_t *node) > +{ > + static char buf[256]; > + > + switch(node->info.type) { > + case IBND_CA_NODE: > + sprintf(buf, "\"%s", "H"); > + break; > + case IBND_SWITCH_NODE: > + sprintf(buf, "\"%s", "S"); > + break; > + case IBND_ROUTER_NODE: > + sprintf(buf, "\"%s", "R"); > + break; > + default: > + sprintf(buf, "\"%s", "?"); > + break; > + } > + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); > + > + return buf; > +} > + > +void > +list_node(ibnd_node_t *node, void *user_data) > +{ > + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, > + node->nodedesc); > + > + fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", > + ibnd_node_type_str(node), > + node->info.nodeguid, node->info.numports, node->info.devid, > + node->info.vendid, > + nodename); > + > + free(nodename); > +} > + > +void > +list_nodes(ibnd_fabric_t *fabric, int list) > +{ > + if (list & LIST_CA_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); > + } > + if (list & LIST_SWITCH_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); > + } > + if (list & LIST_ROUTER_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); > + } > +} > + > +void > 
+out_ids(ibnd_node_t *node, int group, char *chname) > +{ > + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); > + if (node->info.sysimgguid) > + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); > + if (group > + && node->chassis && node->chassis->chassisnum) { > + fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum); > + if (chname) > + fprintf(f, " (%s)", clean_nodedesc(chname)); > + if (ibnd_is_xsigo_tca(node->info.nodeguid) > + && node->ports[1] > + && node->ports[1]->remoteport) > + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); > + } > + fprintf(f, "\n"); > +} > + > + > +uint64_t > +out_chassis(ibnd_fabric_t *fabric, int chassisnum) > +{ > + uint64_t guid; > + > + fprintf(f, "\nChassis %d", chassisnum); > + guid = ibnd_get_chassis_guid(fabric, chassisnum); > + if (guid) > + fprintf(f, " (guid 0x%" PRIx64 ")", guid); > + fprintf(f, "\n"); > + return guid; > +} > + > +void > +out_switch(ibnd_node_t *node, int group, char *chname) > +{ > + char *str; > + char str2[256]; > + char *nodename = NULL; > + > + out_ids(node, group, chname); > + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); > + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); > + if (group) { > + str = ibnd_get_chassis_type(node); > + if (str) > + fprintf(f, "%s ", str); > + str = ibnd_get_chassis_slot_str(node, str2, 256); > + if (str) > + fprintf(f, "%s", str); > + } > + > + nodename = remap_node_name(node_name_map, node->info.nodeguid, > + node->nodedesc); > + > + fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", > + node->info.numports, node_name(node), > + nodename, > + node->sw_info.smaenhsp0 ? 
"enhanced" : "base", > + node->smalid, node->smalmc); > + > + free(nodename); > +} > + > +void > +out_ca(ibnd_node_t *node, int group, char *chname) > +{ > + char *node_type; > + char *node_type2; > + > + out_ids(node, group, chname); > + switch(node->info.type) { > + case IBND_CA_NODE: > + node_type = "ca"; > + node_type2 = "Ca"; > + break; > + case IBND_ROUTER_NODE: > + node_type = "rt"; > + node_type2 = "Rt"; > + break; > + default: > + node_type = "???"; > + node_type2 = "???"; > + break; > + } > + > + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); > + fprintf(f, "%s\t%d %s\t\t# \"%s\"", > + node_type2, node->info.numports, node_name(node), > + clean_nodedesc(node->nodedesc)); > + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) > + fprintf(f, " (scp)"); > + fprintf(f, "\n"); > +} > + > +#define OUT_BUFFER_SIZE 16 > +static char * > +out_ext_port(ibnd_port_t *port, int group) > +{ > + static char mapping[OUT_BUFFER_SIZE]; > + > + if (group && port->ext_portnum != 0) { > + snprintf(mapping, OUT_BUFFER_SIZE, > + "[ext %d]", port->ext_portnum); > + return (mapping); > + } > + > + return (NULL); > +} > + > +void > +out_switch_port(ibnd_port_t *port, int group) > +{ > + char *ext_port_str = NULL; > + char *rem_nodename = NULL; > + > + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); > + fprintf(f, "[%d]", port->portnum); > + > + ext_port_str = out_ext_port(port, group); > + if (ext_port_str) > + fprintf(f, "%s", ext_port_str); > + > + rem_nodename = remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + port->remoteport->node->nodedesc); > + > + ext_port_str = out_ext_port(port->remoteport, group); > + fprintf(f, "\t%s[%d]%s", > + node_name(port->remoteport->node), > + port->remoteport->portnum, > + ext_port_str ? 
ext_port_str : ""); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); > + fprintf(f, "\t\t# \"%s\" lid %d %s%s", > + rem_nodename, > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + > + if (ibnd_is_xsigo_tca(port->remoteport->guid)) > + fprintf(f, " slot %d", port->portnum); > + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) > + fprintf(f, " (scp)"); > + fprintf(f, "\n"); > + > + free(rem_nodename); > +} > + > +void > +out_ca_port(ibnd_port_t *port, int group) > +{ > + char *str = NULL; > + char *rem_nodename = NULL; > + > + fprintf(f, "[%d]", port->portnum); > + if (port->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->guid); > + fprintf(f, "\t%s[%d]", > + node_name(port->remoteport->node), > + port->remoteport->portnum); > + str = out_ext_port(port->remoteport, group); > + if (str) > + fprintf(f, "%s", str); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); > + > + rem_nodename = remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + port->remoteport->node->nodedesc); > + > + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", > + port->info.lid, port->info.lmc, rem_nodename, > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + > + free(rem_nodename); > +} > + > +struct iter_user_data { > + int group; > + int skip_chassis_nodes; > +}; > + > +static void > +switch_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("SWITCH: node %p\n", node); > + > + /* skip chassis based switches if flagged */ > + if (data->skip_chassis_nodes && node->chassis && node->chassis->chassisnum) > + return; > + > + out_switch(node, data->group, NULL); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, data->group); > + } > +} > + > +static void > +ca_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("CA: node %p\n", node); > + /* Now, skip chassis based CAs */ > + if (data->group && node->chassis && node->chassis->chassisnum) > + return; > + out_ca(node, data->group, NULL); > + > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, data->group); > + } > +} > + > +static void > +router_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("RT: node %p\n", node); > + /* Now, skip chassis based RTs */ > + if (data->group && node->chassis && node->chassis->chassisnum) > + return; > + out_ca(node, data->group, NULL); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, data->group); > + } > +} > + > +int > 
+dump_topology(int group, ibnd_fabric_t *fabric) > +{ > + ibnd_node_t *node; > + ibnd_port_t *port; > + int i = 0, p = 0; > + time_t t = time(0); > + uint64_t chguid; > + char *chname = NULL; > + struct iter_user_data iter_user_data; > + > + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); > + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); > + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", > + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); > + > + /* Make pass on switches */ > + if (group) { > + ibnd_chassis_t *ch = NULL; > + > + /* Chassis based switches first */ > + for (ch = fabric->chassis; ch; ch = ch->next) { > + int n = 0; > + > + if (!ch->chassisnum) > + continue; > + chguid = out_chassis(fabric, ch->chassisnum); > + > + chname = NULL; > +/** > + * Will this work for Xsigo? > + */ > + if (ibnd_is_xsigo_guid(chguid)) { > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > + } > + } > + > +#if 0 > +/** > + * vs. this? > + * I don't want to expose the nodesdist array to the end user. 
> + */ > + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { > + if (!node->chrecord || > + !node->chrecord->chassisnum) > + continue; > + > + if (node->chrecord->chassisnum != ch->chassisnum) > + continue; > + > + if (ibnd_is_xsigo_hca(node->nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > + } > + } > +#endif > + } > + > + fprintf(f, "\n# Spine Nodes"); > + for (n = 1; n <= SPINES_MAX_NUM; n++) { > + if (ch->spinenode[n]) { > + out_switch(ch->spinenode[n], group, chname); > + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { > + port = ch->spinenode[n]->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + fprintf(f, "\n# Line Nodes"); > + for (n = 1; n <= LINES_MAX_NUM; n++) { > + if (ch->linenode[n]) { > + out_switch(ch->linenode[n], group, chname); > + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { > + port = ch->linenode[n]->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + > + fprintf(f, "\n# Chassis Switches"); > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (node->info.type == IBND_SWITCH_NODE) { > + out_switch(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + > + fprintf(f, "\n# Chassis CAs"); > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (node->info.type == IBND_CA_NODE) { > + out_ca(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, group); > + } > + } > + } > + > + } > + > + } else { /* !group */ > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 0; > + > + ibnd_iter_nodes_type(fabric, switch_iter_func, > + IBND_SWITCH_NODE, &iter_user_data); > + 
} > + > + chname = NULL; > + if (group) { > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 1; > + > + fprintf(f, "\nNon-Chassis Nodes\n"); > + ibnd_iter_nodes_type(fabric, switch_iter_func, > + IBND_SWITCH_NODE, &iter_user_data); > + > + } > + > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 0; > + > + /* Make pass on CAs */ > + ibnd_iter_nodes_type(fabric, ca_iter_func, IBND_CA_NODE, > + &iter_user_data); > + > + /* make pass on routers */ > + ibnd_iter_nodes_type(fabric, router_iter_func, IBND_ROUTER_NODE, > + &iter_user_data); > + > + return i; > +} > + > + > +void dump_ports_report (ibnd_node_t *node, void *user_data) > +{ > + int p = 0; > + ibnd_port_t *port = NULL; > + > + /* for each port */ > + for (p = node->info.numports, port = node->ports[p]; > + p > 0; > + port = node->ports[--p]) { > + if (port == NULL) > + continue; > + > + fprintf(stdout, > + "%2s %5d %2d 0x%016" PRIx64 " %s %s", > + ibnd_node_type_str_short(node), > + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, > + port->portnum, > + port->guid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + if (port->remoteport) > + fprintf(stdout, > + " - %2s %5d %2d 0x%016" PRIx64 > + " ( '%s' - '%s' )\n", > + ibnd_node_type_str_short(port->remoteport->node), > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
> + port->remoteport->node->smalid : port->remoteport->info.lid, > + port->remoteport->portnum, > + port->remoteport->guid, > + port->node->nodedesc, > + port->remoteport->node->nodedesc); > + else > + fprintf(stdout, "%36s'%s'\n", "", > + port->node->nodedesc); > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " > + "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", > + argv0); > + fprintf(stderr, " --node-name-map specify a node name map file\n"); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + int list = 0; > + char *ca = 0; > + int ca_port = 0; > + int group = 0; > + int ports_report = 0; > + ibnd_fabric_t *fabric = NULL; > + > + static char const str_opts[] = "C:P:t:devslgHSRpVhu"; > + static const struct option long_opts[] = { > + { "C", 1, 0, 'C'}, > + { "P", 1, 0, 'P'}, > + { "debug", 0, 0, 'd'}, > + { "verbose", 0, 0, 'v'}, > + { "show", 0, 0, 's'}, > + { "list", 0, 0, 'l'}, > + { "grouping", 0, 0, 'g'}, > + { "Hca_list", 0, 0, 'H'}, > + { "Switch_list", 0, 0, 'S'}, > + { "Router_list", 0, 0, 'R'}, > + { "timeout", 1, 0, 't'}, > + { "node-name-map", 1, 0, 1}, > + { "ports", 0, 0, 'p'}, > + { "Version", 0, 0, 'V'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { } > + }; > + > + f = stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 1: > + node_name_map_file = strdup(optarg); > + break; > + case 'C': > + ca = optarg; > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'd': > + debug = 1; > + ibnd_debug(1); > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'v': > + verbose++; > + break; > + case 's': > + ibnd_show_progress(1); > + break; > + case 'l': > + list = LIST_CA_NODE | 
LIST_SWITCH_NODE | LIST_ROUTER_NODE; > + break; > + case 'g': > + group = 1; > + break; > + case 'S': > + list |= LIST_SWITCH_NODE; > + break; > + case 'H': > + list |= LIST_CA_NODE; > + break; > + case 'R': > + list |= LIST_ROUTER_NODE; > + break; > + case 'p': > + ports_report = 1; > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + if (argc && !(f = fopen(argv[0], "w"))) > + fprintf(stderr, "can't open file %s for writing", argv[0]); > + > + node_name_map = open_node_name_map(node_name_map_file); > + > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + > + if (ports_report) > + ibnd_iter_nodes(fabric, > + dump_ports_report, > + NULL); > + else if (list) > + list_nodes(fabric, list); > + else > + dump_topology(group, fabric); > + > + ibnd_destroy_fabric(fabric); > + close_node_name_map(node_name_map); > + exit(0); > +} > diff --git a/infiniband-diags/libibnetdisc/test/testleaks.c b/infiniband-diags/libibnetdisc/test/testleaks.c > new file mode 100644 > index 0000000..3fbf7af > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/testleaks.c > @@ -0,0 +1,268 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +char *argv0 = "iblinkinfotest"; > +static FILE *f; > + > +static int timeout_ms = 500; > + > +void > +print_port(ibnd_node_t *node, ibnd_port_t *port) > +{ > + char remote_guid_str[256]; > + char remote_str[256]; > + char link_str[256]; > + char speed_msg[256]; > + char ext_port_str[256]; > + > + if (!port) > + return; > + > + remote_guid_str[0] = '\0'; > + remote_str[0] = '\0'; > + link_str[0] = '\0'; > + speed_msg[0] = '\0'; > + > + if (port->remoteport) { > + char remote_name_buf[256]; > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > + > + if (port->remoteport->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + snprintf(remote_str, 256, > + "%s%6d %4d[%2s] \"%s\" (%s)\n", > + remote_guid_str, > + port->remoteport->info.lid ? 
> + port->remoteport->info.lid : > + port->remoteport->node->smalid, > + port->remoteport->portnum, > + ext_port_str, > + port->remoteport->node->nodedesc, > + speed_msg > + ); > + } else { > + snprintf(remote_str, 256, > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > + } > + > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state) > + ); > + > + if (port->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + printf(" %6d %4d[%2s] ==%s==> %s", > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > +} > + > +void > +print_switch(ibnd_node_t *node, void *user_data) > +{ > + int i = 0; > + > + for (i = 1; i <= node->info.numports; i++) { > + ibnd_port_t *port = node->ports[i]; > + if (!port) > + continue; > + if (port->info.link_state == IBND_LINK_DOWN) { > + print_port(node, port); > + } > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, > + "Usage: %s [-hclp -S -D -C -P ]\n" > + " Report link speed and connection for each port of each switch which is active\n" > + " -h This help message\n" > + " -i Number of iterations to run (default -1 == infinate)\n" > + > + " -S output only the node specified by guid\n" > + " -D print only node specified by \n" > + " -f specify node to start \"from\"\n" > + " -n Number of hops to include away from specified node\n" > + > + " -t timeout for any single fabric query\n" > + " -s show errors\n" > + > + " -C use selected Channel Adaptor name for queries\n" > + " -P use selected channel adaptor port for queries\n" > + " --debug print debug messages\n" > + , > + argv0); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + char *ca = 0; > + int ca_port = 0; > + ibnd_fabric_t *fabric = NULL; > + uint64_t guid = 0; > + char 
*dr_path = NULL; > + char *from = NULL; > + int hops = 0; > + ib_portid_t port_id; > + int iters = -1; > + > + static char const str_opts[] = "S:D:n:C:P:t:shuf:i:"; > + static const struct option long_opts[] = { > + { "S", 1, 0, 'S'}, > + { "D", 1, 0, 'D'}, > + { "num-hops", 1, 0, 'n'}, > + { "ca-name", 1, 0, 'C'}, > + { "ca-port", 1, 0, 'P'}, > + { "timeout", 1, 0, 't'}, > + { "show", 0, 0, 's'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { "debug", 0, 0, 2}, > + { "from", 1, 0, 'f'}, > + { "iters", 1, 0, 'i'}, > + { } > + }; > + > + f = stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 2: > + ibnd_debug(1); > + break; > + case 'f': > + from = strdup(optarg); > + break; > + case 'C': > + ca = strdup(optarg); > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'D': > + dr_path = strdup(optarg); > + break; > + case 'n': > + hops = (int)strtol(optarg, NULL, 0); > + break; > + case 'i': > + iters = (int)strtol(optarg, NULL, 0); > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'S': > + guid = (uint64_t)strtoull(optarg, 0, 0); > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + while (iters == -1 || iters-- > 0) { > + if (from) { > + /* only scan part of the fabric */ > + str2drpath(&(port_id.drpath), from, 0, 0); > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + guid = 0; > + } else { > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + } > + > +#if 0 > + if (guid) { > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > + print_switch(sw, NULL); > + } else if (dr_path) { > + ibnd_node_t *sw = 
ibnd_find_node_dr(fabric, dr_path); > + print_switch(sw, NULL); > + } else { > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > + } > +#endif > + > + ibnd_destroy_fabric(fabric); > + } > + > + exit(0); > +} > -- > 1.5.4.5 > From sashak at voltaire.com Sun Dec 21 07:27:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 17:27:08 +0200 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081211162031.0c591f54.weiny2@llnl.gov> References: <20081211162031.0c591f54.weiny2@llnl.gov> Message-ID: <20081221152708.GO25208@sashak.voltaire.com> On 16:20 Thu 11 Dec , Ira Weiny wrote: > [snip...] > diff --git a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > new file mode 100644 > index 0000000..6e63f4a > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > @@ -0,0 +1,395 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. 
> + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +char *argv0 = "iblinkinfotest"; > +static FILE *f; > + > +static char *node_name_map_file = NULL; > +static nn_map_t *node_name_map = NULL; > + > +static int timeout_ms = 500; > + > +static int debug = 0; > +#define DEBUG(str, args...) \ > + if (debug) fprintf(stderr, str, ##args) > + > +static int down_links_only = 0; > +static int line_mode = 0; > +static int add_sw_settings = 0; > +static int print_port_guids = 0; > + > +static unsigned int > +get_max(unsigned int num) > +{ > + unsigned int v = num; // 32-bit word to find the log base 2 of > + unsigned r = 0; // r will be lg(v) > + > + while (v >>= 1) // unroll for more speed... 
> + { > + r++; > + } > + > + return (1 << r); > +} > + > +void > +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) > +{ > + int max_speed = 0; > + > + int max_width = get_max(port->info.link_width_supported > + & port->remoteport->info.link_width_supported); > + if ((max_width & port->info.link_width_active) == 0) { > + // we are not at the max supported width > + // print what we could be at. > + snprintf(width_msg, msg_size, "Could be %s", > + ibnd_linkwidth_str(max_width)); > + } > + > + max_speed = get_max(port->info.link_speed_supported > + & port->remoteport->info.link_speed_supported); > + if ((max_speed & port->info.link_speed_active) == 0) { > + // we are not at the max supported speed > + // print what we could be at. > + snprintf(speed_msg, msg_size, "Could be %s", > + ibnd_linkspeed_str(max_speed, 1)); > + } > +} > + > +void > +print_port(ibnd_node_t *node, ibnd_port_t *port) > +{ > + char remote_guid_str[256]; > + char remote_str[256]; > + char link_str[256]; > + char width_msg[256]; > + char speed_msg[256]; > + char ext_port_str[256]; > + > + if (!port) > + return; > + > + remote_guid_str[0] = '\0'; > + remote_str[0] = '\0'; > + link_str[0] = '\0'; > + width_msg[0] = '\0'; > + speed_msg[0] = '\0'; > + > + if (port->remoteport) { > + char remote_name_buf[256]; > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > + > + if (port->remoteport->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + get_msg(width_msg, speed_msg, 256, port); > + if (line_mode) { > + if (print_port_guids) { > + snprintf(remote_guid_str, 256, > + "0x%016lx ", > + port->remoteport->guid); Here and below, printing uint64_t as %lx generates warning on 32-bit machine. I would suggest to use portable string macros - PRIx64. 
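To illustrate the PRIx64 suggestion, here is a minimal sketch (not taken from the patch). The helper name `format_guid` is hypothetical; only `PRIx64` from `<inttypes.h>` and `snprintf` are standard.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of libibnetdisc: format a 64-bit GUID
 * the way the quoted code does, but without assuming sizeof(long) == 8. */
static int format_guid(char *buf, size_t len, uint64_t guid)
{
	/* PRIx64 expands to the correct length modifier for uint64_t on
	 * the target ABI, so "%lx" vs "%llx" no longer matters. */
	return snprintf(buf, len, "0x%016" PRIx64 " ", guid);
}
```

The format string is built by literal concatenation, so the width and padding flags from the original `"0x%016lx "` carry over unchanged.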
Sasha From julia at diku.dk Sun Dec 21 07:37:17 2008 From: julia at diku.dk (Julia Lawall) Date: Sun, 21 Dec 2008 16:37:17 +0100 (CET) Subject: [ofa-general] [PATCH 3/13] drivers/infiniband/hw/ehca: Remove redundant test Message-ID: From: Julia Lawall vpage is checked not to be NULL just after it is initialized at the beginning of each loop iteration. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @r exists@ local idexpression x; expression E; position p1,p2; @@ if (x at p1 == NULL || ...) { ... when forall return ...; } ... when != \(x=E\|x--\|x++\|--x\|++x\|x-=E\|x+=E\|x|=E\|x&=E\|&x\) ( x at p2 == NULL | x at p2 != NULL ) // another path to the test that is not through p1? @s exists@ local idexpression r.x; position r.p1,r.p2; @@ ... when != x at p1 ( x at p2 == NULL | x at p2 != NULL ) @fix depends on !s@ position r.p1,r.p2; expression x,E; statement S1,S2; @@ ( - if ((x at p2 != NULL) || ...) S1 | - if ((x at p2 == NULL) && ...) S1 | - BUG_ON(x at p2 == NULL); ) // Signed-off-by: Julia Lawall --- drivers/infiniband/hw/ehca/ehca_eq.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index 49660df..523e733 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -113,7 +113,7 @@ int ehca_create_eq(struct ehca_shca *shca, if (h_ret != H_SUCCESS || vpage) goto create_eq_exit2; } else { - if (h_ret != H_PAGE_REGISTERED || !vpage) + if (h_ret != H_PAGE_REGISTERED) goto create_eq_exit2; } } From sashak at voltaire.com Sun Dec 21 07:45:20 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 17:45:20 +0200 Subject: [ofa-general] [PATCH] infiniband-diags/ibstat, smpdump: kill unused includes Message-ID: <20081221154520.GA28259@sashak.voltaire.com> Kill unused headers inclusion. 
Signed-off-by: Sasha Khapyorsky --- infiniband-diags/src/ibstat.c | 13 ------------- infiniband-diags/src/smpdump.c | 11 ----------- 2 files changed, 0 insertions(+), 24 deletions(-) diff --git a/infiniband-diags/src/ibstat.c b/infiniband-diags/src/ibstat.c index 600a657..5d2113e 100644 --- a/infiniband-diags/src/ibstat.c +++ b/infiniband-diags/src/ibstat.c @@ -39,22 +39,9 @@ #include #include -#include #include #include -#include -#include -#include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include #include #include diff --git a/infiniband-diags/src/smpdump.c b/infiniband-diags/src/smpdump.c index 209d7b1..e26b369 100644 --- a/infiniband-diags/src/smpdump.c +++ b/infiniband-diags/src/smpdump.c @@ -42,18 +42,7 @@ #include #include #include -#include -#include -#include -#include -#include -#include -#include #include -#include -#include -#include -#include #include #include -- 1.6.0.4.766.g6fc4a From ronli.voltaire at gmail.com Sun Dec 21 08:21:55 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Sun, 21 Dec 2008 18:21:55 +0200 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: References: <4907348E.7060508@mellanox.co.il> <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> Message-ID: <3b5e77ad0812210821k66346d5frb036bc6bc9e8894c@mail.gmail.com> Roland, We encountered a problem that when a machine didn't support the required number of vectors (nvec), instead of trying to get 2 vectors like in the previous version, it didn't use MSI-X at all - causing a major performance degradation. Maybe in a case of failure we should try lowering the number of vectors to 2 (like in the previous version) or the return value of pci_enable_msix and goto no_msi only in case of a second failure. Ron On Sun, Dec 21, 2008 at 5:09 PM, Roland Dreier wrote: > > Wouldn't going to retry with the same nreq num instead of the err > > value might produce an infinite loop? 
> > yep, I never exercised that code. Will fix, thanks. > > - R. > From sashak at voltaire.com Sun Dec 21 12:38:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 22:38:06 +0200 Subject: [ofa-general] PATCH[5/6] Windows port of libibmad - rpc.c, sa.c, serv.c, smp.c, vendor.c In-Reply-To: <000301c96171$2e579420$435a180a@amr.corp.intel.com> References: <000301c96171$2e579420$435a180a@amr.corp.intel.com> Message-ID: <20081221203806.GD28259@sashak.voltaire.com> On 16:31 Thu 18 Dec , Sean Hefty wrote: > >-static pthread_mutex_t rpclock = PTHREAD_MUTEX_INITIALIZER; > >+static cl_plock_t rpclock; > > There's a complib mutex implementation available. plock is a reader/writer > lock. Actually I'm not sure that we need such synchronization in libibmad. I think it has been here since the days when libibumad was not thread safe. I will look at this more closely; maybe we can just remove pthread*() from libibmad. Sasha From sashak at voltaire.com Sun Dec 21 12:51:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 22:51:24 +0200 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: References: Message-ID: <20081221205124.GE28259@sashak.voltaire.com> Hi Arlin, On 15:37 Thu 18 Dec , Davis, Arlin R wrote: > > Port of libibmad to Windows. Dependencies on libibumad port and complib (in lieu of libcommon). Removed dependency on libibcommon. > > Intent is to allow common mad code base for Windows and Linux to simplify maintainability across OFED and WinOF. This patch set was built and tested on Windows and built on Linux (not tested yet). > > Patches separated as follows: > > 1/6 - mad.h > 2/6 - dump.c > 3/6 - fields.c > 4/6 - gs.c, mad.c, portid.c, register.c, resolve.c > 5/6 - rpc.c, sa.c, serv.c, smp.c, vendor.c > 6/6 - new files for windows: dirs, src/Sources, src/ibmad_export.def, src/ibmad_exports.src, ibmad_main.cpp What is the purpose of these patches? RFC? Inclusion in mainline?
When applying this patch I'm getting: warning: squelched 20 whitespace errors warning: 25 lines applied after fixing whitespace errors. And it doesn't even compile (I have complib installed in this case): ./include/infiniband/mad.h:37:30: error: complib/cl_types.h: No such file or directory ./include/infiniband/mad.h:38:33: error: complib/cl_byteswap.h: No such file or directory ./include/infiniband/mad.h:39:30: error: complib/cl_debug.h: No such file or directory Some porting comments are below. > > Signed-off by: Arlin Davis > > diff -aur libibmad-1.2.2/include/infiniband/mad.h libibmad/include/infiniband/mad.h > --- libibmad-1.2.2/include/infiniband/mad.h 2008-08-31 07:15:05.000000000 -0700 > +++ libibmad/include/infiniband/mad.h 2008-12-17 17:02:54.873046600 -0800 > @@ -33,8 +33,10 @@ > #ifndef _MAD_H_ > #define _MAD_H_ > > -#include > -#include > +/* use complib for portability */ > +#include > +#include > +#include Currently libibmad doesn't depend on complib. It would be really nice not to add new dependencies (normally we build libibmad before complib, which is part of OpenSM). Also what is wrong with and (or ) in Win? > > #ifdef __cplusplus > # define BEGIN_C_DECLS extern "C" { > @@ -46,8 +48,14 @@ > > BEGIN_C_DECLS > > +#if defined(_WIN32) || defined(_WIN64) > +#define MAD_EXPORT __declspec(dllexport) > +#else > +#define MAD_EXPORT extern > +#endif > + As I wrote in another email, it would be nice to minimize the number of needed changes and the number of #ifdefs introduced. What if we add the "extern" keyword to exported symbols and redefine it somewhere in a Windows-specific header file as #define extern __declspec(dllexport)? Could this help?
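For comparison, a minimal sketch of the dedicated-macro variant, with the platform check confined to one place so that no C keyword is redefined. `MAD_EXPORT` is the name used in the quoted patch; `demo_portnum` is a hypothetical function made up for this example and is not a libibmad symbol.

```c
/* One platform check, kept in a single spot (for instance a small
 * private header); all other declarations stay free of #ifdef. */
#if defined(_WIN32) || defined(_WIN64)
# define MAD_EXPORT __declspec(dllexport)
#else
# define MAD_EXPORT extern
#endif

/* Hypothetical exported symbol: the declaration carries the macro... */
MAD_EXPORT int demo_portnum(int raw);

/* ...and the definition is ordinary C. Keeps only the low 8 bits. */
int demo_portnum(int raw)
{
	return raw & 0xff;
}
```

On non-Windows builds the macro collapses to a plain `extern` declaration, so the Linux side sees no behavioural change.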
> #define IB_SUBNET_PATH_HOPS_MAX 64 > -#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu > +#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL > #define IB_DEFAULT_QP1_QKEY 0x80010000 > > #define IB_MAD_SIZE 256 > @@ -620,10 +628,10 @@ > /******************************************************************************/ > > /* portid.c */ > -char * portid2str(ib_portid_t *portid); > -int portid2portnum(ib_portid_t *portid); > -int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); > -char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); > +MAD_EXPORT char * portid2str(ib_portid_t *portid); > +MAD_EXPORT int portid2portnum(ib_portid_t *portid); > +MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int drdlid); > +MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); > > static inline int > ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) > @@ -639,77 +647,49 @@ > /* fields.c */ > extern ib_field_t ib_mad_f[]; > > -void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > +void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); What is the change here? Maybe whitespaces which were added/stripped by mailer, but I don't see this. 
> uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); > -void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); > -void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > -void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > +void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); > +void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > +void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); > > /* mad.c */ > -static inline uint32_t > -mad_get_field(void *buf, int base_offs, int field) > -{ > - return _get_field(buf, base_offs, ib_mad_f + field); > -} > - > -static inline void > -mad_set_field(void *buf, int base_offs, int field, uint32_t val) > -{ > - _set_field(buf, base_offs, ib_mad_f + field, val); > -} > - > +MAD_EXPORT uint32_t mad_get_field(void *buf, int base_offs, int field); > +MAD_EXPORT void mad_set_field(void *buf, int base_offs, int field, uint32_t val); Windows don't like "inline"? Maybe we just need to strip _set/get_field*() functions? 
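A toy sketch of what stripping the underscore layer could look like: the public accessor becomes an ordinary function that indexes the field table itself, so the table need not appear in the header. Everything here (`demo_field_t`, `demo_mad_f`, `demo_get_field`, the two fields) is hypothetical and byte-granular only; the real ib_field_t handling in fields.c is not reproduced.

```c
#include <stdint.h>

/* Toy descriptor: byte offset and byte length of a field. */
typedef struct {
	int byte_offs;
	int byte_len;
} demo_field_t;

enum demo_fields { DEMO_LID_F = 0, DEMO_QKEY_F, DEMO_FIELD_LAST_ };

/* Table stays private to the .c file instead of being extern in mad.h. */
static const demo_field_t demo_mad_f[DEMO_FIELD_LAST_] = {
	{ 0, 2 },	/* DEMO_LID_F  */
	{ 4, 4 },	/* DEMO_QKEY_F */
};

/* Plain (non-inline) accessor taking the field index directly. */
uint32_t demo_get_field(const void *buf, int base_offs, int field)
{
	const uint8_t *p;
	uint32_t val = 0;
	int i;

	if (field < 0 || field >= DEMO_FIELD_LAST_)
		return 0;	/* unknown field index */

	p = (const uint8_t *)buf + base_offs + demo_mad_f[field].byte_offs;
	for (i = 0; i < demo_mad_f[field].byte_len; i++)
		val = (val << 8) | p[i];	/* network (big-endian) order */

	return val;
}
```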
Sasha > /* field must be byte aligned */ > -static inline uint64_t > -mad_get_field64(void *buf, int base_offs, int field) > -{ > - return _get_field64(buf, base_offs, ib_mad_f + field); > -} > - > -static inline void > -mad_set_field64(void *buf, int base_offs, int field, uint64_t val) > -{ > - _set_field64(buf, base_offs, ib_mad_f + field, val); > -} > - > -static inline void > -mad_set_array(void *buf, int base_offs, int field, void *val) > -{ > - _set_array(buf, base_offs, ib_mad_f + field, val); > -} > - > -static inline void > -mad_get_array(void *buf, int base_offs, int field, void *val) > -{ > - _get_array(buf, base_offs, ib_mad_f + field, val); > -} > - > -void mad_decode_field(uint8_t *buf, int field, void *val); > -void mad_encode_field(uint8_t *buf, int field, void *val); > -void * mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); > -uint64_t mad_trid(void); > -int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); > +MAD_EXPORT uint64_t mad_get_field64(void *buf, int base_offs, int field); > +MAD_EXPORT void mad_set_field64(void *buf, int base_offs, int field, uint64_t val); > +MAD_EXPORT void mad_set_array(void *buf, int base_offs, int field, void *val); > +MAD_EXPORT void mad_get_array(void *buf, int base_offs, int field, void *val); > +MAD_EXPORT void mad_decode_field(uint8_t *buf, int field, void *val); > +MAD_EXPORT void mad_encode_field(uint8_t *buf, int field, void *val); > +MAD_EXPORT void *mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); > +MAD_EXPORT uint64_t mad_trid(void); > +MAD_EXPORT int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > > /* register.c */ > -int mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version); > -int mad_register_client(int mgmt, uint8_t rmpp_version); > -int mad_register_server(int mgmt, uint8_t rmpp_version, > - long method_mask[16/sizeof(long)], > - uint32_t 
class_oui); > -int mad_class_agent(int mgmt); > -int mad_agent_class(int agent); > +MAD_EXPORT int mad_register_port_client(int port_id, int mgmt, > + uint8_t rmpp_version); > +MAD_EXPORT int mad_register_client(int mgmt, uint8_t rmpp_version); > +MAD_EXPORT int mad_register_server(int mgmt, uint8_t rmpp_version, > + long method_mask[16/sizeof(long)], > + uint32_t class_oui); > +MAD_EXPORT int mad_class_agent(int mgmt); > +MAD_EXPORT int mad_agent_class(int agent); > > /* serv.c */ > -int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, > - void *data); > -void * mad_receive(void *umad, int timeout); > -int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); > -void * mad_alloc(void); > -void mad_free(void *umad); > +MAD_EXPORT int mad_send(ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > +MAD_EXPORT void * mad_receive(void *umad, int timeout); > +MAD_EXPORT int mad_respond(void *umad, ib_portid_t *portid, uint32_t rstatus); > +MAD_EXPORT void * mad_alloc(void); > +MAD_EXPORT void mad_free(void *umad); > > /* vendor.c */ > -uint8_t *ib_vendor_call(void *data, ib_portid_t *portid, > - ib_vendor_call_t *call); > +MAD_EXPORT uint8_t *ib_vendor_call(void *data, ib_portid_t *portid, > + ib_vendor_call_t *call); > > static inline int > mad_is_vendor_range1(int mgmt) > @@ -718,38 +698,41 @@ > } > > static inline int > -mad_is_vendor_range2(int mgmt) > +mad_is_vendor_range2(int mgmt) > { > return mgmt >= 0x30 && mgmt <= 0x4f; > } > > /* rpc.c */ > -int madrpc_portid(void); > -int madrpc_set_retries(int retries); > -int madrpc_set_timeout(int timeout); > -void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, void *payload, void *rcvdata); > -void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, > +MAD_EXPORT int madrpc_portid(void); > +MAD_EXPORT int madrpc_set_retries(int retries); > +MAD_EXPORT int madrpc_set_timeout(int timeout); > +MAD_EXPORT void madrpc_init(char *dev_name, int dev_port, > + int 
*mgmt_classes, int num_classes); > +MAD_EXPORT void madrpc_show_errors(int set); > +void * madrpc(ib_rpc_t *rpc, ib_portid_t *dport, > + void *payload, void *rcvdata); > +void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, > + ib_rmpp_hdr_t *rmpp, void *data); > + > +void madrpc_save_mad(void *madbuf, int len); > +void madrpc_lock(void); > +void madrpc_unlock(void); > +void * mad_rpc_open_port(char *dev_name, int dev_port, > + int *mgmt_classes, int num_classes); > +void mad_rpc_close_port(void *ibmad_port); > +void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, > + ib_portid_t *dport, void *payload, > + void *rcvdata); > +void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, > + ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, > void *data); > -void madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, > - int num_classes); > -void madrpc_save_mad(void *madbuf, int len); > -void madrpc_lock(void); > -void madrpc_unlock(void); > -void madrpc_show_errors(int set); > - > -void * mad_rpc_open_port(char *dev_name, int dev_port, int *mgmt_classes, > - int num_classes); > -void mad_rpc_close_port(void *ibmad_port); > -void * mad_rpc(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, > - void *payload, void *rcvdata); > -void * mad_rpc_rmpp(const void *ibmad_port, ib_rpc_t *rpc, ib_portid_t *dport, > - ib_rmpp_hdr_t *rmpp, void *data); > > /* smp.c */ > -uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > - unsigned timeout); > -uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, > - unsigned timeout); > +MAD_EXPORT uint8_t * smp_query(void *buf, ib_portid_t *id, unsigned attrid, > + unsigned mod, unsigned timeout); > +MAD_EXPORT uint8_t * smp_set(void *buf, ib_portid_t *id, unsigned attrid, > + unsigned mod, unsigned timeout); > uint8_t * smp_query_via(void *buf, ib_portid_t *id, unsigned attrid, > unsigned mod, unsigned timeout, const void *srcport); > uint8_t * smp_set_via(void *buf, ib_portid_t *id, 
unsigned attrid, unsigned mod, > @@ -786,9 +769,9 @@ > unsigned timeout); > uint8_t * sa_rpc_call(const void *ibmad_port, void *rcvbuf, ib_portid_t *portid, > ib_sa_call_t *sa, unsigned timeout); > -int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, > - void *buf); /* returns lid */ > -int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, > +MAD_EXPORT int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, > + ib_portid_t *sm_id, void *buf); /* returns lid */ > +int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, > ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf); > > inline static uint8_t * > @@ -805,38 +788,38 @@ > } > > /* resolve.c */ > -int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); > -int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, > - ib_portid_t *sm_id, int timeout); > -int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, > - int dest_type, ib_portid_t *sm_id); > -int ib_resolve_self(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid); > - > -int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, > - const void *srcport); > -int ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, > - ib_portid_t *sm_id, int timeout, > - const void *srcport); > -int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, > - int dest_type, ib_portid_t *sm_id, > - const void *srcport); > -int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, > - const void *srcport); > +MAD_EXPORT int ib_resolve_smlid(ib_portid_t *sm_id, int timeout); > +MAD_EXPORT int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid, > + ib_portid_t *sm_id, int timeout); > +MAD_EXPORT int ib_resolve_portid_str(ib_portid_t *portid, char *addr_str, > + int dest_type, ib_portid_t *sm_id); > +MAD_EXPORT int ib_resolve_self(ib_portid_t *portid, int *portnum, > + ibmad_gid_t *gid); > +int ib_resolve_smlid_via(ib_portid_t *sm_id, int timeout, > + const void *srcport); > +int 
ib_resolve_guid_via(ib_portid_t *portid, uint64_t *guid, > + ib_portid_t *sm_id, int timeout, > + const void *srcport); > +int ib_resolve_portid_str_via(ib_portid_t *portid, char *addr_str, > + int dest_type, ib_portid_t *sm_id, > + const void *srcport); > +int ib_resolve_self_via(ib_portid_t *portid, int *portnum, ibmad_gid_t *gid, > + const void *srcport); > > /* gs.c */ > -uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *perf_classportinfo_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_reset(void *rcvbuf, ib_portid_t *dest, int port, > unsigned mask, unsigned timeout); > -uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_ext_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_performance_ext_reset(void *rcvbuf, ib_portid_t *dest, int port, > unsigned mask, unsigned timeout); > -uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_samples_control_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > -uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, > +MAD_EXPORT uint8_t *port_samples_result_query(void *rcvbuf, ib_portid_t *dest, int port, > unsigned timeout); > > uint8_t *perf_classportinfo_query_via(void *rcvbuf, ib_portid_t *dest, int port, > @@ -855,7 +838,7 @@ > unsigned timeout, const void *srcport); > /* dump.c */ > ib_mad_dump_fn > - mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, 
> + MAD_EXPORT mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex, > mad_dump_bitfield, mad_dump_array, mad_dump_string, > mad_dump_linkwidth, mad_dump_linkwidthsup, mad_dump_linkwidthen, > mad_dump_linkdowndefstate, > @@ -900,6 +883,34 @@ > > extern int ibdebug; > > +/* remove libibcommon dependencies, use complib */ > + > +/* dump.c */ > +MAD_EXPORT void xdump(FILE *file, char *msg, void *p, int size); > + > +/** printf style debugging MACRO's, map to cl_msg_out */ > +#if !defined(IBWARN) > +#define IBWARN(fmt, ...) cl_msg_out(fmt, ## __VA_ARGS__) > +#endif > +#if !defined(IBPANIC) > +#define IBPANIC(fmt, ...) \ > +{ \ > + cl_msg_out(fmt, ## __VA_ARGS__); \ > + CL_ASSERT(0); \ > +} > +#endif > + > +/** align value \a l to \a size (ceil) */ > +#if !defined(ALIGN) > +#define ALIGN(l, size) (((l) + ((size) - 1)) / (size) * (size)) > +#endif > + > +/** align value \a l to \a sizeof 32 bit int (ceil) */ > +#if !defined(ALIGN32) > +#define ALIGN32(l) (ALIGN((l), sizeof(uint32))) > +#endif > + > + > END_C_DECLS > > #endif /* _MAD_H_ */ > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From sashak at voltaire.com Sun Dec 21 13:00:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 21 Dec 2008 23:00:50 +0200 Subject: [ofa-general] RE: PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> References: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> Message-ID: <20081221210050.GF28259@sashak.voltaire.com> On 16:10 Thu 18 Dec , Sean Hefty wrote: > > Looking at the changes, do the management developers think that it makes sense > to share the libibmad implementation, or should separate implementations be > maintained, similar to libibumad? 
If the implementations are not shared, can > the Linux side treat the API as an external interface, rather than a private > interface? I think libibmad implementations could be shared - unlike libibumad it is almost system independent. And again, it would be really nice if such porting will not introduce a lot of changes, #ifdef WIN, etc.. C99 stuff looks like a problem for me (could this be solved somehow with VC?), the rest seems doable. Sasha > > >+#if defined(_WIN32) || defined(_WIN64) > >+#define MAD_EXPORT __declspec(dllexport) > >+#else > >+#define MAD_EXPORT extern > > I don't know that 'extern' is appropriate here. > > >+#endif > >+ > > #define IB_SUBNET_PATH_HOPS_MAX 64 > >-#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000llu > >+#define IB_DEFAULT_SUBN_PREFIX 0xfe80000000000000ULL > > #define IB_DEFAULT_QP1_QKEY 0x80010000 > > > > #define IB_MAD_SIZE 256 > >@@ -620,10 +628,10 @@ > > > >/****************************************************************************** > >/ > > > > /* portid.c */ > >-char * portid2str(ib_portid_t *portid); > >-int portid2portnum(ib_portid_t *portid); > >-int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int > >drdlid); > >-char * drpath2str(ib_dr_path_t *path, char *dstr, size_t dstr_size); > >+MAD_EXPORT char * portid2str(ib_portid_t *portid); > >+MAD_EXPORT int portid2portnum(ib_portid_t *portid); > >+MAD_EXPORT int str2drpath(ib_dr_path_t *path, char *routepath, int drslid, int > >drdlid); > >+MAD_EXPORT char * drpath2str(ib_dr_path_t *path, char *dstr, size_t > >dstr_size); > > > > static inline int > > ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) > >@@ -639,77 +647,49 @@ > > /* fields.c */ > > extern ib_field_t ib_mad_f[]; > > > >-void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > >+void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > > uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); > >-void _set_array(void *buf, int base_offs, 
ib_field_t *f, void *val); > >-void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > >-void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > >+void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); > >+void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); > >+void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); > > uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); > > Are these really the functions that should be exported from the library or in > the header file? (I'm probably missing some history here.) > > - Sean From rdreier at cisco.com Sun Dec 21 13:27:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 21 Dec 2008 13:27:57 -0800 Subject: [ofa-general] Re: [PATCH 3/13] drivers/infiniband/hw/ehca: Remove redundant test In-Reply-To: (Julia Lawall's message of "Sun, 21 Dec 2008 16:37:17 +0100 (CET)") References: Message-ID: Thanks, applied for 2.6.29 (ehca guys added to cc list just to make sure this is OK). - R. From rdreier at cisco.com Sun Dec 21 13:47:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 21 Dec 2008 13:47:23 -0800 Subject: [ofa-general] [PATCH] cma_zero_addr In-Reply-To: <1228639669.3833.10.camel@alst60.voltaire.com> (Aleksey Senin's message of "Sun, 07 Dec 2008 10:47:49 +0200") References: <1228222680.14862.13.camel@alst60.voltaire.com> <1228639669.3833.10.camel@alst60.voltaire.com> Message-ID: > PATCHv6 is the latest version. Should we use it? > http://lists.openfabrics.org/pipermail/general/2008-December/055727.html OK, so I just tried to apply this, but I'm not sure what tree it's against.
For example, I see this in patch 1/2: > --- a/drivers/infiniband/core/addr.c > +++ b/drivers/infiniband/core/addr.c > @@ -43,6 +43,7 @@ > #include > #include > #include > +#include but the includes in the kernel look like #include #include #include #include and as far as I can tell they haven't changed for a long time. And further down the patch, I see > -static int addr_resolve_remote(struct sockaddr *src_in, > - struct sockaddr *dst_in, but again my kernel looks different: static int addr_resolve_remote(struct sockaddr_in *src_in, struct sockaddr_in *dst_in, and again that code hasn't changed in several years. Rather than having me continue to try to fix this up by hand (and probably break something), can you resend the series with the patches regenerated against an unpatched kernel? Best is Linus's tree, but really this area hasn't changed for a long time so pretty much any kernel should do. Thanks. From rdreier at cisco.com Sun Dec 21 13:57:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 21 Dec 2008 13:57:09 -0800 Subject: [ofa-general] [PATCH] iser: avoid recv buf exhaustion v2 (resend) In-Reply-To: <1228698773-26528-1-git-send-email-ddiss@sgi.com> (David Disseldorp's message of "Mon, 8 Dec 2008 12:12:52 +1100") References: <1228698773-26528-1-git-send-email-ddiss@sgi.com> Message-ID: Please put information that shouldn't go into the patch description inside the [PATCH] part of the subject line -- otherwise I have to edit things by hand to avoid having the "v2 (resend)" in the kernel log.
ie the subject line you should have used was something like [PATCH v2 - resend] IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs Also, checkpatch says: ERROR: code indent should use tabs where possible #146: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:198: + outstanding_unexp_pdus =$ ERROR: code indent should use tabs where possible #147: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:199: + atomic_xchg(&iser_conn->ib_conn->unexpected_pdu_count, 0);$ ERROR: else should follow close brace '}' #300: FILE: drivers/infiniband/ulp/iser/iser_initiator.c:636: + } + else if (opcode == ISCSI_OP_ASYNC_EVENT) { I fixed these by hand when applying, but please check your patches yourself in the future to avoid this. - R. From sashak at voltaire.com Sun Dec 21 14:05:20 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 00:05:20 +0200 Subject: [ofa-general] [PATCH] libibmad: remove hidden _set/_get_field*() API In-Reply-To: <000201c9620c$00cdb2a0$ae58180a@amr.corp.intel.com> References: <000101c9616e$2a602290$435a180a@amr.corp.intel.com> <000201c9620c$00cdb2a0$ae58180a@amr.corp.intel.com> Message-ID: <20081221220520.GG28259@sashak.voltaire.com> Remove hidden _set/_get_field*() API - instead export and use mad_set/get_field*() API directly. Signed-off-by: Sasha Khapyorsky --- On 11:00 Fri 19 Dec , Sean Hefty wrote: > > I did see that. I was thinking more of renaming _set_field to mad_set_field and > removing the existing implementation of mad_set_field. Ok, let's just do it. 
Sasha libibmad/include/infiniband/mad.h | 93 ++++------------------- libibmad/src/dump.c | 51 ------------ libibmad/src/fields.c | 156 +++++++++++++++++++++++++++++++++--- libibmad/src/libibmad.map | 20 ++--- libibmad/src/mad.c | 40 ---------- 5 files changed, 166 insertions(+), 194 deletions(-) diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h index c2ad148..fd0deff 100644 --- a/libibmad/include/infiniband/mad.h +++ b/libibmad/include/infiniband/mad.h @@ -637,58 +637,23 @@ ib_portid_set(ib_portid_t *portid, int lid, int qp, int qkey) } /* fields.c */ -extern ib_field_t ib_mad_f[]; - -void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); -uint32_t _get_field(void *buf, int base_offs, ib_field_t *f); -void _set_array(void *buf, int base_offs, ib_field_t *f, void *val); -void _get_array(void *buf, int base_offs, ib_field_t *f, void *val); -void _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val); -uint64_t _get_field64(void *buf, int base_offs, ib_field_t *f); - -/* mad.c */ -static inline uint32_t -mad_get_field(void *buf, int base_offs, int field) -{ - return _get_field(buf, base_offs, ib_mad_f + field); -} - -static inline void -mad_set_field(void *buf, int base_offs, int field, uint32_t val) -{ - _set_field(buf, base_offs, ib_mad_f + field, val); -} - +uint32_t mad_get_field(void *buf, int base_offs, int field); +void mad_set_field(void *buf, int base_offs, int field, uint32_t val); /* field must be byte aligned */ -static inline uint64_t -mad_get_field64(void *buf, int base_offs, int field) -{ - return _get_field64(buf, base_offs, ib_mad_f + field); -} - -static inline void -mad_set_field64(void *buf, int base_offs, int field, uint64_t val) -{ - _set_field64(buf, base_offs, ib_mad_f + field, val); -} - -static inline void -mad_set_array(void *buf, int base_offs, int field, void *val) -{ - _set_array(buf, base_offs, ib_mad_f + field, val); -} - -static inline void -mad_get_array(void *buf, int 
base_offs, int field, void *val) -{ - _get_array(buf, base_offs, ib_mad_f + field, val); -} +uint64_t mad_get_field64(void *buf, int base_offs, int field); +void mad_set_field64(void *buf, int base_offs, int field, uint64_t val); +void mad_set_array(void *buf, int base_offs, int field, void *val); +void mad_get_array(void *buf, int base_offs, int field, void *val); +void mad_decode_field(uint8_t *buf, int field, void *val); +void mad_encode_field(uint8_t *buf, int field, void *val); +int mad_print_field(int field, const char *name, void *val); +char *mad_dump_field(int field, char *buf, int bufsz, void *val); +char *mad_dump_val(int field, char *buf, int bufsz, void *val); -void mad_decode_field(uint8_t *buf, int field, void *val); -void mad_encode_field(uint8_t *buf, int field, void *val); -void * mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); +/* mad.c */ +void *mad_encode(void *buf, ib_rpc_t *rpc, ib_dr_path_t *drpath, void *data); uint64_t mad_trid(void); -int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); +int mad_build_pkt(void *umad, ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data); /* register.c */ int mad_register_port_client(int port_id, int mgmt, uint8_t rmpp_version); @@ -868,36 +833,6 @@ ib_mad_dump_fn mad_dump_nodedesc, mad_dump_nodeinfo, mad_dump_portinfo, mad_dump_switchinfo, mad_dump_perfcounters, mad_dump_perfcounters_ext; -int _mad_dump(ib_mad_dump_fn *fn, char *name, void *val, int valsz); -char * _mad_dump_field(ib_field_t *f, char *name, char *buf, int bufsz, - void *val); -int _mad_print_field(ib_field_t *f, char *name, void *val, int valsz); -char * _mad_dump_val(ib_field_t *f, char *buf, int bufsz, void *val); - -static inline int -mad_print_field(int field, char *name, void *val) -{ - if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) - return -1; - return _mad_print_field(ib_mad_f + field, name, val, 0); -} - -static inline char * 
-mad_dump_field(int field, char *buf, int bufsz, void *val) -{ - if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) - return 0; - return _mad_dump_field(ib_mad_f + field, 0, buf, bufsz, val); -} - -static inline char * -mad_dump_val(int field, char *buf, int bufsz, void *val) -{ - if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) - return 0; - return _mad_dump_val(ib_mad_f + field, buf, bufsz, val); -} - extern int ibdebug; END_C_DECLS diff --git a/libibmad/src/dump.c b/libibmad/src/dump.c index 052127f..49bb34b 100644 --- a/libibmad/src/dump.c +++ b/libibmad/src/dump.c @@ -729,54 +729,3 @@ mad_dump_perfcounters_ext(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PC_EXT_FIRST_F, IB_PC_EXT_LAST_F); } - -/************************/ - -char * -_mad_dump_val(ib_field_t *f, char *buf, int bufsz, void *val) -{ - f->def_dump_fn(buf, bufsz, val, ALIGN(f->bitlen, 8) / 8); - buf[bufsz - 1] = 0; - - return buf; -} - -char * -_mad_dump_field(ib_field_t *f, char *name, char *buf, int bufsz, void *val) -{ - char dots[128]; - int l, n; - - if (bufsz <= 32) - return 0; /* buf too small */ - - if (!name) - name = f->name; - - l = strlen(name); - if (l < 32) { - memset(dots, '.', 32 - l); - dots[32 - l] = 0; - } - - n = snprintf(buf, bufsz, "%s:%s", name, dots); - _mad_dump_val(f, buf + n, bufsz - n, val); - buf[bufsz - 1] = 0; - - return buf; -} - -int -_mad_dump(ib_mad_dump_fn *fn, char *name, void *val, int valsz) -{ - ib_field_t f = { .def_dump_fn = fn, .bitlen = valsz * 8}; - char buf[512]; - - return printf("%s\n", _mad_dump_field(&f, name, buf, sizeof buf, val)); -} - -int -_mad_print_field(ib_field_t *f, char *name, void *val, int valsz) -{ - return _mad_dump(f->def_dump_fn, name ? name : f->name, val, valsz ? 
valsz : ALIGN(f->bitlen, 8) / 8); -} diff --git a/libibmad/src/fields.c b/libibmad/src/fields.c index 6942e85..ffbfc76 100644 --- a/libibmad/src/fields.c +++ b/libibmad/src/fields.c @@ -54,7 +54,7 @@ #define BE_OFFS(o, w) (o), (w) #define BE_TO_BITSOFFS(o, w) (((o) & ~31) | ((32 - ((o) & 31) - (w)))) -ib_field_t ib_mad_f [] = { +static const ib_field_t ib_mad_f [] = { [0] {0, 0}, /* IB_NO_FIELD - reserved as invalid */ [IB_GID_PREFIX_F] {0, 64, "GidPrefix", mad_dump_rhex}, @@ -363,8 +363,7 @@ ib_field_t ib_mad_f [] = { }; -void -_set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val) +static void _set_field64(void *buf, int base_offs, const ib_field_t *f, uint64_t val) { uint64_t nval; @@ -372,16 +371,14 @@ _set_field64(void *buf, int base_offs, ib_field_t *f, uint64_t val) memcpy((char *)buf + base_offs + f->bitoffs / 8, &nval, sizeof(uint64_t)); } -uint64_t -_get_field64(void *buf, int base_offs, ib_field_t *f) +static uint64_t _get_field64(void *buf, int base_offs, const ib_field_t *f) { uint64_t val; memcpy(&val, ((char *)buf + base_offs + f->bitoffs / 8), sizeof(uint64_t)); return ntohll(val); } -void -_set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val) +static void _set_field(void *buf, int base_offs, const ib_field_t *f, uint32_t val) { int prebits = (8 - (f->bitoffs & 7)) & 7; int postbits = (f->bitoffs + f->bitlen) & 7; @@ -411,8 +408,7 @@ _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val) } } -uint32_t -_get_field(void *buf, int base_offs, ib_field_t *f) +static uint32_t _get_field(void *buf, int base_offs, const ib_field_t *f) { int prebits = (8 - (f->bitoffs & 7)) & 7; int postbits = (f->bitoffs + f->bitlen) & 7; @@ -440,8 +436,7 @@ _get_field(void *buf, int base_offs, ib_field_t *f) } /* field must be byte aligned */ -void -_set_array(void *buf, int base_offs, ib_field_t *f, void *val) +static void _set_array(void *buf, int base_offs, const ib_field_t *f, void *val) { int bitoffs = f->bitoffs; @@ -451,8 +446,7 
@@ _set_array(void *buf, int base_offs, ib_field_t *f, void *val) memcpy((uint8_t *)buf + base_offs + bitoffs / 8, val, f->bitlen / 8); } -void -_get_array(void *buf, int base_offs, ib_field_t *f, void *val) +static void _get_array(void *buf, int base_offs, const ib_field_t *f, void *val) { int bitoffs = f->bitoffs; @@ -461,3 +455,139 @@ _get_array(void *buf, int base_offs, ib_field_t *f, void *val) memcpy(val, (uint8_t *)buf + base_offs + bitoffs / 8, f->bitlen / 8); } + +uint32_t mad_get_field(void *buf, int base_offs, int field) +{ + return _get_field(buf, base_offs, ib_mad_f + field); +} + +void mad_set_field(void *buf, int base_offs, int field, uint32_t val) +{ + _set_field(buf, base_offs, ib_mad_f + field, val); +} + +uint64_t mad_get_field64(void *buf, int base_offs, int field) +{ + return _get_field64(buf, base_offs, ib_mad_f + field); +} + +void mad_set_field64(void *buf, int base_offs, int field, uint64_t val) +{ + _set_field64(buf, base_offs, ib_mad_f + field, val); +} + +void mad_set_array(void *buf, int base_offs, int field, void *val) +{ + _set_array(buf, base_offs, ib_mad_f + field, val); +} + +void mad_get_array(void *buf, int base_offs, int field, void *val) +{ + _get_array(buf, base_offs, ib_mad_f + field, val); +} + +void mad_decode_field(uint8_t *buf, int field, void *val) +{ + const ib_field_t *f = ib_mad_f + field; + + if (!field) { + *(int *)val = *(int *)buf; + return; + } + if (f->bitlen <= 32) { + *(uint32_t *)val = _get_field(buf, 0, f); + return; + } + if (f->bitlen == 64) { + *(uint64_t *)val = _get_field64(buf, 0, f); + return; + } + _get_array(buf, 0, f, val); +} + +void mad_encode_field(uint8_t *buf, int field, void *val) +{ + const ib_field_t *f = ib_mad_f + field; + + if (!field) { + *(int *)buf = *(int *)val; + return; + } + if (f->bitlen <= 32) { + _set_field(buf, 0, f, *(uint32_t *)val); + return; + } + if (f->bitlen == 64) { + _set_field64(buf, 0, f, *(uint64_t *)val); + return; + } + _set_array(buf, 0, f, val); +} + 
+/************************/ + +static char *_mad_dump_val(const ib_field_t *f, char *buf, int bufsz, void *val) +{ + f->def_dump_fn(buf, bufsz, val, ALIGN(f->bitlen, 8) / 8); + buf[bufsz - 1] = 0; + + return buf; +} + +static char *_mad_dump_field(const ib_field_t *f, const char *name, char *buf, int bufsz, void *val) +{ + char dots[128]; + int l, n; + + if (bufsz <= 32) + return 0; /* buf too small */ + + if (!name) + name = f->name; + + l = strlen(name); + if (l < 32) { + memset(dots, '.', 32 - l); + dots[32 - l] = 0; + } + + n = snprintf(buf, bufsz, "%s:%s", name, dots); + _mad_dump_val(f, buf + n, bufsz - n, val); + buf[bufsz - 1] = 0; + + return buf; +} + +static int _mad_dump(ib_mad_dump_fn *fn, const char *name, void *val, int valsz) +{ + ib_field_t f = { .def_dump_fn = fn, .bitlen = valsz * 8}; + char buf[512]; + + return printf("%s\n", _mad_dump_field(&f, name, buf, sizeof buf, val)); +} + +static int _mad_print_field(const ib_field_t *f, const char *name, void *val, int valsz) +{ + return _mad_dump(f->def_dump_fn, name ? name : f->name, val, valsz ? 
valsz : ALIGN(f->bitlen, 8) / 8); +} + +int mad_print_field(int field, const char *name, void *val) +{ + if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) + return -1; + return _mad_print_field(ib_mad_f + field, name, val, 0); +} + +char *mad_dump_field(int field, char *buf, int bufsz, void *val) +{ + if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) + return 0; + return _mad_dump_field(ib_mad_f + field, 0, buf, bufsz, val); +} + +char *mad_dump_val(int field, char *buf, int bufsz, void *val) +{ + if (field <= IB_NO_FIELD || field >= IB_FIELD_LAST_) + return 0; + return _mad_dump_val(ib_mad_f + field, buf, bufsz, val); +} diff --git a/libibmad/src/libibmad.map b/libibmad/src/libibmad.map index f26d28d..ea1ed4b 100644 --- a/libibmad/src/libibmad.map +++ b/libibmad/src/libibmad.map @@ -1,9 +1,8 @@ IBMAD_1.3 { global: - _mad_dump; - _mad_dump_field; - _mad_dump_val; - _mad_print_field; + mad_dump_field; + mad_dump_val; + mad_print_field; mad_dump_array; mad_dump_bitfield; mad_dump_hex; @@ -34,13 +33,12 @@ IBMAD_1.3 { mad_dump_uint; mad_dump_vlarbitration; mad_dump_vlcap; - _get_array; - _get_field; - _get_field64; - _set_array; - _set_field; - _set_field64; - ib_mad_f; + mad_get_field; + mad_set_field; + mad_get_field64; + mad_set_field64; + mad_get_array; + mad_set_array; perf_classportinfo_query; port_performance_query; port_performance_reset; diff --git a/libibmad/src/mad.c b/libibmad/src/mad.c index 1367ecd..fc73a7a 100644 --- a/libibmad/src/mad.c +++ b/libibmad/src/mad.c @@ -49,46 +49,6 @@ #undef DEBUG #define DEBUG if (ibdebug) IBWARN -void -mad_decode_field(uint8_t *buf, int field, void *val) -{ - ib_field_t *f = ib_mad_f + field; - - if (!field) { - *(int *)val = *(int *)buf; - return; - } - if (f->bitlen <= 32) { - *(uint32_t *)val = _get_field(buf, 0, f); - return; - } - if (f->bitlen == 64) { - *(uint64_t *)val = _get_field64(buf, 0, f); - return; - } - _get_array(buf, 0, f, val); -} - -void -mad_encode_field(uint8_t *buf, int field, void *val) -{ - 
ib_field_t *f = ib_mad_f + field; - - if (!field) { - *(int *)buf = *(int *)val; - return; - } - if (f->bitlen <= 32) { - _set_field(buf, 0, f, *(uint32_t *)val); - return; - } - if (f->bitlen == 64) { - _set_field64(buf, 0, f, *(uint64_t *)val); - return; - } - _set_array(buf, 0, f, val); -} - uint64_t mad_trid(void) { -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 15:33:35 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 01:33:35 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] management: move sysfs()_* function to libibumad In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> <000001c96207$00f55120$ae58180a@amr.corp.intel.com> Message-ID: <20081221233325.GH28259@sashak.voltaire.com> Move the sysfs helper functions (sys_read_*()) to the libibumad tree. libibumad no longer depends on libibcommon, so remove this dependency. Signed-off-by: Sasha Khapyorsky --- On 13:28 Fri 19 Dec , Hal Rosenstock wrote: > On Fri, Dec 19, 2008 at 1:24 PM, Sean Hefty wrote: > >>Sasha has written some comments on the general list indicating that > >>libibcommon may disappear in the future. It could be combined with > >>libibumad (as libibmad (and diags) both use it. > > > > I'd much rather see it integrated with libibmad than libibumad. libibumad is > > fairly self-contained as the interface into the kernel. > > That's fine as long as things that use libibumad don't now require > libibmad as an additional dependency for something that was in > libibcommon. I don't know if there's anything like that. xdump > wouldn't cause that. I haven't looked at the rest of libibcommon and > the implications. After this patch libibumad no longer depends on libibcommon. I think we can just merge libibmad and libibcommon and eventually remove libibcommon altogether.
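For readers following the sysfs move: sys_read_guid() in this patch reads a sysfs attribute as text and folds four colon-separated 16-bit hex groups into one 64-bit value. A minimal sketch of just that parsing step, with no file I/O, might look like the code below. The name parse_guid_text() is hypothetical and not part of libibumad. Unlike the patch, the sketch uses strtoul() with an end pointer instead of strsep(), rejects out-of-range groups rather than masking them, and returns the value in host byte order (the patch converts with htonll()).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a libibumad function): parse text of the form
 * "xxxx:xxxx:xxxx:xxxx" into a host-order 64-bit GUID.
 * Returns 0 on success, -EINVAL on malformed input.
 */
static int parse_guid_text(const char *text, uint64_t *guid)
{
	uint64_t val = 0;
	const char *p = text;
	int i;

	for (i = 0; i < 4; i++) {
		char *end;
		unsigned long word = strtoul(p, &end, 16);

		/* no hex digits consumed, or group wider than 16 bits */
		if (end == p || word > 0xffff)
			return -EINVAL;

		val = (val << 16) | word;

		if (i < 3) {
			/* the first three groups must be followed by ':' */
			if (*end != ':')
				return -EINVAL;
			p = end + 1;
		}
	}

	/* anything after the fourth group (e.g. a newline) is ignored */
	*guid = val;
	return 0;
}
```

A caller that wants the network-order form that the patch stores in *net_guid would apply htonll() to the result.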
infiniband-diags/configure.in | 4 +- infiniband-diags/src/mcm_rereg_test.c | 1 + libibcommon/Makefile.am | 2 +- libibcommon/include/infiniband/common.h | 7 -- libibcommon/libibcommon.ver | 2 +- libibcommon/src/libibcommon.map | 5 - libibcommon/src/sysfs.c | 167 ------------------------------- libibmad/configure.in | 4 +- libibmad/src/gs.c | 1 + libibmad/src/register.c | 1 + libibmad/src/rpc.c | 1 + libibumad/Makefile.am | 2 +- libibumad/configure.in | 13 --- libibumad/include/infiniband/umad.h | 1 - libibumad/libibumad.spec.in | 4 +- libibumad/src/sysfs.c | 158 +++++++++++++++++++++++++++++ libibumad/src/umad.c | 9 ++ 17 files changed, 180 insertions(+), 202 deletions(-) delete mode 100644 libibcommon/src/sysfs.c create mode 100644 libibumad/src/sysfs.c diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in index d8524f4..17204a4 100644 --- a/infiniband-diags/configure.in +++ b/infiniband-diags/configure.in @@ -32,8 +32,8 @@ AC_PROG_LIBTOOL if test "$disable_libcheck" != "yes" then dnl Checks for libraries -AC_CHECK_LIB(ibcommon, sys_read_string, [], - AC_MSG_ERROR([sys_read_string() not found. diags require libibcommon.])) +AC_CHECK_LIB(ibcommon, ibpanic, [], + AC_MSG_ERROR([ibpanic() not found. diags require libibcommon.])) AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found. 
diags require libibumad.])) AC_CHECK_LIB(ibmad, mad_dump_int, [], diff --git a/infiniband-diags/src/mcm_rereg_test.c b/infiniband-diags/src/mcm_rereg_test.c index 9285b95..0ba9901 100644 --- a/infiniband-diags/src/mcm_rereg_test.c +++ b/infiniband-diags/src/mcm_rereg_test.c @@ -36,6 +36,7 @@ #include #include +#include #include #include diff --git a/libibcommon/Makefile.am b/libibcommon/Makefile.am index 75889f4..00e5bc8 100644 --- a/libibcommon/Makefile.am +++ b/libibcommon/Makefile.am @@ -13,7 +13,7 @@ else libibcommon_version_script = endif -libibcommon_la_SOURCES = src/stack.c src/sysfs.c src/util.c src/time.c src/hash.c +libibcommon_la_SOURCES = src/stack.c src/util.c src/time.c src/hash.c libibcommon_la_LDFLAGS = -version-info $(ibcommon_api_version) \ -export-dynamic $(libibcommon_version_script) libibcommon_la_DEPENDENCIES = $(srcdir)/src/libibcommon.map diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h index 01fc796..287703f 100644 --- a/libibcommon/include/infiniband/common.h +++ b/libibcommon/include/infiniband/common.h @@ -121,13 +121,6 @@ void logmsg(const char *const fn, char *msg, ...) 
IBCOMMON_STRICT_FORMAT; void xdump(FILE *file, char *msg, void *p, int size); -/* sysfs.c: /sys utilities */ -int sys_read_string(char *dir_name, char *file_name, char *str, int max_len); -int sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid); -int sys_read_gid(char *dir_name, char *file_name, uint8_t *gid); -int sys_read_uint64(char *dir_name, char *file_name, uint64_t *u); -int sys_read_uint(char *dir_name, char *file_name, unsigned *u); - /* stack.c */ void stack_dump(void); void enable_stack_dump(int loop); diff --git a/libibcommon/libibcommon.ver b/libibcommon/libibcommon.ver index 7b88f1b..22b16d9 100644 --- a/libibcommon/libibcommon.ver +++ b/libibcommon/libibcommon.ver @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=1:0:0 +LIBVERSION=2:1:1 diff --git a/libibcommon/src/libibcommon.map b/libibcommon/src/libibcommon.map index 96ce2d8..f1f693a 100644 --- a/libibcommon/src/libibcommon.map +++ b/libibcommon/src/libibcommon.map @@ -2,11 +2,6 @@ IBCOMMON_1.0 { global: enable_stack_dump; stack_dump; - sys_read_gid; - sys_read_guid; - sys_read_string; - sys_read_uint; - sys_read_uint64; getcurrenttime; fhash; logmsg; diff --git a/libibcommon/src/sysfs.c b/libibcommon/src/sysfs.c deleted file mode 100644 index 3c23010..0000000 --- a/libibcommon/src/sysfs.c +++ /dev/null @@ -1,167 +0,0 @@ -/* - * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- * - */ - -#define _GNU_SOURCE - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "common.h" - -static int -ret_code(void) -{ - int e = errno; - - if (e > 0) - return -e; - return e; -} - -int -sys_read_string(char *dir_name, char *file_name, char *str, int max_len) -{ - char path[256], *s; - int fd, r; - - snprintf(path, sizeof(path), "%s/%s", dir_name, file_name); - - if ((fd = open(path, O_RDONLY)) < 0) - return ret_code(); - - if ((r = read(fd, str, max_len)) < 0) { - int e = errno; - close(fd); - errno = e; - return ret_code(); - } - - str[(r < max_len) ? r : max_len - 1] = 0; - - if ((s = strrchr(str, '\n'))) - *s = 0; - - close(fd); - return 0; -} - -int -sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid) -{ - char buf[32], *str, *s; - uint64_t guid; - int r, i; - - if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) - return r; - - guid = 0; - - for (s = buf, i = 0 ; i < 4; i++) { - if (!(str = strsep(&s, ": \t\n"))) - return -EINVAL; - guid = (guid << 16) | (strtoul(str, 0, 16) & 0xffff); - } - - *net_guid = htonll(guid); - - return 0; -} - -int -sys_read_gid(char *dir_name, char *file_name, uint8_t *gid) -{ - char buf[64], *str, *s; - uint16_t *ugid = (uint16_t *)gid; - int r, i; - - if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) - return r; - - for (s = buf, i = 0 ; i < 8; i++) { - if (!(str = strsep(&s, ": \t\n"))) - return -EINVAL; - ugid[i] = htons(strtoul(str, 0, 16) & 0xffff); - } - - return 0; -} - -int -sys_read_uint64(char *dir_name, char *file_name, uint64_t *u) -{ - char buf[32]; - int r; - - if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) - return r; - - *u = strtoull(buf, 0, 0); - - return 0; -} - -int -sys_read_uint(char *dir_name, char *file_name, 
unsigned *u) -{ - char buf[32]; - int r; - - if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) - return r; - - *u = strtoul(buf, 0, 0); - - return 0; -} diff --git a/libibmad/configure.in b/libibmad/configure.in index 3d8e73d..22ea5ef 100644 --- a/libibmad/configure.in +++ b/libibmad/configure.in @@ -31,8 +31,8 @@ AC_PROG_CC dnl Checks for libraries if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(ibcommon, sys_read_string, [], - AC_MSG_ERROR([sys_read_string() not found. libibmad requires libibcommon.])) +AC_CHECK_LIB(ibcommon, ibpanic, [], + AC_MSG_ERROR([ibpanic() not found. libibmad requires libibcommon.])) AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found. libibmad requires libibumad.])) fi diff --git a/libibmad/src/gs.c b/libibmad/src/gs.c index 89c927e..cade54b 100644 --- a/libibmad/src/gs.c +++ b/libibmad/src/gs.c @@ -42,6 +42,7 @@ #include #include +#include #include #include "mad.h" diff --git a/libibmad/src/register.c b/libibmad/src/register.c index a33acd8..8e59e6e 100644 --- a/libibmad/src/register.c +++ b/libibmad/src/register.c @@ -43,6 +43,7 @@ #include #include +#include #include #include "mad.h" diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c index df28f65..34a6b9a 100644 --- a/libibmad/src/rpc.c +++ b/libibmad/src/rpc.c @@ -43,6 +43,7 @@ #include #include +#include #include #include "mad.h" diff --git a/libibumad/Makefile.am b/libibumad/Makefile.am index 1e3e6fd..50222df 100644 --- a/libibumad/Makefile.am +++ b/libibumad/Makefile.am @@ -27,7 +27,7 @@ else libibumad_version_script = endif -libibumad_la_SOURCES = src/umad.c +libibumad_la_SOURCES = src/umad.c src/sysfs.c libibumad_la_LDFLAGS = -version-info $(ibumad_api_version) \ -export-dynamic $(libibumad_version_script) libibumad_la_DEPENDENCIES = $(srcdir)/src/libibumad.map diff --git a/libibumad/configure.in b/libibumad/configure.in index ad3afcd..3a08771 100644 --- a/libibumad/configure.in +++ b/libibumad/configure.in @@ -44,23 +44,10 @@ 
AC_PROG_LN_S AC_PROG_MAKE_SET AM_PROG_LIBTOOL -if test "$disable_libcheck" != "yes" -then -dnl Checks for libraries -AC_CHECK_LIB(ibcommon, sys_read_string, [], - AC_MSG_ERROR([sys_read_string() not found. libibumad requires libibcommon.])) -fi - dnl Checks for header files. AC_HEADER_DIRENT AC_HEADER_STDC AC_CHECK_HEADERS([fcntl.h netinet/in.h stdlib.h string.h sys/ioctl.h unistd.h]) -if test "$disable_libcheck" != "yes" -then -AC_CHECK_HEADER(infiniband/common.h, [], - AC_MSG_ERROR([ not found. libibumad requires libibcommon.]) -) -fi dnl Checks for library functions AC_PROG_GCC_TRADITIONAL diff --git a/libibumad/include/infiniband/umad.h b/libibumad/include/infiniband/umad.h index 7d97c25..91ccf1d 100644 --- a/libibumad/include/infiniband/umad.h +++ b/libibumad/include/infiniband/umad.h @@ -34,7 +34,6 @@ #define _UMAD_H #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { diff --git a/libibumad/libibumad.spec.in b/libibumad/libibumad.spec.in index 1b11d18..7732edd 100644 --- a/libibumad/libibumad.spec.in +++ b/libibumad/libibumad.spec.in @@ -13,7 +13,7 @@ Source: http://www.openfabrics.org/downloads/management/@TARBALL@ Url: http://openfabrics.org Requires(post): /sbin/ldconfig Requires(postun): /sbin/ldconfig -BuildRequires: libibcommon-devel, libtool +BuildRequires: libtool %description libibumad provides the user MAD library functions which sit on top of @@ -23,7 +23,7 @@ and management tools, including OpenSM. %package devel Summary: Development files for the libibumad library Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} libibcommon-devel +Requires: %{name} = %{version}-%{release} Requires(post): /sbin/ldconfig Requires(postun): /sbin/ldconfig diff --git a/libibumad/src/sysfs.c b/libibumad/src/sysfs.c new file mode 100644 index 0000000..af5545e --- /dev/null +++ b/libibumad/src/sysfs.c @@ -0,0 +1,158 @@ +/* + * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#define _GNU_SOURCE + +#if HAVE_CONFIG_H +# include +#endif /* HAVE_CONFIG_H */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#if __BYTE_ORDER == __LITTLE_ENDIAN +#define htonll(x) bswap_64(x) +#else +#define htonll(x) (x) +#endif + +static int ret_code(void) +{ + int e = errno; + + if (e > 0) + return -e; + return e; +} + +int sys_read_string(char *dir_name, char *file_name, char *str, int max_len) +{ + char path[256], *s; + int fd, r; + + snprintf(path, sizeof(path), "%s/%s", dir_name, file_name); + + if ((fd = open(path, O_RDONLY)) < 0) + return ret_code(); + + if ((r = read(fd, str, max_len)) < 0) { + int e = errno; + close(fd); + errno = e; + return ret_code(); + } + + str[(r < max_len) ? r : max_len - 1] = 0; + + if ((s = strrchr(str, '\n'))) + *s = 0; + + close(fd); + return 0; +} + +int sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid) +{ + char buf[32], *str, *s; + uint64_t guid; + int r, i; + + if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) + return r; + + guid = 0; + + for (s = buf, i = 0 ; i < 4; i++) { + if (!(str = strsep(&s, ": \t\n"))) + return -EINVAL; + guid = (guid << 16) | (strtoul(str, 0, 16) & 0xffff); + } + + *net_guid = htonll(guid); + + return 0; +} + +int sys_read_gid(char *dir_name, char *file_name, uint8_t *gid) +{ + char buf[64], *str, *s; + uint16_t *ugid = (uint16_t *)gid; + int r, i; + + if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) + return r; + + for (s = buf, i = 0 ; i < 8; i++) { + if (!(str = strsep(&s, ": \t\n"))) + return -EINVAL; + ugid[i] = htons(strtoul(str, 0, 16) & 0xffff); + } + + return 0; +} + +int sys_read_uint64(char *dir_name, char *file_name, uint64_t *u) +{ + char buf[32]; + int r; + + if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) + return r; + + *u = strtoull(buf, 0, 0); + + return 0; +} + +int sys_read_uint(char *dir_name, char 
*file_name, unsigned *u) +{ + char buf[32]; + int r; + + if ((r = sys_read_string(dir_name, file_name, buf, sizeof(buf))) < 0) + return r; + + *u = strtoul(buf, 0, 0); + + return 0; +} diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c index 3713ffe..c233de9 100644 --- a/libibumad/src/umad.c +++ b/libibumad/src/umad.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include #include @@ -76,6 +77,14 @@ typedef struct ib_user_mad_reg_req { uint8_t rmpp_version; } ib_user_mad_reg_req_t; +extern int sys_read_string(char *dir_name, char *file_name, char *str, int len); +extern int sys_read_guid(char *dir_name, char *file_name, uint64_t *net_guid); +extern int sys_read_gid(char *dir_name, char *file_name, uint8_t *gid); +extern int sys_read_uint64(char *dir_name, char *file_name, uint64_t *u); +extern int sys_read_uint(char *dir_name, char *file_name, unsigned *u); + +#define IBWARN(fmt, args...) fprintf(stdout, "ibwarn: [%d] %s: " fmt, getpid(), __func__, ## args) + #define TRACE if (umaddebug) IBWARN #define DEBUG if (umaddebug) IBWARN -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 15:34:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 01:34:24 +0200 Subject: [ofa-general] [PATCH] opensm: remove libibcommon build dependencies In-Reply-To: <20081221233325.GH28259@sashak.voltaire.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> <000001c96207$00f55120$ae58180a@amr.corp.intel.com> <20081221233325.GH28259@sashak.voltaire.com> Message-ID: <20081221233424.GI28259@sashak.voltaire.com> libibumad no longer depends on libibcommon, so remove this build dependency from OpenSM too.
Signed-off-by: Sasha Khapyorsky --- opensm/config/osmvsel.m4 | 4 ++-- opensm/include/vendor/osm_vendor_ibumad.h | 1 - 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/opensm/config/osmvsel.m4 b/opensm/config/osmvsel.m4 index c7798cc..c24930b 100644 --- a/opensm/config/osmvsel.m4 +++ b/opensm/config/osmvsel.m4 @@ -64,8 +64,8 @@ with_sim="/usr") dnl based on the with_osmv we can try the vendor flag if test $with_osmv = "openib"; then AC_DEFINE(OSM_VENDOR_INTF_OPENIB, 1, [Define as 1 for OpenIB vendor]) - OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibcommon/include -I\$(srcdir)/../../libibumad/include -I\$(includedir)" - OSMV_LDADD="-L\$(abs_srcdir)/../../libibumad/.libs -L\$(abs_srcdir)/../../libibcommon/.libs -L\$(libdir) -libumad -libcommon" + OSMV_INCLUDES="-I\$(srcdir)/../include -I\$(srcdir)/../../libibumad/include -I\$(includedir)" + OSMV_LDADD="-L\$(abs_srcdir)/../../libibumad/.libs -L\$(libdir) -libumad" if test "x$with_umad_libs" != "x"; then OSMV_LDADD="-L$with_umad_libs $OSMV_LDADD" diff --git a/opensm/include/vendor/osm_vendor_ibumad.h b/opensm/include/vendor/osm_vendor_ibumad.h index 3a3f070..e346a2e 100644 --- a/opensm/include/vendor/osm_vendor_ibumad.h +++ b/opensm/include/vendor/osm_vendor_ibumad.h @@ -41,7 +41,6 @@ #include #include -#include #include #ifdef __cplusplus -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 16:49:45 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 02:49:45 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] management: remove libibcommon dependencies In-Reply-To: <20081221233424.GI28259@sashak.voltaire.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> <000001c96207$00f55120$ae58180a@amr.corp.intel.com> <20081221233325.GH28259@sashak.voltaire.com> <20081221233424.GI28259@sashak.voltaire.com> Message-ID: 
<20081222004945.GJ28259@sashak.voltaire.com> Remove libibcommon dependencies from libibmad and infiniband-diags. libibcommon is no longer used anywhere in the management tree. The macros IBWARN(), IBPANIC() and ALIGN() and the functions htonll(), ntohll() and xdump() are now part of libibmad. Signed-off-by: Sasha Khapyorsky --- infiniband-diags/configure.in | 5 --- infiniband-diags/infiniband-diags.spec.in | 2 +- infiniband-diags/src/grouping.c | 1 - infiniband-diags/src/ibaddr.c | 1 - infiniband-diags/src/ibnetdiscover.c | 1 - infiniband-diags/src/ibping.c | 10 ++++++- infiniband-diags/src/ibportstate.c | 1 - infiniband-diags/src/ibroute.c | 1 - infiniband-diags/src/ibstat.c | 1 - infiniband-diags/src/ibsysstat.c | 1 - infiniband-diags/src/ibtracert.c | 1 - infiniband-diags/src/mcm_rereg_test.c | 1 - infiniband-diags/src/perfquery.c | 1 - infiniband-diags/src/sminfo.c | 1 - infiniband-diags/src/smpdump.c | 1 - infiniband-diags/src/smpquery.c | 1 - infiniband-diags/src/vendstat.c | 1 - libibmad/configure.in | 5 --- libibmad/include/infiniband/mad.h | 44 +++++++++++++++++++++++++++++ libibmad/libibmad.ver | 2 +- libibmad/src/dump.c | 28 +++++++++++++++++- libibmad/src/fields.c | 1 - libibmad/src/gs.c | 1 - libibmad/src/libibmad.map | 1 + libibmad/src/mad.c | 1 - libibmad/src/portid.c | 1 - libibmad/src/register.c | 1 - libibmad/src/resolve.c | 1 - libibmad/src/rpc.c | 1 - libibmad/src/sa.c | 1 - libibmad/src/serv.c | 1 - libibmad/src/smp.c | 1 - libibmad/src/vendor.c | 1 - 33 files changed, 83 insertions(+), 39 deletions(-) diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in index 17204a4..58eea0a 100644 --- a/infiniband-diags/configure.in +++ b/infiniband-diags/configure.in @@ -32,8 +32,6 @@ AC_PROG_LIBTOOL if test "$disable_libcheck" != "yes" then dnl Checks for libraries -AC_CHECK_LIB(ibcommon, ibpanic, [], - AC_MSG_ERROR([ibpanic() not found. diags require libibcommon.])) AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found.
diags require libibumad.])) AC_CHECK_LIB(ibmad, mad_dump_int, [], @@ -53,9 +51,6 @@ AC_HEADER_STDC AC_CHECK_HEADERS([stdlib.h string.h unistd.h fcntl.h inttypes.h netinet/in.h sys/ioctl.h syslog.h]) if test "$disable_libcheck" != "yes" then -AC_CHECK_HEADER(infiniband/common.h, [], - AC_MSG_ERROR([ not found. diags require libibcommon.]) -) AC_CHECK_HEADER(infiniband/umad.h, [], AC_MSG_ERROR([ not found. diags require libibumad.]) ) diff --git a/infiniband-diags/infiniband-diags.spec.in b/infiniband-diags/infiniband-diags.spec.in index 9c8c0c4..3791eb4 100644 --- a/infiniband-diags/infiniband-diags.spec.in +++ b/infiniband-diags/infiniband-diags.spec.in @@ -11,7 +11,7 @@ Group: System Environment/Libraries BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) Source: http://www.openfabrics.org/downloads/management/@TARBALL@ Url: http://openfabrics.org/ -BuildRequires: libibmad-devel, opensm-devel, libibcommon-devel, libibumad-devel +BuildRequires: libibmad-devel, opensm-devel, libibumad-devel Provides: perl(IBswcountlimits) Obsoletes: openib-diags diff --git a/infiniband-diags/src/grouping.c b/infiniband-diags/src/grouping.c index f1a996f..94ab859 100644 --- a/infiniband-diags/src/grouping.c +++ b/infiniband-diags/src/grouping.c @@ -44,7 +44,6 @@ #include #include -#include #include #include "ibnetdiscover.h" diff --git a/infiniband-diags/src/ibaddr.c b/infiniband-diags/src/ibaddr.c index f48a9c9..4d6c50d 100644 --- a/infiniband-diags/src/ibaddr.c +++ b/infiniband-diags/src/ibaddr.c @@ -42,7 +42,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/ibnetdiscover.c b/infiniband-diags/src/ibnetdiscover.c index 2cfaa8a..296cb07 100644 --- a/infiniband-diags/src/ibnetdiscover.c +++ b/infiniband-diags/src/ibnetdiscover.c @@ -47,7 +47,6 @@ #include #include -#include #include #include #include diff --git a/infiniband-diags/src/ibping.c b/infiniband-diags/src/ibping.c index 4fd2dcb..4b99725 100644 --- 
a/infiniband-diags/src/ibping.c +++ b/infiniband-diags/src/ibping.c @@ -43,8 +43,8 @@ #include #include #include +#include -#include #include #include @@ -60,6 +60,14 @@ static char last_host[IB_VENDOR_RANGE2_DATA_SIZE]; char *argv0 = "ibping"; +static uint64_t getcurrenttime(void) +{ + struct timeval tv; + + gettimeofday(&tv, 0); + return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec; +} + static void get_host_and_domain(char *data, int sz) { diff --git a/infiniband-diags/src/ibportstate.c b/infiniband-diags/src/ibportstate.c index 36453bb..a82bb14 100644 --- a/infiniband-diags/src/ibportstate.c +++ b/infiniband-diags/src/ibportstate.c @@ -43,7 +43,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/ibroute.c b/infiniband-diags/src/ibroute.c index f2ee170..921b5dd 100644 --- a/infiniband-diags/src/ibroute.c +++ b/infiniband-diags/src/ibroute.c @@ -46,7 +46,6 @@ #include #include -#include #include #include #include diff --git a/infiniband-diags/src/ibstat.c b/infiniband-diags/src/ibstat.c index 5d2113e..6bd3c8a 100644 --- a/infiniband-diags/src/ibstat.c +++ b/infiniband-diags/src/ibstat.c @@ -44,7 +44,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/ibsysstat.c b/infiniband-diags/src/ibsysstat.c index e3d0b9f..d881e5b 100644 --- a/infiniband-diags/src/ibsysstat.c +++ b/infiniband-diags/src/ibsysstat.c @@ -43,7 +43,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/ibtracert.c b/infiniband-diags/src/ibtracert.c index bde0ea7..7a28940 100644 --- a/infiniband-diags/src/ibtracert.c +++ b/infiniband-diags/src/ibtracert.c @@ -46,7 +46,6 @@ #include #include -#include #include #include #include diff --git a/infiniband-diags/src/mcm_rereg_test.c b/infiniband-diags/src/mcm_rereg_test.c index 0ba9901..9285b95 100644 --- a/infiniband-diags/src/mcm_rereg_test.c +++ b/infiniband-diags/src/mcm_rereg_test.c @@ -36,7 +36,6 @@ #include #include -#include #include #include diff --git 
a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c index 7a53e92..d2c5904 100644 --- a/infiniband-diags/src/perfquery.c +++ b/infiniband-diags/src/perfquery.c @@ -43,7 +43,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/sminfo.c b/infiniband-diags/src/sminfo.c index c811057..bdcdad9 100644 --- a/infiniband-diags/src/sminfo.c +++ b/infiniband-diags/src/sminfo.c @@ -42,7 +42,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/smpdump.c b/infiniband-diags/src/smpdump.c index e26b369..6f7bae2 100644 --- a/infiniband-diags/src/smpdump.c +++ b/infiniband-diags/src/smpdump.c @@ -45,7 +45,6 @@ #include #include -#include #include #include diff --git a/infiniband-diags/src/smpquery.c b/infiniband-diags/src/smpquery.c index ed8ec83..6071245 100644 --- a/infiniband-diags/src/smpquery.c +++ b/infiniband-diags/src/smpquery.c @@ -47,7 +47,6 @@ #define __STDC_FORMAT_MACROS #include -#include #include #include #include diff --git a/infiniband-diags/src/vendstat.c b/infiniband-diags/src/vendstat.c index 0674986..9295898 100644 --- a/infiniband-diags/src/vendstat.c +++ b/infiniband-diags/src/vendstat.c @@ -42,7 +42,6 @@ #include #include -#include #include #include diff --git a/libibmad/configure.in b/libibmad/configure.in index 22ea5ef..e7c2deb 100644 --- a/libibmad/configure.in +++ b/libibmad/configure.in @@ -31,8 +31,6 @@ AC_PROG_CC dnl Checks for libraries if test "$disable_libcheck" != "yes" then -AC_CHECK_LIB(ibcommon, ibpanic, [], - AC_MSG_ERROR([ibpanic() not found. libibmad requires libibcommon.])) AC_CHECK_LIB(ibumad, umad_init, [], AC_MSG_ERROR([umad_init() not found. libibmad requires libibumad.])) fi @@ -42,9 +40,6 @@ AC_HEADER_STDC AC_CHECK_HEADERS([netinet/in.h stdlib.h string.h sys/time.h unistd.h]) if test "$disable_libcheck" != "yes" then -AC_CHECK_HEADER(infiniband/common.h, [], - AC_MSG_ERROR([ not found. 
libibmad requires libibcommon.]) -) AC_CHECK_HEADER(infiniband/umad.h, [], AC_MSG_ERROR([ not found. libibmad requires libibumad.]) ) diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h index fd0deff..989e474 100644 --- a/libibmad/include/infiniband/mad.h +++ b/libibmad/include/infiniband/mad.h @@ -35,6 +35,11 @@ #include #include +#include +#include +#include +#include +#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { @@ -835,6 +840,45 @@ ib_mad_dump_fn extern int ibdebug; +#if __BYTE_ORDER == __LITTLE_ENDIAN +#ifndef ntohll +static inline uint64_t ntohll(uint64_t x) { + return bswap_64(x); +} +#endif +#ifndef htonll +static inline uint64_t htonll(uint64_t x) { + return bswap_64(x); +} +#endif +#elif __BYTE_ORDER == __BIG_ENDIAN +#ifndef ntohll +static inline uint64_t ntohll(uint64_t x) { + return x; +} +#endif +#ifndef htonll +static inline uint64_t htonll(uint64_t x) { + return x; +} +#endif +#endif /* __BYTE_ORDER == __BIG_ENDIAN */ + +/* Misc. macros: */ +/** align value \a l to \a size (ceil) */ +#define ALIGN(l, size) (((l) + ((size) - 1)) / (size) * (size)) + +/** printf style warning MACRO, includes name of function and pid */ +#define IBWARN(fmt, ...) fprintf(stdout, "ibwarn: [%d] %s: " fmt, getpid(), __func__, ## __VA_ARGS__) + +/** printf style abort MACRO, includes name of function and pid */ +#define IBPANIC(fmt, ...) 
do { \ + fprintf(stdout, "ibpanic: [%d] %s: " fmt, getpid(), __func__, ## __VA_ARGS__); \ + exit(-1); \ +} while(0) + +void xdump(FILE *file, char *msg, void *p, int size); + END_C_DECLS #endif /* _MAD_H_ */ diff --git a/libibmad/libibmad.ver b/libibmad/libibmad.ver index 51f2b71..7e93c16 100644 --- a/libibmad/libibmad.ver +++ b/libibmad/libibmad.ver @@ -6,4 +6,4 @@ # API_REV - advance on any added API # RUNNING_REV - advance any change to the vendor files # AGE - number of backward versions the API still supports -LIBVERSION=4:0:3 +LIBVERSION=5:0:4 diff --git a/libibmad/src/dump.c b/libibmad/src/dump.c index 49bb34b..38a2254 100644 --- a/libibmad/src/dump.c +++ b/libibmad/src/dump.c @@ -44,7 +44,6 @@ #include #include -#include void mad_dump_int(char *buf, int bufsz, void *val, int valsz) @@ -729,3 +728,30 @@ mad_dump_perfcounters_ext(char *buf, int bufsz, void *val, int valsz) { _dump_fields(buf, bufsz, val, IB_PC_EXT_FIRST_F, IB_PC_EXT_LAST_F); } + +void xdump(FILE *file, char *msg, void *p, int size) +{ +#define HEX(x) ((x) < 10 ? 
'0' + (x) : 'a' + ((x) -10)) + uint8_t *cp = p; + int i; + + if (msg) + fputs(msg, file); + + for (i = 0; i < size;) { + fputc(HEX(*cp >> 4), file); + fputc(HEX(*cp & 0xf), file); + if (++i >= size) + break; + fputc(HEX(cp[1] >> 4), file); + fputc(HEX(cp[1] & 0xf), file); + if ((++i) % 16) + fputc(' ', file); + else + fputc('\n', file); + cp += 2; + } + if (i % 16) { + fputc('\n', file); + } +} diff --git a/libibmad/src/fields.c b/libibmad/src/fields.c index ffbfc76..50611c5 100644 --- a/libibmad/src/fields.c +++ b/libibmad/src/fields.c @@ -41,7 +41,6 @@ #include #include -#include /* * BITSOFFS and BE_OFFS are required due the fact that the bit offsets are inconsistently diff --git a/libibmad/src/gs.c b/libibmad/src/gs.c index cade54b..89c927e 100644 --- a/libibmad/src/gs.c +++ b/libibmad/src/gs.c @@ -42,7 +42,6 @@ #include #include -#include #include #include "mad.h" diff --git a/libibmad/src/libibmad.map b/libibmad/src/libibmad.map index ea1ed4b..927e51c 100644 --- a/libibmad/src/libibmad.map +++ b/libibmad/src/libibmad.map @@ -1,5 +1,6 @@ IBMAD_1.3 { global: + xdump; mad_dump_field; mad_dump_val; mad_print_field; diff --git a/libibmad/src/mad.c b/libibmad/src/mad.c index fc73a7a..f0fffcd 100644 --- a/libibmad/src/mad.c +++ b/libibmad/src/mad.c @@ -42,7 +42,6 @@ #include #include -#include #include #include diff --git a/libibmad/src/portid.c b/libibmad/src/portid.c index 24a555b..a84baee 100644 --- a/libibmad/src/portid.c +++ b/libibmad/src/portid.c @@ -45,7 +45,6 @@ #include #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN diff --git a/libibmad/src/register.c b/libibmad/src/register.c index 8e59e6e..a33acd8 100644 --- a/libibmad/src/register.c +++ b/libibmad/src/register.c @@ -43,7 +43,6 @@ #include #include -#include #include #include "mad.h" diff --git a/libibmad/src/resolve.c b/libibmad/src/resolve.c index 25062f6..f012543 100644 --- a/libibmad/src/resolve.c +++ b/libibmad/src/resolve.c @@ -42,7 +42,6 @@ #include #include -#include #include 
#include diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c index 34a6b9a..df28f65 100644 --- a/libibmad/src/rpc.c +++ b/libibmad/src/rpc.c @@ -43,7 +43,6 @@ #include #include -#include #include #include "mad.h" diff --git a/libibmad/src/sa.c b/libibmad/src/sa.c index 2e092ec..192f56e 100644 --- a/libibmad/src/sa.c +++ b/libibmad/src/sa.c @@ -43,7 +43,6 @@ #include #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN diff --git a/libibmad/src/serv.c b/libibmad/src/serv.c index 9b20cb6..a90e961 100644 --- a/libibmad/src/serv.c +++ b/libibmad/src/serv.c @@ -43,7 +43,6 @@ #include #include -#include #include #include diff --git a/libibmad/src/smp.c b/libibmad/src/smp.c index 2c2bde2..d190af0 100644 --- a/libibmad/src/smp.c +++ b/libibmad/src/smp.c @@ -43,7 +43,6 @@ #include #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN diff --git a/libibmad/src/vendor.c b/libibmad/src/vendor.c index 468e2d3..04e7641 100644 --- a/libibmad/src/vendor.c +++ b/libibmad/src/vendor.c @@ -43,7 +43,6 @@ #include #include -#include #undef DEBUG #define DEBUG if (ibdebug) IBWARN -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 16:57:55 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 02:57:55 +0200 Subject: [ofa-general] [PATCH] libibcommon: remove from the management tree In-Reply-To: <20081222004945.GJ28259@sashak.voltaire.com> References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> <000701c96185$e1c68c00$435a180a@amr.corp.intel.com> <000001c96207$00f55120$ae58180a@amr.corp.intel.com> <20081221233325.GH28259@sashak.voltaire.com> <20081221233424.GI28259@sashak.voltaire.com> <20081222004945.GJ28259@sashak.voltaire.com> Message-ID: <20081222005755.GK28259@sashak.voltaire.com> Remove libibcommon, which is no longer used. 
Signed-off-by: Sasha Khapyorsky --- Makefile | 2 +- README | 3 +- libibcommon/AUTHORS | 3 - libibcommon/COPYING | 384 ------------------------------- libibcommon/ChangeLog | 21 -- libibcommon/Makefile.am | 32 --- libibcommon/autogen.sh | 11 - libibcommon/configure.in | 52 ---- libibcommon/include/infiniband/common.h | 136 ----------- libibcommon/libibcommon.spec.in | 72 ------ libibcommon/libibcommon.ver | 9 - libibcommon/src/hash.c | 153 ------------ libibcommon/src/libibcommon.map | 12 - libibcommon/src/stack.c | 178 -------------- libibcommon/src/time.c | 47 ---- libibcommon/src/util.c | 132 ----------- make.dist | 2 +- 17 files changed, 3 insertions(+), 1246 deletions(-) delete mode 100644 libibcommon/AUTHORS delete mode 100644 libibcommon/COPYING delete mode 100644 libibcommon/ChangeLog delete mode 100644 libibcommon/Makefile.am delete mode 100755 libibcommon/autogen.sh delete mode 100644 libibcommon/configure.in delete mode 100644 libibcommon/include/infiniband/common.h delete mode 100644 libibcommon/libibcommon.spec.in delete mode 100644 libibcommon/libibcommon.ver delete mode 100644 libibcommon/src/hash.c delete mode 100644 libibcommon/src/libibcommon.map delete mode 100644 libibcommon/src/stack.c delete mode 100644 libibcommon/src/time.c delete mode 100644 libibcommon/src/util.c diff --git a/Makefile b/Makefile index 863c3aa..f99fb29 100644 --- a/Makefile +++ b/Makefile @@ -1,5 +1,5 @@ -SUBDIRS:= libibcommon libibumad libibmad opensm infiniband-diags +SUBDIRS:= libibumad libibmad opensm infiniband-diags all: diff --git a/README b/README index c6efd9f..2b55a45 100644 --- a/README +++ b/README @@ -9,7 +9,6 @@ git://git.openfabrics.org/~sashak/management.git and can be cloned by: Packages -------- -libibcommon - common stuff libibumad - interface to ib_umad module (user_mad) library libibmad - generic MAD handling library opensm - OpenSM @@ -18,7 +17,7 @@ infiniband-diags - various diagnostic tools Building -------- -To make this unpack tarballs and in 
directories libibcommon, libibumad, +To make this unpack tarballs and in directories libibumad, libibmad, opensm, infiniband-diags (in that order) run: ./configure && make && make install diff --git a/libibcommon/AUTHORS b/libibcommon/AUTHORS deleted file mode 100644 index d09c13f..0000000 --- a/libibcommon/AUTHORS +++ /dev/null @@ -1,3 +0,0 @@ -Shahar Frank -Hal Rosenstock -Sasha Khapyorsky diff --git a/libibcommon/COPYING b/libibcommon/COPYING deleted file mode 100644 index 1b1ca1d..0000000 --- a/libibcommon/COPYING +++ /dev/null @@ -1,384 +0,0 @@ -This software with the exception of OpenSM is available to you -under a choice of one of two licenses. You may chose to be -licensed under the terms of the the OpenIB.org BSD license or -the GNU General Public License (GPL) Version 2, both included -below. - -OpenSM is licensed under either GNU General Public License (GPL) -Version 2, or Intel BSD + Patent license. See OpenSM for the -specific language for the latter licensing terms. - - -Copyright (c) 2004-2008 Voltaire, Inc. All rights reserved. - -================================================================== - - OpenIB.org BSD license - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions -are met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above - copyright notice, this list of conditions and the following - disclaimer in the documentation and/or other materials provided - with the distribution. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS -FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE -COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, -INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, -BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; -LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT -LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN -ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. - -================================================================== - - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 - - Copyright (C) 1989, 1991 Free Software Foundation, Inc. - 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. - - Preamble - - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software--to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. 
- - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. - - GNU GENERAL PUBLIC LICENSE - TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - - 0. This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. 
The "Program", below, -refers to any such program or work, and a "work based on the Program" -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term "modification".) Each licensee is addressed as "you". - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - - 1. You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - - 2. You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - - a) You must cause the modified files to carry prominent notices - stating that you changed the files and the date of any change. 
- - b) You must cause any work that you distribute or publish, that in - whole or in part contains or is derived from the Program or any - part thereof, to be licensed as a whole at no charge to all third - parties under the terms of this License. - - c) If the modified program normally reads commands interactively - when run, you must cause it, when started running for such - interactive use in the most ordinary way, to print or display an - announcement including an appropriate copyright notice and a - notice that there is no warranty (or else, saying that you provide - a warranty) and that users may redistribute the program under - these conditions, and telling the user how to view a copy of this - License. (Exception: if the Program itself is interactive but - does not normally print such an announcement, your work based on - the Program is not required to print an announcement.) - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. - -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. 
- -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - - 3. You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - - a) Accompany it with the complete corresponding machine-readable - source code, which must be distributed under the terms of Sections - 1 and 2 above on a medium customarily used for software interchange; or, - - b) Accompany it with a written offer, valid for at least three - years, to give any third party, for a charge no more than your - cost of physically performing source distribution, a complete - machine-readable copy of the corresponding source code, to be - distributed under the terms of Sections 1 and 2 above on a medium - customarily used for software interchange; or, - - c) Accompany it with the information you received as to the offer - to distribute corresponding source code. (This alternative is - allowed only for noncommercial distribution and only if you - received the program in object code or executable form with such - an offer, in accord with Subsection b above.) - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. 
However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. - -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - - 4. You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - - 5. You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - - 6. Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. 
-You are not responsible for enforcing compliance by third parties to -this License. - - 7. If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - - 8. 
If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - - 9. The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and "any -later version", you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - - 10. If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - - NO WARRANTY - - 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - - 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS - - How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. - - - Copyright (C) - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. 
- - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - - Gnomovision version 69, Copyright (C) year name of author - Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, the commands you use may -be called something other than `show w' and `show c'; they could even be -mouse-clicks or menu items--whatever suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a "copyright disclaimer" for the program, if -necessary. Here is a sample; alter the names: - - Yoyodyne, Inc., hereby disclaims all copyright interest in the program - `Gnomovision' (which makes passes at compilers) written by James Hacker. - - , 1 April 1989 - Ty Coon, President of Vice - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Library General -Public License instead of this License. 
diff --git a/libibcommon/ChangeLog b/libibcommon/ChangeLog deleted file mode 100644 index 0fdeaa9..0000000 --- a/libibcommon/ChangeLog +++ /dev/null @@ -1,21 +0,0 @@ -2007-06-29 Hal Rosenstock - - * Release version 1.0.4 - -2007-06-26 Hal Rosenstock - - * src/sysfs.c: Change uint to unsigned for strict ANSI - -2007-06-26 Michael S. Tsirkin - - * include/infiniband/common.h: Change uint to unsigned - for strict ANSI - -2007-01-25 Hal Rosenstock - - * Release version 1.0.2. - -2006-11-20 Sasha Khapyorsky - - * include/infiniband/common.h: Enable strict format/args - checking for printf() style functions diff --git a/libibcommon/Makefile.am b/libibcommon/Makefile.am deleted file mode 100644 index 00e5bc8..0000000 --- a/libibcommon/Makefile.am +++ /dev/null @@ -1,32 +0,0 @@ - -SUBDIRS = . - -INCLUDES = -I$(srcdir)/include/infiniband - -lib_LTLIBRARIES = libibcommon.la - -libibcommon_la_CFLAGS = -Wall - -if HAVE_LD_VERSION_SCRIPT -libibcommon_version_script = -Wl,--version-script=$(srcdir)/src/libibcommon.map -else -libibcommon_version_script = -endif - -libibcommon_la_SOURCES = src/stack.c src/util.c src/time.c src/hash.c -libibcommon_la_LDFLAGS = -version-info $(ibcommon_api_version) \ - -export-dynamic $(libibcommon_version_script) -libibcommon_la_DEPENDENCIES = $(srcdir)/src/libibcommon.map - -libibcommonincludedir = $(includedir)/infiniband - -libibcommoninclude_HEADERS = $(srcdir)/include/infiniband/common.h - -EXTRA_DIST = $(srcdir)/include/infiniband/common.h \ - libibcommon.spec.in libibcommon.spec \ - $(srcdir)/src/libibcommon.map libibcommon.ver autogen.sh - -dist-hook: - if [ -x $(top_srcdir)/../gen_chlog.sh ] ; then \ - $(top_srcdir)/../gen_chlog.sh $(PACKAGE) > $(distdir)/ChangeLog ; \ - fi diff --git a/libibcommon/autogen.sh b/libibcommon/autogen.sh deleted file mode 100755 index 4827884..0000000 --- a/libibcommon/autogen.sh +++ /dev/null @@ -1,11 +0,0 @@ -#! 
/bin/sh - -# create config dir if not exist -test -d config || mkdir config - -set -x -aclocal -I config -libtoolize --force --copy -autoheader -automake --foreign --add-missing --copy -autoconf diff --git a/libibcommon/configure.in b/libibcommon/configure.in deleted file mode 100644 index 0f2fc33..0000000 --- a/libibcommon/configure.in +++ /dev/null @@ -1,52 +0,0 @@ -dnl Process this file with autoconf to produce a configure script. - -AC_PREREQ(2.57) -AC_INIT(libibcommon, 1.2.0, general at lists.openfabrics.org) -AC_CONFIG_SRCDIR([src/stack.c]) -AC_CONFIG_AUX_DIR(config) -AM_CONFIG_HEADER(config.h) -AM_INIT_AUTOMAKE - -AC_SUBST(RELEASE, ${RELEASE:-unknown}) -AC_SUBST(TARBALL, ${TARBALL:-${PACKAGE}-${VERSION}.tar.gz}) - -dnl the library version info is available in the file: libibcommon.ver -ibcommon_api_version=`grep LIBVERSION $srcdir/libibcommon.ver | sed 's/LIBVERSION=//'` -if test -z $ibcommon_api_version; then - ibcommon_api_version=1:0:0 -fi -AC_SUBST(ibcommon_api_version) - -dnl Checks for programs -AC_PROG_CC -AC_PROG_CPP -AC_PROG_INSTALL -AC_PROG_LN_S -AC_PROG_MAKE_SET -AM_PROG_LIBTOOL - -dnl Checks for header files. -AC_HEADER_STDC -AC_CHECK_HEADERS([fcntl.h inttypes.h netinet/in.h stdint.h stdlib.h string.h sys/ioctl.h syslog.h unistd.h]) - -dnl Checks for library functions -AC_TYPE_SIGNAL -AC_FUNC_VPRINTF -AC_CHECK_FUNCS([strrchr strtoul strtoull]) - -dnl Checks for typedefs, structures, and compiler characteristics. 
-AC_C_CONST -AC_C_INLINE -AC_STRUCT_TM - -AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, - if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then - ac_cv_version_script=yes - else - ac_cv_version_script=no - fi) - -AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") - -AC_CONFIG_FILES([Makefile libibcommon.spec]) -AC_OUTPUT diff --git a/libibcommon/include/infiniband/common.h b/libibcommon/include/infiniband/common.h deleted file mode 100644 index 287703f..0000000 --- a/libibcommon/include/infiniband/common.h +++ /dev/null @@ -1,136 +0,0 @@ -/* - * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ -#ifndef __COMMON_H__ -#define __COMMON_H__ - -#include -#include -#include -#include -#include - -#ifdef __cplusplus -# define BEGIN_C_DECLS extern "C" { -# define END_C_DECLS } -#else /* !__cplusplus */ -# define BEGIN_C_DECLS -# define END_C_DECLS -#endif /* __cplusplus */ - -BEGIN_C_DECLS - -#if __BYTE_ORDER == __LITTLE_ENDIAN -#ifndef ntohll -static inline uint64_t ntohll(uint64_t x) { - return bswap_64(x); -} -#endif -#ifndef htonll -static inline uint64_t htonll(uint64_t x) { - return bswap_64(x); -} -#endif -#elif __BYTE_ORDER == __BIG_ENDIAN -#ifndef ntohll -static inline uint64_t ntohll(uint64_t x) { - return x; -} -#endif -#ifndef htonll -static inline uint64_t htonll(uint64_t x) { - return x; -} -#endif -#endif /* __BYTE_ORDER == __BIG_ENDIAN */ - -/***************************** - * COMMON MACHINE INDEPENDENT - */ - -/* Misc. macros: */ -/** align value \a l to \a size (ceil) */ -#define ALIGN(l, size) (((l) + ((size) - 1)) / (size) * (size)) - -/** align value \a l to \a sizeof 32 bit int (ceil) */ -#define ALIGN32(l) (ALIGN((l), sizeof(uint32))) - -/** printf style debugging MACRO, conmmon header includes name of function */ -#define IBWARN(fmt, args...) ibwarn(__FUNCTION__, fmt, ## args) - -/** printf style debugging MACRO, conmmon header includes name of function */ -#define LOG(fmt, args...) logmsg(__FUNCTION__, fmt, ## args) - -/** printf style abort MACRO, common header includes name of function */ -#define IBPANIC(fmt, args...) 
ibpanic(__FUNCTION__, fmt, ## args) - -/** abort program if expression \a x is \b false */ -#define SANITY(x) if (common.sanity && !(x))\ - ibpanic(__FUNCTION__,\ - "sanity check <%s> failed: line %d",\ - (x), __LINE__) - -/** avoid unused compilation warning */ -#ifndef USED -#define USED(x) while(0) {void *v = &(x); printf("%p", v);} -#endif - -/** define index macro for string array generated by enumstr.awk */ -#define ENUM_STR_DEF(enumname, last, val) (((unsigned)(val) < last) ? enumname ## _str[val] : "???") -#define ENUM_STR_ARRAY(name) char * name ## _str[] - -#ifdef __GNUC__ -#define IBCOMMON_STRICT_FORMAT __attribute__((format(printf, 2, 3))) -#else -#define IBCOMMON_STRICT_FORMAT -#endif - -/* util.c: debugging and tracing */ -void ibwarn(const char * const fn, char *msg, ...) IBCOMMON_STRICT_FORMAT; -void ibpanic(const char * const fn, char *msg, ...) IBCOMMON_STRICT_FORMAT; -void logmsg(const char *const fn, char *msg, ...) IBCOMMON_STRICT_FORMAT; - -void xdump(FILE *file, char *msg, void *p, int size); - -/* stack.c */ -void stack_dump(void); -void enable_stack_dump(int loop); - -/* time.c */ -uint64_t getcurrenttime(void); - -/* hash.c */ -uint32_t fhash(uint8_t *k, int length, uint32_t initval); - -END_C_DECLS - -#endif /* __COMMON_H__ */ diff --git a/libibcommon/libibcommon.spec.in b/libibcommon/libibcommon.spec.in deleted file mode 100644 index bd328b0..0000000 --- a/libibcommon/libibcommon.spec.in +++ /dev/null @@ -1,72 +0,0 @@ - -%define RELEASE @RELEASE@ -%define rel %{?CUSTOM_RELEASE} %{!?CUSTOM_RELEASE:%RELEASE} - -Summary: OpenFabrics Alliance InfiniBand management common library -Name: libibcommon -Version: @VERSION@ -Release: %rel%{?dist} -License: GPLv2 or BSD -Group: System Environment/Libraries -BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) -Source: http://www.openfabrics.org/downloads/management/@TARBALL@ -Url: http://openfabrics.org/ -Requires(post): /sbin/ldconfig -Requires(postun): /sbin/ldconfig 
-BuildRequires: libtool - -%description -libibcommon provides common utility functions for the OFA diagnostic and -management tools. - -%package devel -Summary: Development files for the libibcommon library -Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} -Requires(post): /sbin/ldconfig -Requires(postun): /sbin/ldconfig - -%description devel -Development files for the libibcommon library. - -%package static -Summary: Static library files for the libibcommon library -Group: System Environment/Libraries -Requires: %{name} = %{version}-%{release} - -%description static -Static library files for the libibcommon library. - -%prep -%setup -q - -%build -%configure -make %{?_smp_mflags} - -%install -make DESTDIR=${RPM_BUILD_ROOT} install -# remove unpackaged files from the buildroot -rm -f $RPM_BUILD_ROOT%{_libdir}/*.la - -%clean -rm -rf $RPM_BUILD_ROOT - -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig -%post devel -p /sbin/ldconfig -%postun devel -p /sbin/ldconfig - -%files -%defattr(-,root,root) -%{_libdir}/libibcommon*.so.* -%doc AUTHORS COPYING ChangeLog - -%files devel -%defattr(-,root,root) -%{_libdir}/libibcommon.so -%{_includedir}/infiniband/*.h - -%files static -%defattr(-,root,root) -%{_libdir}/libibcommon.a diff --git a/libibcommon/libibcommon.ver b/libibcommon/libibcommon.ver deleted file mode 100644 index 22b16d9..0000000 --- a/libibcommon/libibcommon.ver +++ /dev/null @@ -1,9 +0,0 @@ -# In this file we track the current API version -# of the IB common interface (and libraries) -# The version is built of the following -# tree numbers: -# API_REV:RUNNING_REV:AGE -# API_REV - advance on any added API -# RUNNING_REV - advance any change to the vendor files -# AGE - number of backward versions the API still supports -LIBVERSION=2:1:1 diff --git a/libibcommon/src/hash.c b/libibcommon/src/hash.c deleted file mode 100644 index 05fbff2..0000000 --- a/libibcommon/src/hash.c +++ /dev/null @@ -1,153 +0,0 @@ -/* - * Copyright (c) 2005 
Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -/* - * By Bob Jenkins, 1996. bob_jenkins at burtleburtle.net. You may use this - * code any way you wish, private, educational, or commercial. It's free. - * - * See http://burtleburtle.net/bob/hash/evahash.html - * Use for hash table lookup, or anything where one collision in 2^^32 is - * acceptable. Do NOT use for cryptographic purposes. - */ - -#include - -#define hashsize(n) ((uint32)1<<(n)) -#define hashmask(n) (hashsize(n)-1) - - -/* --------------------------------------------------------------------- -mix -- mix 3 32-bit values reversibly. 
-For every delta with one or two bits set, and the deltas of all three - high bits or all three low bits, whether the original value of a,b,c - is almost all zero or is uniformly distributed, -* If mix() is run forward or backward, at least 32 bits in a,b,c - have at least 1/4 probability of changing. -* If mix() is run forward, every bit of c will change between 1/3 and - 2/3 of the time. (Well, 22/100 and 78/100 for some 2-bit deltas.) -mix() was built out of 36 single-cycle latency instructions in a - structure that could supported 2x parallelism, like so: - a -= b; - a -= c; x = (c>>13); - b -= c; a ^= x; - b -= a; x = (a<<8); - c -= a; b ^= x; - c -= b; x = (b>>13); - ... - Unfortunately, superscalar Pentiums and Sparcs can't take advantage - of that parallelism. They've also turned some of those single-cycle - latency instructions into multi-cycle latency instructions. Still, - this is the fastest good hash I could find. There were about 2^^68 - to choose from. I only looked at a billion or so. --------------------------------------------------------------------- -*/ -#define mix(a,b,c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<<8); \ - c -= a; c -= b; c ^= (b>>13); \ - a -= b; a -= c; a ^= (c>>12); \ - b -= c; b -= a; b ^= (a<<16); \ - c -= a; c -= b; c ^= (b>>5); \ - a -= b; a -= c; a ^= (c>>3); \ - b -= c; b -= a; b ^= (a<<10); \ - c -= a; c -= b; c ^= (b>>15); \ -} - -/* --------------------------------------------------------------------- -fhash() -- hash a variable-length key into a 32-bit value - k : the key (the unaligned variable-length array of bytes) - len : the length of the key, counting by bytes - initval : can be any 4-byte value -Returns a 32-bit value. Every bit of the key affects every bit of -the return value. Every 1-bit and 2-bit delta achieves avalanche. -About 6*len+35 instructions. - -The best hash table sizes are powers of 2. There is no need to do -mod a prime (mod is sooo slow!). 
If you need less than 32 bits, -use a bitmask. For example, if you need only 10 bits, do - h = (h & hashmask(10)); -In which case, the hash table should have hashsize(10) elements. - -If you are hashing n strings (uint8 **)k, do it like this: - for (i=0, h=0; i= 12) { - a += (k[0] + ((uint32_t)k[1]<<8) + - ((uint32_t)k[2]<<16) + ((uint32_t)k[3]<<24)); - b += (k[4] + ((uint32_t)k[5]<<8) + ((uint32_t)k[6]<<16) + - ((uint32_t)k[7]<<24)); - c += (k[8] + ((uint32_t)k[9]<<8) + ((uint32_t)k[10]<<16) + - ((uint32_t)k[11]<<24)); - mix(a, b, c); - k += 12; len -= 12; - } - - /* handle the last 11 bytes */ - c += length; - switch (len) { /* all the case statements fall through */ - case 11: c += ((uint32_t)k[10]<<24); - case 10: c += ((uint32_t)k[9]<<16); - case 9 : c += ((uint32_t)k[8]<<8); - /* the first byte of c is reserved for the length */ - case 8 : b += ((uint32_t)k[7]<<24); - case 7 : b += ((uint32_t)k[6]<<16); - case 6 : b += ((uint32_t)k[5]<<8); - case 5 : b += k[4]; - case 4 : a += ((uint32_t)k[3]<<24); - case 3 : a += ((uint32_t)k[2]<<16); - case 2 : a += ((uint32_t)k[1]<<8); - case 1 : a += k[0]; - /* case 0: nothing left to add */ - } - - mix(a, b, c); - - return c; -} diff --git a/libibcommon/src/libibcommon.map b/libibcommon/src/libibcommon.map deleted file mode 100644 index f1f693a..0000000 --- a/libibcommon/src/libibcommon.map +++ /dev/null @@ -1,12 +0,0 @@ -IBCOMMON_1.0 { - global: - enable_stack_dump; - stack_dump; - getcurrenttime; - fhash; - logmsg; - ibpanic; - ibwarn; - xdump; - local: *; -}; diff --git a/libibcommon/src/stack.c b/libibcommon/src/stack.c deleted file mode 100644 index a51edae..0000000 --- a/libibcommon/src/stack.c +++ /dev/null @@ -1,178 +0,0 @@ -/* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. 
You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. 
- * - */ - -#define _GNU_SOURCE - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "common.h" - -static int loop_on_panic; - -void -stack_dump(void) -{ - if (!__builtin_frame_address(1)) - return - syslog(LOG_ALERT, "#1 %p\n", __builtin_return_address(1)); - - if (!__builtin_frame_address(2)) - return - syslog(LOG_ALERT, "#2 %p\n", __builtin_return_address(2)); - - if (!__builtin_frame_address(3)) - return - syslog(LOG_ALERT, "#3 %p\n", __builtin_return_address(3)); - - if (!__builtin_frame_address(4)) - return - syslog(LOG_ALERT, "#4 %p\n", __builtin_return_address(4)); - - if (!__builtin_frame_address(5)) - return - syslog(LOG_ALERT, "#5 %p\n", __builtin_return_address(5)); - - if (!__builtin_frame_address(6)) - return - syslog(LOG_ALERT, "#6 %p\n", __builtin_return_address(6)); - - if (!__builtin_frame_address(7)) - return - syslog(LOG_ALERT, "#7 %p\n", __builtin_return_address(7)); - - if (!__builtin_frame_address(8)) - return - syslog(LOG_ALERT, "#8 %p\n", __builtin_return_address(8)); - - if (!__builtin_frame_address(9)) - return - syslog(LOG_ALERT, "#9 %p\n", __builtin_return_address(9)); - - if (!__builtin_frame_address(10)) - return - syslog(LOG_ALERT, "#10 %p\n", __builtin_return_address(10)); - - if (!__builtin_frame_address(11)) - return - syslog(LOG_ALERT, "#11 %p\n", __builtin_return_address(11)); - - if (!__builtin_frame_address(12)) - return - syslog(LOG_ALERT, "#12 %p\n", __builtin_return_address(12)); - - if (!__builtin_frame_address(13)) - return - syslog(LOG_ALERT, "#13 %p\n", __builtin_return_address(13)); - - if (!__builtin_frame_address(14)) - return - syslog(LOG_ALERT, "#14 %p\n", __builtin_return_address(14)); - - if (!__builtin_frame_address(15)) - return - syslog(LOG_ALERT, "#15 %p\n", __builtin_return_address(15)); - - if 
(!__builtin_frame_address(16)) - return - syslog(LOG_ALERT, "#16 %p\n", __builtin_return_address(16)); - - if (!__builtin_frame_address(17)) - return - syslog(LOG_ALERT, "#17 %p\n", __builtin_return_address(17)); - - if (!__builtin_frame_address(18)) - return - syslog(LOG_ALERT, "#18 %p\n", __builtin_return_address(18)); -} - -static void -handler(int x) -{ - static int in; - time_t tm; - - if (!in) { - in++; - - syslog(LOG_ALERT, "*** exception handler: died with signal %d", x); - stack_dump(); - - fflush(NULL); - - tm = time(0); - fprintf(stderr, "%s *** exception handler: died with signal %d pid %d\n", - ctime(&tm), x, getpid()); - - fflush(NULL); - } - - if (loop_on_panic) { - fprintf(stderr, "exception handler: entering tight loop ... pid %d\n",getpid()); - for (; ; ) - ; - } - - signal(x, SIG_DFL); -} - -void -enable_stack_dump(int loop) -{ - loop_on_panic = loop; - signal(SIGILL, handler); - signal(SIGBUS, handler); - signal(SIGSEGV, handler); - signal(SIGABRT, handler); -} diff --git a/libibcommon/src/time.c b/libibcommon/src/time.c deleted file mode 100644 index 7354d6a..0000000 --- a/libibcommon/src/time.c +++ /dev/null @@ -1,47 +0,0 @@ -/* - * Copyright (c) 2004,2005 Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#include -#include - -/** - * getcurrenttime: Returns micro seconds elapsed from epoch. - */ -uint64_t -getcurrenttime(void) -{ - struct timeval tv; - - gettimeofday(&tv, 0); - return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec; -} diff --git a/libibcommon/src/util.c b/libibcommon/src/util.c deleted file mode 100644 index 4b91644..0000000 --- a/libibcommon/src/util.c +++ /dev/null @@ -1,132 +0,0 @@ -/* - * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. 
- * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#define _GNU_SOURCE - -#if HAVE_CONFIG_H -# include -#endif /* HAVE_CONFIG_H */ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include - -void -ibwarn(const char * const fn, char *msg, ...) -{ - char buf[512]; - va_list va; - int n; - - va_start(va, msg); - n = vsnprintf(buf, sizeof(buf), msg, va); - va_end(va); - - printf("ibwarn: [%d] %s: %s\n", getpid(), fn, buf); -} - -void -ibpanic(const char * const fn, char *msg, ...) -{ - char buf[512]; - va_list va; - int n; - - va_start(va, msg); - n = vsnprintf(buf, sizeof(buf), msg, va); - va_end(va); - - printf("ibpanic: [%d] %s: %s: (%m)\n", getpid(), fn, buf); - syslog(LOG_ALERT, "ibpanic: [%d] %s: %s: (%m)\n", getpid(), fn, buf); - - exit(-1); -} - -void -logmsg(const char * const fn, char *msg, ...) -{ - char buf[512]; - va_list va; - int n; - - va_start(va, msg); - n = vsnprintf(buf, sizeof(buf), msg, va); - va_end(va); - - syslog(LOG_ALERT, "[%d] %s: %s: (%m)\n", getpid(), fn, buf); -} - -void -xdump(FILE *file, char *msg, void *p, int size) -{ -#define HEX(x) ((x) < 10 ? 
'0' + (x) : 'a' + ((x) -10)) - uint8_t *cp = p; - int i; - - if (msg) - fputs(msg, file); - - for (i = 0; i < size;) { - fputc(HEX(*cp >> 4), file); - fputc(HEX(*cp & 0xf), file); - if (++i >= size) - break; - fputc(HEX(cp[1] >> 4), file); - fputc(HEX(cp[1] & 0xf), file); - if ((++i) % 16) - fputc(' ', file); - else - fputc('\n', file); - cp += 2; - } - if (i % 16) { - fputc('\n', file); - } -} diff --git a/make.dist b/make.dist index b5b587c..b668046 100755 --- a/make.dist +++ b/make.dist @@ -52,7 +52,7 @@ if [ -z "$1" ]; then usage; exit 1; fi if [ "$1" != "daily" -a "$1" != "release" ]; then usage; exit 1; fi if [ -z "$TARGETS" ]; then - TARGETS="libibcommon libibumad libibmad opensm infiniband-diags" + TARGETS="libibumad libibmad opensm infiniband-diags" fi # Is the repo clean? -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 19:20:46 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 05:20:46 +0200 Subject: [ofa-general] [PATCH] opensm/osm_mesh: make mesh_info static and const Message-ID: <20081222032046.GM28259@sashak.voltaire.com> Make mesh_info static and const. 
Signed-off-by: Sasha Khapyorsky --- opensm/opensm/osm_mesh.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index e9723b0..9e3e9de 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -54,7 +54,7 @@ /* * characteristic polynomials for selected 1d through 8d tori */ -struct _mesh_info { +static const struct mesh_info { int dimension; /* dimension of the torus */ int size[MAX_DIMENSION]; /* size of the torus */ int degree; /* degree of polynomial */ @@ -246,7 +246,7 @@ static char *poly_print(int n, int *coeff) * * return a nonzero value if polynomials differ else 0 */ -static int poly_diff(int n, int *p, switch_t *s) +static int poly_diff(int n, const int *p, switch_t *s) { if (s->node->num_links != n) return 1; @@ -691,7 +691,7 @@ static void classify_mesh_type(lash_t *p_lash, int sw) osm_log_t *p_log = &p_lash->p_osm->log; int i; switch_t *s = p_lash->switches[sw]; - struct _mesh_info *t; + const struct mesh_info *t; OSM_LOG_ENTER(p_log); @@ -1378,7 +1378,7 @@ int osm_do_mesh_analysis(lash_t *p_lash) p += sprintf( p, "%snode shape is ", (mesh->num_class == 1)? "" : "most common "); if (s->node->type) { - struct _mesh_info *t = &mesh_info[s->node->type]; + const struct mesh_info *t = &mesh_info[s->node->type]; for (i = 0; i < t->dimension; i++) { p += sprintf(p, "%s%d%s", i? " x " : "", t->size[i], -- 1.6.0.4.766.g6fc4a From sashak at voltaire.com Sun Dec 21 19:21:44 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Mon, 22 Dec 2008 05:21:44 +0200 Subject: [ofa-general] [PATCH] opensm/osm_mesh: simplify mesh node links and ports allocation Message-ID: <20081222032144.GN28259@sashak.voltaire.com> Simplify mesh node links and ports allocation - use zero sized arrays and alloc node and link structures as single memory chunk. 
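For readers unfamiliar with the idiom, the single-chunk allocation described above can be sketched in isolation roughly as follows. This is a minimal standalone illustration, not the osm_mesh.c code: demo_link_t, demo_link_create and demo_link_set_port are hypothetical names, and the sketch uses the C99 flexible array member `ports[]` where the patch uses the GNU zero-length form `ports[0]`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for link_t: the port table is the trailing member. */
typedef struct demo_link {
	int switch_id;
	int num_ports;
	int ports[];	/* C99 flexible array member; the patch uses ports[0] */
} demo_link_t;

/*
 * Allocate the header and its port table with one calloc(), so that a
 * single free() releases both. Returns NULL on failure.
 */
demo_link_t *demo_link_create(int switch_id, int num_ports)
{
	demo_link_t *l;

	if (num_ports < 0)
		return NULL;
	l = calloc(1, sizeof(*l) + (size_t)num_ports * sizeof(l->ports[0]));
	if (!l)
		return NULL;
	l->switch_id = switch_id;
	l->num_ports = num_ports;
	return l;
}

/* Store a port number in the trailing table, with a bounds check. */
void demo_link_set_port(demo_link_t *l, int idx, int port)
{
	assert(idx >= 0 && idx < l->num_ports);
	l->ports[idx] = port;
}
```

With this layout there is no separate `ports` pointer to check or free, which is why the delete path in the patch shrinks to one free() per link.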
Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_mesh.h | 8 ++++---- opensm/opensm/osm_mesh.c | 24 ++++++------------------ 2 files changed, 10 insertions(+), 22 deletions(-) diff --git a/opensm/include/opensm/osm_mesh.h b/opensm/include/opensm/osm_mesh.h index 9e23498..173fa86 100644 --- a/opensm/include/opensm/osm_mesh.h +++ b/opensm/include/opensm/osm_mesh.h @@ -48,17 +48,15 @@ struct _switch; typedef struct _link { int switch_id; int link_id; - int *ports; - int num_ports; int next_port; + int num_ports; + int ports[0]; } link_t; /* * per switch node mesh info */ typedef struct _mesh_node { - unsigned int num_links; /* number of 'links' to adjacent switches */ - link_t **links; /* per link information */ int *axes; /* used to hold and reorder assigned axes */ int *coord; /* mesh coordinates of switch */ int **matrix; /* distances between adjacant switches */ @@ -67,6 +65,8 @@ typedef struct _mesh_node { int dimension; /* apparent dimension of mesh around node */ int temp; /* temporary holder for distance info */ int type; /* index of node type in mesh_info array */ + unsigned int num_links; /* number of 'links' to adjacent switches */ + link_t *links[0]; /* per link information */ } mesh_node_t; void osm_mesh_node_delete(struct _lash *p_lash, struct _switch *sw); diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 9e3e9de..263d29e 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -1253,16 +1253,9 @@ void osm_mesh_node_delete(lash_t *p_lash, switch_t *sw) OSM_LOG_ENTER(p_log); if (node) { - if (node->links) { - for (i = 0; i < num_ports; i++) { - if (node->links[i]) { - if (node->links[i]->ports) - free(node->links[i]->ports); - free(node->links[i]); - } - } - free(node->links); - } + for (i = 0; i < num_ports; i++) + if (node->links[i]) + free(node->links[i]); if (node->poly) free(node->poly); @@ -1301,17 +1294,12 @@ int osm_mesh_node_create(lash_t *p_lash, switch_t *sw) OSM_LOG_ENTER(p_log); - if (!(node = 
sw->node = calloc(1, sizeof(mesh_node_t)))) + if (!(node = sw->node = calloc(1, sizeof(mesh_node_t) + num_ports * sizeof(link_t *)))) goto err; - if (!(node->links = calloc(num_ports, sizeof(link_t *)))) - goto err; - - for (i = 0; i < num_ports; i++) { - if (!(node->links[i] = calloc(1, sizeof(link_t))) || - !(node->links[i]->ports = calloc(num_ports, sizeof(int)))) + for (i = 0; i < num_ports; i++) + if (!(node->links[i] = calloc(1, sizeof(link_t) + num_ports * sizeof(int)))) goto err; - } if (!(node->axes = calloc(num_ports, sizeof(int)))) goto err; -- 1.6.0.4.766.g6fc4a From yevgenyp at mellanox.co.il Mon Dec 22 00:13:31 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 10:13:31 +0200 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: <3b5e77ad0812210821k66346d5frb036bc6bc9e8894c@mail.gmail.com> References: <4907348E.7060508@mellanox.co.il> <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> <3b5e77ad0812210821k66346d5frb036bc6bc9e8894c@mail.gmail.com> Message-ID: <494F4C2B.50109@mellanox.co.il> Roland, > We encountered a problem that when a machine didn't support the > required number of vectors (nvec), > instead of trying to get 2 vectors like in the previous version, it > didn't use MSI-X at all - causing a major performance degradation. > Maybe in a case of failure we should try lowering the number of > vectors to 2 (like in the previous version) or the return value of > pci_enable_msix and goto no_msi only in case of a second failure. > > Ron I agree with Ron on this issue, trying to get the number of vectors that was returned by pci_enable_msix seems to be the better solution in case of failure. As for the rest of the patch, I didn't find any major problems, except for a small typo: > + mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors;; The driver worked fine with the new addition. 
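The fallback discussed above (retry with the vector count that the first attempt reported, and give up on MSI-X only if that retry also fails) can be sketched in plain userspace C. This is an illustrative sketch only: demo_enable_vectors() is a hypothetical stand-in that mimics the return convention described in this thread (0 on success, a positive count when fewer vectors are available, negative on failure). It is not pci_enable_msix() and it is not the code from the patch.

```c
#include <assert.h>

/*
 * Hypothetical allocator, assumed for illustration only.
 * Returns 0 on success, a positive count when fewer than nvec
 * vectors are available, and a negative value on hard failure.
 */
static int demo_enable_vectors(int available, int nvec)
{
	assert(nvec > 0);
	if (available <= 0)
		return -1;
	return nvec <= available ? 0 : available;
}

/*
 * Returns the number of vectors obtained, or 0 to signal that the
 * caller should take the no-MSI-X path.
 */
static int demo_request_vectors(int available, int nvec)
{
	int ret = demo_enable_vectors(available, nvec);

	if (ret > 0) {
		/* Fewer vectors exist: retry once with the reported count. */
		nvec = ret;
		ret = demo_enable_vectors(available, nvec);
	}
	return ret == 0 ? nvec : 0;
}
```

With this shape, a machine that can only supply 4 of 8 requested vectors still ends up with 4 instead of dropping MSI-X entirely.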
Thanks, Yevgeny From yevgenyp at mellanox.co.il Mon Dec 22 00:44:52 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 10:44:52 +0200 Subject: [ofa-general][PATCH 2/3] mlx4: Default value for automatic completion vector selection Message-ID: <494F5384.1090304@mellanox.co.il> When the vector number passed to mlx4_cq_alloc is MLX4_LEAST_ATTACHED_VECTOR (0xffffffff), the driver selects the completion vector that has the fewest CQs attached to it and attaches the CQ to the chosen vector. IB_CQ_VECTOR_LEAST_ATTACHED is defined in rdma/ib_verbs.h; when the mlx4_ib driver receives this CQ vector number, it uses MLX4_LEAST_ATTACHED_VECTOR on CQ creation. Signed-off-by: Yevgeny Petrilin --- Roland, I modified this patch to match the changes you made in the "multiple completion vectors" patch. Thanks, Yevgeny drivers/infiniband/hw/mlx4/cq.c | 4 +++- drivers/net/mlx4/cq.c | 22 +++++++++++++++++++++- drivers/net/mlx4/en_cq.c | 2 +- drivers/net/mlx4/mlx4.h | 1 + include/linux/mlx4/device.h | 2 ++ include/rdma/ib_verbs.h | 10 +++++++++- 6 files changed, 37 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 2198753..1f284cd 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -222,7 +222,9 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, vector, 0); + cq->db.dma, &cq->mcq, + vector == IB_CQ_VECTOR_LEAST_ATTACHED ? 
+ MLX4_LEAST_ATTACHED_VECTOR : vector, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index ac57b6a..515046e 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -187,6 +187,22 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, } EXPORT_SYMBOL_GPL(mlx4_cq_resize); +static int mlx4_find_least_loaded_vector(struct mlx4_priv *priv) +{ + int i; + int index = 0; + int min = priv->eq_table.eq[0].load; + + for (i = 1; i < priv->dev.caps.num_comp_vectors; i++) { + if (priv->eq_table.eq[i].load < min) { + index = i; + min = priv->eq_table.eq[i].load; + } + } + + return index; +} + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, unsigned vector, int collapsed) @@ -198,7 +214,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, u64 mtt_addr; int err; - if (vector >= dev->caps.num_comp_vectors) + if (vector == MLX4_LEAST_ATTACHED_VECTOR) + vector = mlx4_find_least_loaded_vector(priv); + else if (vector >= dev->caps.num_comp_vectors) return -EINVAL; cq->vector = vector; @@ -245,6 +263,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, if (err) goto err_radix; + priv->eq_table.eq[cq->vector].load++; cq->cons_index = 0; cq->arm_sn = 1; cq->uar = uar; @@ -282,6 +301,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); synchronize_irq(priv->eq_table.eq[cq->vector].irq); + priv->eq_table.eq[cq->vector].load--; spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c index 674f836..ae2f989 100644 --- a/drivers/net/mlx4/en_cq.c +++ b/drivers/net/mlx4/en_cq.c @@ -56,7 +56,7 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, cq->vector = ring % mdev->dev->caps.num_comp_vectors; } else { cq->buf_size = sizeof(struct mlx4_cqe); - 
cq->vector = 0; + cq->vector = MLX4_LEAST_ATTACHED_VECTOR; } cq->ring = ring; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index e0213ba..5491ecd 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -136,6 +136,7 @@ struct mlx4_eq { u16 irq; u16 have_irq; int nent; + int load; struct mlx4_buf_list *page_list; struct mlx4_mtt mtt; }; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 8f659cc..8d8c25d 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -169,6 +169,8 @@ enum { MLX4_NUM_FEXCH = 64 * 1024, }; +#define MLX4_LEAST_ATTACHED_VECTOR 0xffffffff + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 936e333..e76e028 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1448,6 +1448,13 @@ static inline int ib_post_recv(struct ib_qp *qp, return qp->device->post_recv(qp, recv_wr, bad_recv_wr); } +/* + * IB_CQ_VECTOR_LEAST_ATTACHED: The constant specifies that + * the CQ will be attached to the completion vector that has + * the least number of CQs already attached to it. + */ +#define IB_CQ_VECTOR_LEAST_ATTACHED 0xffffffff + /** * ib_create_cq - Creates a CQ on the specified device. * @device: The device on which to create the CQ. @@ -1459,7 +1466,8 @@ static inline int ib_post_recv(struct ib_qp *qp, * the associated completion and event handlers. * @cqe: The minimum size of the CQ. * @comp_vector - Completion vector used to signal completion events. - * Must be >= 0 and < context->num_comp_vectors. + * Must be >= 0 and < context->num_comp_vectors + * or IB_CQ_VECTOR_LEAST_ATTACHED. * * Users can examine the cq structure to determine the actual CQ size. 
*/ -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:00:04 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:00:04 +0200 Subject: [ofa-general] [PATCH 1/9] mlx4_en: Memory leak on completion queue free Message-ID: <494F6524.1030301@mellanox.co.il> If a port is destroyed without ever having been activated, CQ resources are not freed. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_cq.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c index ae2f989..4aa3b0f 100644 --- a/drivers/net/mlx4/en_cq.c +++ b/drivers/net/mlx4/en_cq.c @@ -71,6 +71,8 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, err = mlx4_en_map_buffer(&cq->wqres.buf); if (err) mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size); + else + cq->buf = (struct mlx4_cqe *) cq->wqres.buf.direct.buf; return err; } @@ -85,7 +87,6 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) cq->mcq.arm_db = cq->wqres.db.db + 1; *cq->mcq.set_ci_db = 0; *cq->mcq.arm_db = 0; - cq->buf = (struct mlx4_cqe *) cq->wqres.buf.direct.buf; memset(cq->buf, 0, cq->buf_size); err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, &mdev->priv_uar, -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:00:22 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:00:22 +0200 Subject: [ofa-general] [PATCH 2/9] mlx4_en: Removed TX locking when polling TX cq Message-ID: <494F6536.6040107@mellanox.co.il> There is no need to synchronize the polling with the transmit function. The only place that needs synchronization is where we process the CQ from the transmit function. Also replaced spin_lock_irq with spin_trylock: if somebody else is already processing the CQ, there is no need to wait for it to finish.
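The trylock idea described above can be sketched in userspace C: a caller that fails to take the completion lock simply returns, because whoever holds the lock is already reaping completions. This is a minimal sketch under stated assumptions. struct demo_ring and the demo_* helpers are hypothetical names, and a C11 atomic_flag stands in for the kernel spinlock; it is not the driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TX ring; not the driver's structure. */
struct demo_ring {
	atomic_flag comp_lock;   /* plays the role of ring->comp_lock */
	unsigned int pending;    /* completions waiting to be reaped */
	unsigned int processed;  /* completions reaped so far */
};

static void demo_ring_init(struct demo_ring *ring, unsigned int pending)
{
	assert(ring != NULL);
	atomic_flag_clear(&ring->comp_lock);
	ring->pending = pending;
	ring->processed = 0;
}

/* Non-blocking acquire: true if the lock was free and is now ours. */
static bool demo_trylock(struct demo_ring *ring)
{
	return !atomic_flag_test_and_set_explicit(&ring->comp_lock,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_ring *ring)
{
	atomic_flag_clear_explicit(&ring->comp_lock, memory_order_release);
}

/*
 * Reap completions unless another context is already doing so.
 * Returns false when the work was skipped.
 */
static bool demo_process_completions(struct demo_ring *ring)
{
	if (!demo_trylock(ring))
		return false;
	ring->processed += ring->pending;
	ring->pending = 0;
	demo_unlock(ring);
	return true;
}
```

Skipping is safe in this sketch only because the lock holder drains everything that is pending; a caller that needs a guaranteed later pass would have to reschedule itself, as the timer path in the patch does.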
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_tx.c | 24 +++++++++++++----------- 1 files changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c index 8592f8f..1f25821 100644 --- a/drivers/net/mlx4/en_tx.c +++ b/drivers/net/mlx4/en_tx.c @@ -404,14 +404,12 @@ void mlx4_en_tx_irq(struct mlx4_cq *mcq) struct mlx4_en_priv *priv = netdev_priv(cq->dev); struct mlx4_en_tx_ring *ring = &priv->tx_ring[cq->ring]; - spin_lock_irq(&ring->comp_lock); cq->armed = 0; + if (!spin_trylock(&ring->comp_lock)) + return; mlx4_en_process_tx_cq(cq->dev, cq); - if (ring->blocked) - mlx4_en_arm_cq(priv, cq); - else - mod_timer(&cq->timer, jiffies + 1); - spin_unlock_irq(&ring->comp_lock); + mod_timer(&cq->timer, jiffies + 1); + spin_unlock(&ring->comp_lock); } @@ -424,8 +422,10 @@ void mlx4_en_poll_tx_cq(unsigned long data) INC_PERF_COUNTER(priv->pstats.tx_poll); - netif_tx_lock(priv->dev); - spin_lock_irq(&ring->comp_lock); + if (!spin_trylock(&ring->comp_lock)) { + mod_timer(&cq->timer, jiffies + MLX4_EN_TX_POLL_TIMEOUT); + return; + } mlx4_en_process_tx_cq(cq->dev, cq); inflight = (u32) (ring->prod - ring->cons - ring->last_nr_txbb); @@ -435,8 +435,7 @@ void mlx4_en_poll_tx_cq(unsigned long data) if (inflight && priv->port_up) mod_timer(&cq->timer, jiffies + MLX4_EN_TX_POLL_TIMEOUT); - spin_unlock_irq(&ring->comp_lock); - netif_tx_unlock(priv->dev); + spin_unlock(&ring->comp_lock); } static struct mlx4_en_tx_desc *mlx4_en_bounce_to_desc(struct mlx4_en_priv *priv, @@ -479,7 +478,10 @@ static inline void mlx4_en_xmit_poll(struct mlx4_en_priv *priv, int tx_ind) /* Poll the CQ every mlx4_en_TX_MODER_POLL packets */ if ((++ring->poll_cnt & (MLX4_EN_TX_POLL_MODER - 1)) == 0) - mlx4_en_process_tx_cq(priv->dev, cq); + if (spin_trylock(&ring->comp_lock)) { + mlx4_en_process_tx_cq(priv->dev, cq); + spin_unlock(&ring->comp_lock); + } } static void *get_frag_ptr(struct sk_buff *skb) -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 
02:00:31 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:00:31 +0200 Subject: [ofa-general] [PATCH 3/9] mlx4_en: Removed redundant cq->armed flag Message-ID: <494F653F.2010806@mellanox.co.il> Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_cq.c | 1 - drivers/net/mlx4/en_tx.c | 5 ++--- drivers/net/mlx4/mlx4_en.h | 1 - 3 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c index 4aa3b0f..04a2804 100644 --- a/drivers/net/mlx4/en_cq.c +++ b/drivers/net/mlx4/en_cq.c @@ -140,7 +140,6 @@ int mlx4_en_set_cq_moder(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) int mlx4_en_arm_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) { - cq->armed = 1; mlx4_cq_arm(&cq->mcq, MLX4_CQ_DB_REQ_NOT, priv->mdev->uar_map, &priv->mdev->uar_lock); diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c index 1f25821..ff4d752 100644 --- a/drivers/net/mlx4/en_tx.c +++ b/drivers/net/mlx4/en_tx.c @@ -379,8 +379,8 @@ static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq) /* Wakeup Tx queue if this ring stopped it */ if (unlikely(ring->blocked)) { - if (((u32) (ring->prod - ring->cons) <= - ring->size - HEADROOM - MAX_DESC_TXBBS) && !cq->armed) { + if ((u32) (ring->prod - ring->cons) <= + ring->size - HEADROOM - MAX_DESC_TXBBS) { /* TODO: support multiqueue netdevs. Currently, we block * when *any* ring is full. 
Note that: @@ -404,7 +404,6 @@ void mlx4_en_tx_irq(struct mlx4_cq *mcq) struct mlx4_en_priv *priv = netdev_priv(cq->dev); struct mlx4_en_tx_ring *ring = &priv->tx_ring[cq->ring]; - cq->armed = 0; if (!spin_trylock(&ring->comp_lock)) return; mlx4_en_process_tx_cq(cq->dev, cq); diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h index 98ddc08..45b975f 100644 --- a/drivers/net/mlx4/mlx4_en.h +++ b/drivers/net/mlx4/mlx4_en.h @@ -311,7 +311,6 @@ struct mlx4_en_cq { enum cq_type is_tx; u16 moder_time; u16 moder_cnt; - int armed; struct mlx4_cqe *buf; #define MLX4_EN_OPCODE_ERROR 0x1e }; -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:00:49 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:00:49 +0200 Subject: [ofa-general] [PATCH 4/9] mlx4_en: Verify number of RX rings doesn't exceed MAX_RX_RINGS Message-ID: <494F6551.3040500@mellanox.co.il> Required in cases where dev->caps.num_comp_vectors > MAX_RX_RINGS. With the current values this happens on machines that have more than 16 cores.
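The clamp described above amounts to taking the smaller of the completion-vector count and the ring limit. A minimal userspace sketch follows; DEMO_MAX_RX_RINGS is a hypothetical constant set to 16 only because the message mentions "more than 16 cores", and demo_rx_ring_num() is an illustrative name, not a driver function.

```c
#include <assert.h>

/* Assumed limit, inferred from the "16 cores" remark; not taken from mlx4_en.h. */
#define DEMO_MAX_RX_RINGS 16

/* Number of RX rings to open for a given number of completion vectors. */
static int demo_rx_ring_num(int num_comp_vectors)
{
	assert(num_comp_vectors > 0);
	return num_comp_vectors < DEMO_MAX_RX_RINGS ?
	       num_comp_vectors : DEMO_MAX_RX_RINGS;
}
```

Under these assumptions a 32-vector machine is limited to 16 rings, while a 4-vector machine keeps one ring per vector.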
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_main.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c index e44e018..34f3a19 100644 --- a/drivers/net/mlx4/en_main.c +++ b/drivers/net/mlx4/en_main.c @@ -170,7 +170,8 @@ static void *mlx4_en_add(struct mlx4_dev *dev) mlx4_info(mdev, "Using %d tx rings for port:%d\n", mdev->profile.prof[i].tx_ring_num, i); if (!mdev->profile.prof[i].rx_ring_num) { - mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors;; + mdev->profile.prof[i].rx_ring_num = + min_t(int, dev->caps.num_comp_vectors, MAX_RX_RINGS); mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", mdev->profile.prof[i].rx_ring_num, i); } else -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:00:59 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:00:59 +0200 Subject: [ofa-general] [PATCH 5/9] mlx4_en: Removed Interrupt moderation module parameters Message-ID: <494F655B.1070405@mellanox.co.il> They are controlled through Ethtool interface, no need to have two ways to modify them. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_netdev.c | 14 +++----------- drivers/net/mlx4/en_params.c | 10 ---------- drivers/net/mlx4/mlx4_en.h | 3 --- 3 files changed, 3 insertions(+), 24 deletions(-) diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c index 96e709d..8b5e8f0 100644 --- a/drivers/net/mlx4/en_netdev.c +++ b/drivers/net/mlx4/en_netdev.c @@ -369,7 +369,6 @@ static struct net_device_stats *mlx4_en_get_stats(struct net_device *dev) static void mlx4_en_set_default_moderation(struct mlx4_en_priv *priv) { - struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_cq *cq; int i; @@ -379,15 +378,8 @@ static void mlx4_en_set_default_moderation(struct mlx4_en_priv *priv) * satisfy our coelsing target. * - moder_time is set to a fixed value. */ - priv->rx_frames = (mdev->profile.rx_moder_cnt == - MLX4_EN_AUTO_CONF) ? 
- MLX4_EN_RX_COAL_TARGET / - priv->dev->mtu + 1 : - mdev->profile.rx_moder_cnt; - priv->rx_usecs = (mdev->profile.rx_moder_time == - MLX4_EN_AUTO_CONF) ? - MLX4_EN_RX_COAL_TIME : - mdev->profile.rx_moder_time; + priv->rx_frames = MLX4_EN_RX_COAL_TARGET / priv->dev->mtu + 1; + priv->rx_usecs = MLX4_EN_RX_COAL_TIME; mlx4_dbg(INTR, priv, "Default coalesing params for mtu:%d - " "rx_frames:%d rx_usecs:%d\n", priv->dev->mtu, priv->rx_frames, priv->rx_usecs); @@ -411,7 +403,7 @@ static void mlx4_en_set_default_moderation(struct mlx4_en_priv *priv) priv->pkt_rate_high = MLX4_EN_RX_RATE_HIGH; priv->rx_usecs_high = MLX4_EN_RX_COAL_TIME_HIGH; priv->sample_interval = MLX4_EN_SAMPLE_INTERVAL; - priv->adaptive_rx_coal = mdev->profile.auto_moder; + priv->adaptive_rx_coal = 1; priv->last_moder_time = MLX4_EN_AUTO_CONF; priv->last_moder_jiffies = 0; priv->last_moder_packets = 0; diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index 95706ee..86f4031 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -71,13 +71,6 @@ MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]." MLX4_EN_PARM_INT(pfcrx, 0, "Priority based Flow Control policy on RX[7:0]." 
" Per priority bit mask"); -/* Interrupt moderation tunning */ -MLX4_EN_PARM_INT(rx_moder_cnt, MLX4_EN_AUTO_CONF, - "Max coalesced descriptors for Rx interrupt moderation"); -MLX4_EN_PARM_INT(rx_moder_time, MLX4_EN_AUTO_CONF, - "Timeout following last packet for Rx interrupt moderation"); -MLX4_EN_PARM_INT(auto_moder, 1, "Enable dynamic interrupt moderation"); - MLX4_EN_PARM_INT(rx_ring_num1, 0, "Number or Rx rings for port 1 (0 = #cores)"); MLX4_EN_PARM_INT(rx_ring_num2, 0, "Number or Rx rings for port 2 (0 = #cores)"); @@ -92,9 +85,6 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) struct mlx4_en_profile *params = &mdev->profile; int i; - params->rx_moder_cnt = min_t(int, rx_moder_cnt, MLX4_EN_AUTO_CONF); - params->rx_moder_time = min_t(int, rx_moder_time, MLX4_EN_AUTO_CONF); - params->auto_moder = auto_moder; params->rss_xor = (rss_xor != 0); params->rss_mask = rss_mask & 0x1f; params->num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS); diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h index 45b975f..63c42e4 100644 --- a/drivers/net/mlx4/mlx4_en.h +++ b/drivers/net/mlx4/mlx4_en.h @@ -333,9 +333,6 @@ struct mlx4_en_profile { u8 rss_mask; u32 active_ports; u32 small_pkt_int; - int rx_moder_cnt; - int rx_moder_time; - int auto_moder; u8 no_reset; struct mlx4_en_port_profile prof[MLX4_MAX_PORTS + 1]; }; -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:01:13 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:01:13 +0200 Subject: [ofa-general] [PATCH 6/9] mlx4_en: Remove pauses module parameters Message-ID: <494F6569.3080105@mellanox.co.il> >From 8bd9271caddadecb6310d7306dc50075c371f8c7 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Sun, 14 Dec 2008 11:37:01 +0200 Subject: [PATCH] mlx4_en: Remove pauses module parameters. They are controlled through Ethtool interface. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_params.c | 10 ++-------- 1 files changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index 86f4031..047b37f 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -60,12 +60,6 @@ MLX4_EN_PARM_INT(num_lro, MLX4_EN_MAX_LRO_DESCRIPTORS, "Number of LRO sessions per ring or disabled (0)"); /* Priority pausing */ -MLX4_EN_PARM_INT(pptx, MLX4_EN_DEF_TX_PAUSE, - "Pause policy on TX: 0 never generate pause frames " - "1 generate pause frames according to RX buffer threshold"); -MLX4_EN_PARM_INT(pprx, MLX4_EN_DEF_RX_PAUSE, - "Pause policy on RX: 0 ignore received pause frames " - "1 respect received pause frames"); MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]." " Per priority bit mask"); MLX4_EN_PARM_INT(pfcrx, 0, "Priority based Flow Control policy on RX[7:0]." @@ -89,9 +83,9 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->rss_mask = rss_mask & 0x1f; params->num_lro = min_t(int, num_lro , MLX4_EN_MAX_LRO_DESCRIPTORS); for (i = 1; i <= MLX4_MAX_PORTS; i++) { - params->prof[i].rx_pause = pprx; + params->prof[i].rx_pause = 1; params->prof[i].rx_ppp = pfcrx; - params->prof[i].tx_pause = pptx; + params->prof[i].tx_pause = 1; params->prof[i].tx_ppp = pfctx; } if (pfcrx || pfctx) { -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:01:23 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:01:23 +0200 Subject: [ofa-general] [PATCH 7/9] mlx4_en: Always allocate RX ring for each completion vector Message-ID: <494F6573.2070405@mellanox.co.il> Removed module parameter specifying number of rings. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_main.c | 12 ++++-------- drivers/net/mlx4/en_params.c | 5 ----- 2 files changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c index 34f3a19..eda72dd 100644 --- a/drivers/net/mlx4/en_main.c +++ b/drivers/net/mlx4/en_main.c @@ -169,14 +169,10 @@ static void *mlx4_en_add(struct mlx4_dev *dev) mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_ETH) { mlx4_info(mdev, "Using %d tx rings for port:%d\n", mdev->profile.prof[i].tx_ring_num, i); - if (!mdev->profile.prof[i].rx_ring_num) { - mdev->profile.prof[i].rx_ring_num = - min_t(int, dev->caps.num_comp_vectors, MAX_RX_RINGS); - mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", - mdev->profile.prof[i].rx_ring_num, i); - } else - mlx4_info(mdev, "Using %d rx rings for port:%d\n", - mdev->profile.prof[i].rx_ring_num, i); + mdev->profile.prof[i].rx_ring_num = + min_t(int, dev->caps.num_comp_vectors, MAX_RX_RINGS); + mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", + mdev->profile.prof[i].rx_ring_num, i); } /* Create our own workqueue for reset/multicast tasks diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index 047b37f..6483ae9 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -65,9 +65,6 @@ MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]." MLX4_EN_PARM_INT(pfcrx, 0, "Priority based Flow Control policy on RX[7:0]." 
" Per priority bit mask"); -MLX4_EN_PARM_INT(rx_ring_num1, 0, "Number or Rx rings for port 1 (0 = #cores)"); -MLX4_EN_PARM_INT(rx_ring_num2, 0, "Number or Rx rings for port 2 (0 = #cores)"); - MLX4_EN_PARM_INT(tx_ring_size1, MLX4_EN_AUTO_CONF, "Tx ring size for port 1"); MLX4_EN_PARM_INT(tx_ring_size2, MLX4_EN_AUTO_CONF, "Tx ring size for port 2"); MLX4_EN_PARM_INT(rx_ring_size1, MLX4_EN_AUTO_CONF, "Rx ring size for port 1"); @@ -95,8 +92,6 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->prof[1].tx_ring_num = 1; params->prof[2].tx_ring_num = 1; } - params->prof[1].rx_ring_num = min_t(int, rx_ring_num1, MAX_RX_RINGS); - params->prof[2].rx_ring_num = min_t(int, rx_ring_num2, MAX_RX_RINGS); if (tx_ring_size1 == MLX4_EN_AUTO_CONF) tx_ring_size1 = MLX4_EN_DEF_TX_RING_SIZE; -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:01:32 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:01:32 +0200 Subject: [ofa-general] [PATCH 8/9] mlx4_en: Added "set_ringparam" Ethtool interface implementation Message-ID: <494F657C.2080100@mellanox.co.il> Now using Ethtool to determine ring sizes, removed the module parameters that controlled those values. Modifying ring size requires restart of the interface. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_netdev.c | 8 ++-- drivers/net/mlx4/en_params.c | 80 ++++++++++++++++++++++++++--------------- drivers/net/mlx4/mlx4_en.h | 6 +++ 3 files changed, 61 insertions(+), 33 deletions(-) diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c index 8b5e8f0..07a939a 100644 --- a/drivers/net/mlx4/en_netdev.c +++ b/drivers/net/mlx4/en_netdev.c @@ -552,7 +552,7 @@ static void mlx4_en_linkstate(struct work_struct *work) } -static int mlx4_en_start_port(struct net_device *dev) +int mlx4_en_start_port(struct net_device *dev) { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_dev *mdev = priv->mdev; @@ -707,7 +707,7 @@ cq_err: } -static void mlx4_en_stop_port(struct net_device *dev) +void mlx4_en_stop_port(struct net_device *dev) { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_dev *mdev = priv->mdev; @@ -826,7 +826,7 @@ static int mlx4_en_close(struct net_device *dev) return 0; } -static void mlx4_en_free_resources(struct mlx4_en_priv *priv) +void mlx4_en_free_resources(struct mlx4_en_priv *priv) { int i; @@ -845,7 +845,7 @@ static void mlx4_en_free_resources(struct mlx4_en_priv *priv) } } -static int mlx4_en_alloc_resources(struct mlx4_en_priv *priv) +int mlx4_en_alloc_resources(struct mlx4_en_priv *priv) { struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_port_profile *prof = priv->prof; diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index 6483ae9..cfeef0f 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -65,12 +65,6 @@ MLX4_EN_PARM_INT(pfctx, 0, "Priority based Flow Control policy on TX[7:0]." MLX4_EN_PARM_INT(pfcrx, 0, "Priority based Flow Control policy on RX[7:0]." 
" Per priority bit mask"); -MLX4_EN_PARM_INT(tx_ring_size1, MLX4_EN_AUTO_CONF, "Tx ring size for port 1"); -MLX4_EN_PARM_INT(tx_ring_size2, MLX4_EN_AUTO_CONF, "Tx ring size for port 2"); -MLX4_EN_PARM_INT(rx_ring_size1, MLX4_EN_AUTO_CONF, "Rx ring size for port 1"); -MLX4_EN_PARM_INT(rx_ring_size2, MLX4_EN_AUTO_CONF, "Rx ring size for port 2"); - - int mlx4_en_get_profile(struct mlx4_en_dev *mdev) { struct mlx4_en_profile *params = &mdev->profile; @@ -84,6 +78,8 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->prof[i].rx_ppp = pfcrx; params->prof[i].tx_pause = 1; params->prof[i].tx_ppp = pfctx; + params->prof[i].tx_ring_size = MLX4_EN_DEF_TX_RING_SIZE; + params->prof[i].rx_ring_size = MLX4_EN_DEF_RX_RING_SIZE; } if (pfcrx || pfctx) { params->prof[1].tx_ring_num = MLX4_EN_TX_RING_NUM; @@ -93,29 +89,6 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->prof[2].tx_ring_num = 1; } - if (tx_ring_size1 == MLX4_EN_AUTO_CONF) - tx_ring_size1 = MLX4_EN_DEF_TX_RING_SIZE; - params->prof[1].tx_ring_size = - (tx_ring_size1 < MLX4_EN_MIN_TX_SIZE) ? - MLX4_EN_MIN_TX_SIZE : roundup_pow_of_two(tx_ring_size1); - - if (tx_ring_size2 == MLX4_EN_AUTO_CONF) - tx_ring_size2 = MLX4_EN_DEF_TX_RING_SIZE; - params->prof[2].tx_ring_size = - (tx_ring_size2 < MLX4_EN_MIN_TX_SIZE) ? - MLX4_EN_MIN_TX_SIZE : roundup_pow_of_two(tx_ring_size2); - - if (rx_ring_size1 == MLX4_EN_AUTO_CONF) - rx_ring_size1 = MLX4_EN_DEF_RX_RING_SIZE; - params->prof[1].rx_ring_size = - (rx_ring_size1 < MLX4_EN_MIN_RX_SIZE) ? - MLX4_EN_MIN_RX_SIZE : roundup_pow_of_two(rx_ring_size1); - - if (rx_ring_size2 == MLX4_EN_AUTO_CONF) - rx_ring_size2 = MLX4_EN_DEF_RX_RING_SIZE; - params->prof[2].rx_ring_size = - (rx_ring_size2 < MLX4_EN_MIN_RX_SIZE) ? 
- MLX4_EN_MIN_RX_SIZE : roundup_pow_of_two(rx_ring_size2); return 0; } @@ -412,6 +385,54 @@ static void mlx4_en_get_pauseparam(struct net_device *dev, pause->rx_pause = priv->prof->rx_pause; } +static int mlx4_en_set_ringparam(struct net_device *dev, + struct ethtool_ringparam *param) +{ + struct mlx4_en_priv *priv = netdev_priv(dev); + struct mlx4_en_dev *mdev = priv->mdev; + u32 rx_size, tx_size; + int port_up = 0; + int err = 0; + + if (param->rx_jumbo_pending || param->rx_mini_pending) + return -EINVAL; + + rx_size = roundup_pow_of_two(param->rx_pending); + rx_size = max_t(u32, rx_size, MLX4_EN_MIN_RX_SIZE); + tx_size = roundup_pow_of_two(param->tx_pending); + tx_size = max_t(u32, tx_size, MLX4_EN_MIN_TX_SIZE); + + if (rx_size == priv->prof->rx_ring_size && + tx_size == priv->prof->tx_ring_size) + return 0; + + mutex_lock(&mdev->state_lock); + if (priv->port_up) { + port_up = 1; + mlx4_en_stop_port(dev); + } + + mlx4_en_free_resources(priv); + + priv->prof->tx_ring_size = tx_size; + priv->prof->rx_ring_size = rx_size; + + err = mlx4_en_alloc_resources(priv); + if (err) { + mlx4_err(mdev, "Failed reallocating port resources\n"); + goto out; + } + if (port_up) { + err = mlx4_en_start_port(dev); + if (err) + mlx4_err(mdev, "Failed starting port\n"); + } + +out: + mutex_unlock(&mdev->state_lock); + return err; +} + static void mlx4_en_get_ringparam(struct net_device *dev, struct ethtool_ringparam *param) { @@ -451,6 +472,7 @@ const struct ethtool_ops mlx4_en_ethtool_ops = { .get_pauseparam = mlx4_en_get_pauseparam, .set_pauseparam = mlx4_en_set_pauseparam, .get_ringparam = mlx4_en_get_ringparam, + .set_ringparam = mlx4_en_set_ringparam, .get_flags = ethtool_op_get_flags, .set_flags = ethtool_op_set_flags, }; diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h index 63c42e4..76c9ad3 100644 --- a/drivers/net/mlx4/mlx4_en.h +++ b/drivers/net/mlx4/mlx4_en.h @@ -489,6 +489,12 @@ void mlx4_en_destroy_netdev(struct net_device *dev); int 
mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, struct mlx4_en_port_profile *prof); +int mlx4_en_start_port(struct net_device *dev); +void mlx4_en_stop_port(struct net_device *dev); + +void mlx4_en_free_resources(struct mlx4_en_priv *priv); +int mlx4_en_alloc_resources(struct mlx4_en_priv *priv); + int mlx4_en_get_profile(struct mlx4_en_dev *mdev); int mlx4_en_create_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq, -- 1.5.4 From yevgenyp at mellanox.co.il Mon Dec 22 02:01:40 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 22 Dec 2008 12:01:40 +0200 Subject: [ofa-general] [PATCH 9/9] mlx4_en: Multi queue support Message-ID: <494F6584.2030304@mellanox.co.il> Added a function that performs hashing on the TX traffic. The hashing is only done for TCP or UDP packets, all other packets are sent to a default queue. We use an indirection table with an entry for each hash result. For each entry in the table, we hold statistics regarding the stream that corresponds to that entry. Packets are then directed to a TX queue according to stream's pattern. A ring is opened for each queue. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/en_netdev.c | 16 +++++++++- drivers/net/mlx4/en_params.c | 9 +---- drivers/net/mlx4/en_tx.c | 64 ++++++++++++++++++++++++++++++++--------- drivers/net/mlx4/mlx4_en.h | 17 ++++++++++- 4 files changed, 81 insertions(+), 25 deletions(-) diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c index 07a939a..a08f28a 100644 --- a/drivers/net/mlx4/en_netdev.c +++ b/drivers/net/mlx4/en_netdev.c @@ -645,6 +645,16 @@ int mlx4_en_start_port(struct net_device *dev) ++tx_index; } + for (i = 0; i < MLX4_EN_TX_HASH_SIZE; i++) { + memset(&priv->tx_hash[i], 0, sizeof(struct mlx4_en_tx_hash_entry)); + /* + * Initially, all streams are assigned to the rings + * that should handle the small packages streams, (the lower ring + * indixes) then moved according the stream charasteristics. 
+ */ + priv->tx_hash[i].ring = i & (MLX4_EN_NUM_HASH_RINGS / 2 - 1); + } + /* Configure port */ err = mlx4_SET_PORT_general(mdev->dev, priv->port, priv->rx_skb_size + ETH_FCS_LEN, @@ -953,7 +963,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, int i; int err; - dev = alloc_etherdev(sizeof(struct mlx4_en_priv)); + dev = alloc_etherdev_mq(sizeof(struct mlx4_en_priv), prof->tx_ring_num); if (dev == NULL) { mlx4_err(mdev, "Net device allocation failed\n"); return -ENOMEM; @@ -1016,7 +1026,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, priv->allocated = 1; /* Populate Tx priority mappings */ - mlx4_en_set_prio_map(priv, priv->tx_prio_map, prof->tx_ring_num); + mlx4_en_set_prio_map(priv, priv->tx_prio_map, + prof->tx_ring_num - MLX4_EN_NUM_HASH_RINGS); /* * Initialize netdev entry points @@ -1025,6 +1036,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port, dev->open = &mlx4_en_open; dev->stop = &mlx4_en_close; dev->hard_start_xmit = &mlx4_en_xmit; + dev->select_queue = &mlx4_en_select_queue; dev->get_stats = &mlx4_en_get_stats; dev->set_multicast_list = &mlx4_en_set_multicast; dev->set_mac_address = &mlx4_en_set_mac; diff --git a/drivers/net/mlx4/en_params.c b/drivers/net/mlx4/en_params.c index cfeef0f..e50e882 100644 --- a/drivers/net/mlx4/en_params.c +++ b/drivers/net/mlx4/en_params.c @@ -80,13 +80,8 @@ int mlx4_en_get_profile(struct mlx4_en_dev *mdev) params->prof[i].tx_ppp = pfctx; params->prof[i].tx_ring_size = MLX4_EN_DEF_TX_RING_SIZE; params->prof[i].rx_ring_size = MLX4_EN_DEF_RX_RING_SIZE; - } - if (pfcrx || pfctx) { - params->prof[1].tx_ring_num = MLX4_EN_TX_RING_NUM; - params->prof[2].tx_ring_num = MLX4_EN_TX_RING_NUM; - } else { - params->prof[1].tx_ring_num = 1; - params->prof[2].tx_ring_num = 1; + params->prof[i].tx_ring_num = MLX4_EN_NUM_HASH_RINGS + 1 + + (!!pfcrx) * MLX4_EN_NUM_PPP_RINGS; } return 0; diff --git a/drivers/net/mlx4/en_tx.c b/drivers/net/mlx4/en_tx.c index ff4d752..2b8cc17 100644 --- 
a/drivers/net/mlx4/en_tx.c +++ b/drivers/net/mlx4/en_tx.c @@ -297,7 +297,7 @@ void mlx4_en_set_prio_map(struct mlx4_en_priv *priv, u16 *prio_map, u32 ring_num int block = 8 / ring_num; int extra = 8 - (block * ring_num); int num = 0; - u16 ring = 1; + u16 ring = MLX4_EN_NUM_HASH_RINGS + 1; int prio; if (ring_num == 1) { @@ -392,7 +392,7 @@ static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq) * transmission on that ring would stop the queue. */ ring->blocked = 0; - netif_wake_queue(dev); + netif_tx_wake_queue(netdev_get_tx_queue(dev, cq->ring)); priv->port_stats.wake_queue++; } } @@ -612,21 +612,55 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk tx_desc->ctrl.fence_size = (real_size / 16) & 0x3f; } -static int get_vlan_info(struct mlx4_en_priv *priv, struct sk_buff *skb, - u16 *vlan_tag) +u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb) { - int tx_ind; + struct mlx4_en_priv *priv = netdev_priv(dev); + u16 vlan_tag = 0; + u16 tx_ind = 0; + struct tcphdr *th = tcp_hdr(skb); + struct iphdr *iph = ip_hdr(skb); + struct mlx4_en_tx_hash_entry *entry; + u32 hash_index; /* Obtain VLAN information if present */ if (priv->vlgrp && vlan_tx_tag_present(skb)) { - *vlan_tag = vlan_tx_tag_get(skb); + vlan_tag = vlan_tx_tag_get(skb); /* Set the Tx ring to use according to vlan priority */ - tx_ind = priv->tx_prio_map[*vlan_tag >> 13]; - } else { - *vlan_tag = 0; - tx_ind = 0; + tx_ind = priv->tx_prio_map[vlan_tag >> 13]; + if (tx_ind) + return tx_ind; + } + + /* Hashing is only done for TCP/IP or UDP/IP packets */ + if (be16_to_cpu(skb->protocol) != ETH_P_IP) + return MLX4_EN_NUM_HASH_RINGS; + + hash_index = be32_to_cpu(iph->daddr) & MLX4_EN_TX_HASH_MASK; + switch (iph->protocol) { + case 17: + break; + case 6: + hash_index = (hash_index ^ be16_to_cpu(th->dest ^ th->source)) & + MLX4_EN_TX_HASH_MASK; + break; + default: + return MLX4_EN_NUM_HASH_RINGS; + } + + entry = &priv->tx_hash[hash_index]; + 
if (skb->len > MLX4_EN_SMALL_PKT_SIZE) + entry->big_pkts++; + else + entry->small_pkts++; + + if (unlikely(!(++entry->cnt))) { + tx_ind = hash_index & (MLX4_EN_NUM_HASH_RINGS / 2 - 1); + if (2 * entry->big_pkts > entry->small_pkts) + tx_ind += MLX4_EN_NUM_HASH_RINGS / 2; + entry->small_pkts = entry->big_pkts = 0; + entry->ring = tx_ind; } - return tx_ind; + return entry->ring; } int mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) @@ -646,7 +680,7 @@ int mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) dma_addr_t dma; u32 index; __be32 op_own; - u16 vlan_tag; + u16 vlan_tag = 0; int i; int lso_header_size; void *fragptr; @@ -669,15 +703,17 @@ int mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } - tx_ind = get_vlan_info(priv, skb, &vlan_tag); + tx_ind = skb->queue_mapping; ring = &priv->tx_ring[tx_ind]; + if (priv->vlgrp && vlan_tx_tag_present(skb)) + vlan_tag = vlan_tx_tag_get(skb); /* Check available TXBBs And 2K spare for prefetch */ if (unlikely(((int)(ring->prod - ring->cons)) > ring->size - HEADROOM - MAX_DESC_TXBBS)) { /* every full Tx ring stops queue. 
* TODO: implement multi-queue support (per-queue stop) */ - netif_stop_queue(dev); + netif_tx_stop_queue(netdev_get_tx_queue(dev, tx_ind)); ring->blocked = 1; priv->port_stats.queue_stopped++; diff --git a/drivers/net/mlx4/mlx4_en.h b/drivers/net/mlx4/mlx4_en.h index 76c9ad3..f0c5936 100644 --- a/drivers/net/mlx4/mlx4_en.h +++ b/drivers/net/mlx4/mlx4_en.h @@ -119,8 +119,12 @@ enum { #define MLX4_EN_MIN_RX_SIZE (MLX4_EN_ALLOC_SIZE / SMP_CACHE_BYTES) #define MLX4_EN_MIN_TX_SIZE (4096 / TXBB_SIZE) -#define MLX4_EN_TX_RING_NUM 9 -#define MLX4_EN_DEF_TX_RING_SIZE 1024 +#define MLX4_EN_SMALL_PKT_SIZE 128 +#define MLX4_EN_TX_HASH_SIZE 256 +#define MLX4_EN_TX_HASH_MASK (MLX4_EN_TX_HASH_SIZE - 1) +#define MLX4_EN_NUM_HASH_RINGS 8 +#define MLX4_EN_NUM_PPP_RINGS 8 +#define MLX4_EN_DEF_TX_RING_SIZE 512 #define MLX4_EN_DEF_RX_RING_SIZE 1024 /* Target number of bytes to coalesce with interrupt moderation */ @@ -416,6 +420,13 @@ struct mlx4_en_frag_info { }; +struct mlx4_en_tx_hash_entry { + u8 cnt; + unsigned int small_pkts; + unsigned int big_pkts; + u16 ring; +}; + struct mlx4_en_priv { struct mlx4_en_dev *mdev; struct mlx4_en_port_profile *prof; @@ -471,6 +482,7 @@ struct mlx4_en_priv { struct mlx4_en_rx_ring rx_ring[MAX_RX_RINGS]; struct mlx4_en_cq tx_cq[MAX_TX_RINGS]; struct mlx4_en_cq rx_cq[MAX_RX_RINGS]; + struct mlx4_en_tx_hash_entry tx_hash[MLX4_EN_TX_HASH_SIZE]; struct work_struct mcast_task; struct work_struct mac_task; struct delayed_work refill_task; @@ -508,6 +520,7 @@ int mlx4_en_arm_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq); void mlx4_en_poll_tx_cq(unsigned long data); void mlx4_en_tx_irq(struct mlx4_cq *mcq); int mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev); +u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb); int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv, struct mlx4_en_tx_ring *ring, u32 size, u16 stride); -- 1.5.4 From vlad at lists.openfabrics.org Mon Dec 22 03:33:59 2008 From: vlad at lists.openfabrics.org 
(Vladimir Sokolovsky Mellanox) Date: Mon, 22 Dec 2008 03:33:59 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081222-0200 daily build status Message-ID: <20081222113359.93162E603C4@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with 
linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From ronli.voltaire at gmail.com Mon Dec 22 03:47:28 2008
From: ronli.voltaire at gmail.com (Ron Livne)
Date: Mon, 22 Dec 2008 13:47:28 +0200
Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support
In-Reply-To: <494F4C2B.50109@mellanox.co.il>
References: <4907348E.7060508@mellanox.co.il> <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> <3b5e77ad0812210821k66346d5frb036bc6bc9e8894c@mail.gmail.com> <494F4C2B.50109@mellanox.co.il>
Message-ID: <3b5e77ad0812220347x64bd8ebfod4f31c9f7b80abb9@mail.gmail.com>

Is it possible that only one vector will be supported by the system, yielding the function pci_enable_msix to return 1?
If so, the stack will not work properly when using only 1 vector.

Ron

On Mon, Dec 22, 2008 at 10:13 AM, Yevgeny Petrilin wrote:
> Roland,
>
>> We encountered a problem that when a machine didn't support the
>> required number of vectors (nvec),
>> instead of trying to get 2 vectors like in the previous version, it
>> didn't use MSI-X at all - causing a major performance degradation.
>> Maybe in a case of failure we should try lowering the number of
>> vectors to 2 (like in the previous version) or the return value of
>> pci_enable_msix and goto no_msi only in case of a second failure.
>>
>> Ron
> I agree with Ron on this issue, trying to get the number of vectors that was returned by
> pci_enable_msix seems to be the better solution in case of failure.
>
> As for the rest of the patch, I didn't find any major problems, except for a small typo:
>> + mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors;;
> The driver worked fine with the new addition.
> > Thanks, > Yevgeny > > > From rpearson at systemfabricworks.com Mon Dec 22 07:07:34 2008 From: rpearson at systemfabricworks.com (Robert Pearson) Date: Mon, 22 Dec 2008 09:07:34 -0600 Subject: [ofa-general] RE: [PATCH] opensm/osm_mesh: simplify mesh node links and ports allocation In-Reply-To: <20081222032144.GN28259@sashak.voltaire.com> References: <20081222032144.GN28259@sashak.voltaire.com> Message-ID: <007a01c96447$05c32320$11496960$@com> Thanks Sasha for your support and help. - Bob -----Original Message----- From: Sasha Khapyorsky [mailto:sashak at voltaire.com] Sent: Sunday, December 21, 2008 9:22 PM To: OpenIB Cc: Robert Pearson Subject: [PATCH] opensm/osm_mesh: simplify mesh node links and ports allocation Simplify mesh node links and ports allocation - use zero sized arrays and alloc node and link structures as single memory chunk. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_mesh.h | 8 ++++---- opensm/opensm/osm_mesh.c | 24 ++++++------------------ 2 files changed, 10 insertions(+), 22 deletions(-) diff --git a/opensm/include/opensm/osm_mesh.h b/opensm/include/opensm/osm_mesh.h index 9e23498..173fa86 100644 --- a/opensm/include/opensm/osm_mesh.h +++ b/opensm/include/opensm/osm_mesh.h @@ -48,17 +48,15 @@ struct _switch; typedef struct _link { int switch_id; int link_id; - int *ports; - int num_ports; int next_port; + int num_ports; + int ports[0]; } link_t; /* * per switch node mesh info */ typedef struct _mesh_node { - unsigned int num_links; /* number of 'links' to adjacent switches */ - link_t **links; /* per link information */ int *axes; /* used to hold and reorder assigned axes */ int *coord; /* mesh coordinates of switch */ int **matrix; /* distances between adjacant switches */ @@ -67,6 +65,8 @@ typedef struct _mesh_node { int dimension; /* apparent dimension of mesh around node */ int temp; /* temporary holder for distance info */ int type; /* index of node type in mesh_info array */ + unsigned int num_links; /* number of 
'links' to adjacent switches */ + link_t *links[0]; /* per link information */ } mesh_node_t; void osm_mesh_node_delete(struct _lash *p_lash, struct _switch *sw); diff --git a/opensm/opensm/osm_mesh.c b/opensm/opensm/osm_mesh.c index 9e3e9de..263d29e 100644 --- a/opensm/opensm/osm_mesh.c +++ b/opensm/opensm/osm_mesh.c @@ -1253,16 +1253,9 @@ void osm_mesh_node_delete(lash_t *p_lash, switch_t *sw) OSM_LOG_ENTER(p_log); if (node) { - if (node->links) { - for (i = 0; i < num_ports; i++) { - if (node->links[i]) { - if (node->links[i]->ports) - free(node->links[i]->ports); - free(node->links[i]); - } - } - free(node->links); - } + for (i = 0; i < num_ports; i++) + if (node->links[i]) + free(node->links[i]); if (node->poly) free(node->poly); @@ -1301,17 +1294,12 @@ int osm_mesh_node_create(lash_t *p_lash, switch_t *sw) OSM_LOG_ENTER(p_log); - if (!(node = sw->node = calloc(1, sizeof(mesh_node_t)))) + if (!(node = sw->node = calloc(1, sizeof(mesh_node_t) + num_ports * sizeof(link_t *)))) goto err; - if (!(node->links = calloc(num_ports, sizeof(link_t *)))) - goto err; - - for (i = 0; i < num_ports; i++) { - if (!(node->links[i] = calloc(1, sizeof(link_t))) || - !(node->links[i]->ports = calloc(num_ports, sizeof(int)))) + for (i = 0; i < num_ports; i++) + if (!(node->links[i] = calloc(1, sizeof(link_t) + num_ports * sizeof(int)))) goto err; - } if (!(node->axes = calloc(num_ports, sizeof(int)))) goto err; -- 1.6.0.4.766.g6fc4a From rdreier at cisco.com Mon Dec 22 07:16:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 Dec 2008 07:16:15 -0800 Subject: [ofa-general][PATCH 1/3]mlx4: Multiple completion vectors support In-Reply-To: <3b5e77ad0812220347x64bd8ebfod4f31c9f7b80abb9@mail.gmail.com> (Ron Livne's message of "Mon, 22 Dec 2008 13:47:28 +0200") References: <4907348E.7060508@mellanox.co.il> <3b5e77ad0812210414n73765c2iccf2fc98c492c07c@mail.gmail.com> <3b5e77ad0812210821k66346d5frb036bc6bc9e8894c@mail.gmail.com> <494F4C2B.50109@mellanox.co.il> 
<3b5e77ad0812220347x64bd8ebfod4f31c9f7b80abb9@mail.gmail.com> Message-ID: > Is it possible that only one vector will be supported by the system, > yielding the function pci_enable_msix to return 1? Possible (though very unlikely I guess). Anyway I handled this in the latest patch: mlx4_core: Add support for multiple completion event vectors When using MSI-X mode, create a completion event queue for each CPU. Report the number of completion EQs in a new struct mlx4_caps member, num_comp_vectors, and extend the mlx4_cq_alloc() interface with a vector parameter so that consumers can specify which completion EQ should be used to report events for the CQ being created. Signed-off-by: Yevgeny Petrilin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/main.c | 2 +- drivers/net/mlx4/cq.c | 11 +++- drivers/net/mlx4/en_cq.c | 9 ++- drivers/net/mlx4/en_main.c | 4 +- drivers/net/mlx4/eq.c | 117 ++++++++++++++++++++++++++++--------- drivers/net/mlx4/main.c | 53 ++++++++++++----- drivers/net/mlx4/mlx4.h | 14 ++--- drivers/net/mlx4/profile.c | 4 +- include/linux/mlx4/device.h | 4 +- 10 files changed, 157 insertions(+), 63 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 1830849..2198753 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -222,7 +222,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 2e80f8f..dcefe1f 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -578,7 +578,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB) ibdev->num_ports++; ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; - 
ibdev->ib_dev.num_comp_vectors = 1; + ibdev->ib_dev.num_comp_vectors = dev->caps.num_comp_vectors; ibdev->ib_dev.dma_device = &dev->pdev->dev; ibdev->ib_dev.uverbs_abi_ver = MLX4_IB_UVERBS_ABI_VERSION; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index b7ad282..ac57b6a 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -198,6 +198,11 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, u64 mtt_addr; int err; + if (vector >= dev->caps.num_comp_vectors) + return -EINVAL; + + cq->vector = vector; + cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap); if (cq->cqn == -1) return -ENOMEM; @@ -227,7 +232,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + cq_context->comp_eqn = priv->eq_table.eq[vector].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -276,7 +281,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->vector].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/en_cq.c b/drivers/net/mlx4/en_cq.c index 1368a80..674f836 100644 --- a/drivers/net/mlx4/en_cq.c +++ b/drivers/net/mlx4/en_cq.c @@ -51,10 +51,13 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, int err; cq->size = 
entries; - if (mode == RX) + if (mode == RX) { cq->buf_size = cq->size * sizeof(struct mlx4_cqe); - else + cq->vector = ring % mdev->dev->caps.num_comp_vectors; + } else { cq->buf_size = sizeof(struct mlx4_cqe); + cq->vector = 0; + } cq->ring = ring; cq->is_tx = mode; @@ -86,7 +89,7 @@ int mlx4_en_activate_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq *cq) memset(cq->buf, 0, cq->buf_size); err = mlx4_cq_alloc(mdev->dev, cq->size, &cq->wqres.mtt, &mdev->priv_uar, - cq->wqres.db.dma, &cq->mcq, cq->is_tx); + cq->wqres.db.dma, &cq->mcq, cq->vector, cq->is_tx); if (err) return err; diff --git a/drivers/net/mlx4/en_main.c b/drivers/net/mlx4/en_main.c index 4b9794e..c1c0585 100644 --- a/drivers/net/mlx4/en_main.c +++ b/drivers/net/mlx4/en_main.c @@ -170,9 +170,9 @@ static void *mlx4_en_add(struct mlx4_dev *dev) mlx4_info(mdev, "Using %d tx rings for port:%d\n", mdev->profile.prof[i].tx_ring_num, i); if (!mdev->profile.prof[i].rx_ring_num) { - mdev->profile.prof[i].rx_ring_num = 1; + mdev->profile.prof[i].rx_ring_num = dev->caps.num_comp_vectors; mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", - 1, i); + mdev->profile.prof[i].rx_ring_num, i); } else mlx4_info(mdev, "Using %d rx rings for port:%d\n", mdev->profile.prof[i].rx_ring_num, i); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index de16933..5d867eb 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -266,7 +266,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -304,6 +304,17 @@ static int mlx4_HW2SW_EQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +static int mlx4_num_eq_uar(struct mlx4_dev *dev) +{ + /* + * Each UAR holds 4 EQ doorbells. 
To figure out how many UARs + * we need to map, take the difference of highest index and + * the lowest index we'll use and add 1. + */ + return (dev->caps.num_comp_vectors + 1 + dev->caps.reserved_eqs) / 4 - + dev->caps.reserved_eqs / 4 + 1; +} + static void __iomem *mlx4_get_eq_uar(struct mlx4_dev *dev, struct mlx4_eq *eq) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -483,9 +494,11 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); + + kfree(eq_table->irq_names); } static int mlx4_map_clr_int(struct mlx4_dev *dev) @@ -551,57 +564,93 @@ void mlx4_unmap_eq_icm(struct mlx4_dev *dev) __free_page(priv->eq_table.icm_page); } +int mlx4_alloc_eq_table(struct mlx4_dev *dev) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + + priv->eq_table.eq = kcalloc(dev->caps.num_eqs - dev->caps.reserved_eqs, + sizeof *priv->eq_table.eq, GFP_KERNEL); + if (!priv->eq_table.eq) + return -ENOMEM; + + return 0; +} + +void mlx4_free_eq_table(struct mlx4_dev *dev) +{ + kfree(mlx4_priv(dev)->eq_table.eq); +} + int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; int i; + priv->eq_table.uar_map = kcalloc(sizeof *priv->eq_table.uar_map, + mlx4_num_eq_uar(dev), GFP_KERNEL); + if (!priv->eq_table.uar_map) { + err = -ENOMEM; + goto err_out_free; + } + err = mlx4_bitmap_init(&priv->eq_table.bitmap, dev->caps.num_eqs, dev->caps.num_eqs - 1, dev->caps.reserved_eqs, 0); if (err) - return err; + goto err_out_free; - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) priv->eq_table.uar_map[i] = NULL; err = mlx4_map_clr_int(dev); if (err) - goto err_out_free; + goto err_out_bitmap; priv->eq_table.clr_mask = swab32(1 << (priv->eq_table.inta_pin & 31)); priv->eq_table.clr_int = priv->clr_base 
+ (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.irq_names = kmalloc(16 * dev->caps.num_comp_vectors, GFP_KERNEL); + if (!priv->eq_table.irq_names) { + err = -ENOMEM; + goto err_out_bitmap; + } + + for (i = 0; i < dev->caps.num_comp_vectors; ++i) { + err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? i : 0, + &priv->eq_table.eq[i]); + if (err) + goto err_out_unmap; + } err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, - &priv->eq_table.eq[MLX4_EQ_ASYNC]); + (dev->flags & MLX4_FLAG_MSI_X) ? dev->caps.num_comp_vectors : 0, + &priv->eq_table.eq[dev->caps.num_comp_vectors]); if (err) goto err_out_comp; if (dev->flags & MLX4_FLAG_MSI_X) { - static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", - [MLX4_EQ_ASYNC] = DRV_NAME " (async)" - }; + static const char async_eq_name[] = "mlx4-async"; + const char *eq_name; + + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) { + if (i < dev->caps.num_comp_vectors) { + snprintf(priv->eq_table.irq_names + i * 16, 16, + "mlx4-comp-%d", i); + eq_name = priv->eq_table.irq_names + i * 16; + } else + eq_name = async_eq_name; - for (i = 0; i < MLX4_NUM_EQ; ++i) { err = request_irq(priv->eq_table.eq[i].irq, - mlx4_msi_x_interrupt, - 0, eq_name[i], priv->eq_table.eq + i); + mlx4_msi_x_interrupt, 0, eq_name, + priv->eq_table.eq + i); if (err) goto err_out_async; priv->eq_table.eq[i].have_irq = 1; } - } else { err = request_irq(dev->pdev->irq, mlx4_interrupt, IRQF_SHARED, DRV_NAME, dev); @@ -612,28 +661,36 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) } err = mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 0, - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); if (err) 
mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; err_out_async: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); + mlx4_free_eq(dev, &priv->eq_table.eq[dev->caps.num_comp_vectors]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + i = dev->caps.num_comp_vectors - 1; err_out_unmap: + while (i >= 0) { + mlx4_free_eq(dev, &priv->eq_table.eq[i]); + --i; + } mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); -err_out_free: +err_out_bitmap: mlx4_bitmap_cleanup(&priv->eq_table.bitmap); + +err_out_free: + kfree(priv->eq_table.uar_map); + return err; } @@ -643,18 +700,20 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) int i; mlx4_MAP_EQ(dev, MLX4_ASYNC_EVENT_MASK, 1, - priv->eq_table.eq[MLX4_EQ_ASYNC].eqn); + priv->eq_table.eq[dev->caps.num_comp_vectors].eqn); mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < dev->caps.num_comp_vectors + 1; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); - for (i = 0; i < ARRAY_SIZE(priv->eq_table.uar_map); ++i) + for (i = 0; i < mlx4_num_eq_uar(dev); ++i) if (priv->eq_table.uar_map[i]) iounmap(priv->eq_table.uar_map[i]); mlx4_bitmap_cleanup(&priv->eq_table.bitmap); + + kfree(priv->eq_table.uar_map); } diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 90a0281..710c79e 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -421,9 +421,7 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_EQ * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, - roundup_pow_of_two(MLX4_NUM_EQ + - dev->caps.reserved_eqs), - MLX4_NUM_EQ + dev->caps.reserved_eqs, 0, 0); + dev->caps.num_eqs, dev->caps.num_eqs, 0, 0); if (err) goto err_cq; @@ -810,12 +808,12 @@ static 
int mlx4_setup_hca(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { mlx4_warn(dev, "NOP command failed to generate MSI-X " "interrupt IRQ %d).\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); mlx4_warn(dev, "Trying again without MSI-X.\n"); } else { mlx4_err(dev, "NOP command failed to generate interrupt " "(IRQ %d), aborting.\n", - priv->eq_table.eq[MLX4_EQ_ASYNC].irq); + priv->eq_table.eq[dev->caps.num_comp_vectors].irq); mlx4_err(dev, "BIOS or ACPI interrupt routing problem?\n"); } @@ -908,31 +906,50 @@ err_uar_table_free: static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); - struct msix_entry entries[MLX4_NUM_EQ]; + struct msix_entry *entries; + int nreq; int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + nreq = min(dev->caps.num_eqs - dev->caps.reserved_eqs, + num_possible_cpus() + 1); + entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL); + if (!entries) + goto no_msi; + + for (i = 0; i < nreq; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + retry: + err = pci_enable_msix(dev->pdev, entries, nreq); if (err) { - if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + /* Try again if at least 2 vectors are available */ + if (err > 1) { + mlx4_info(dev, "Requested %d vectors, " + "but only %d MSI-X vectors available, " + "trying again\n", nreq, err); + nreq = err; + goto retry; + } + goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + dev->caps.num_comp_vectors = nreq - 1; + for (i = 0; i < nreq; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; + + kfree(entries); return; } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + dev->caps.num_comp_vectors = 1; + + for (i = 0; i < 2; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } @@ -1074,6 +1091,10 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) 
if (err) goto err_cmd; + err = mlx4_alloc_eq_table(dev); + if (err) + goto err_close; + mlx4_enable_msi_x(dev); err = mlx4_setup_hca(dev); @@ -1084,7 +1105,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) } if (err) - goto err_close; + goto err_free_eq; for (port = 1; port <= dev->caps.num_ports; port++) { err = mlx4_init_port_info(dev, port); @@ -1114,6 +1135,9 @@ err_port: mlx4_cleanup_pd_table(dev); mlx4_cleanup_uar_table(dev); +err_free_eq: + mlx4_free_eq_table(dev); + err_close: if (dev->flags & MLX4_FLAG_MSI_X) pci_disable_msix(pdev); @@ -1177,6 +1201,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) iounmap(priv->kar); mlx4_uar_free(dev, &priv->driver_uar); mlx4_cleanup_uar_table(dev); + mlx4_free_eq_table(dev); mlx4_close_hca(dev); mlx4_cmd_cleanup(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 34c909d..e0213ba 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -63,12 +63,6 @@ enum { }; enum { - MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ -}; - -enum { MLX4_NUM_PDS = 1 << 15 }; @@ -205,10 +199,11 @@ struct mlx4_cq_table { struct mlx4_eq_table { struct mlx4_bitmap bitmap; + char *irq_names; void __iomem *clr_int; - void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; + void __iomem **uar_map; u32 clr_mask; - struct mlx4_eq eq[MLX4_NUM_EQ]; + struct mlx4_eq *eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; @@ -328,6 +323,9 @@ void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); +int mlx4_alloc_eq_table(struct mlx4_dev *dev); +void mlx4_free_eq_table(struct mlx4_dev *dev); + int mlx4_init_pd_table(struct mlx4_dev *dev); int mlx4_init_uar_table(struct mlx4_dev *dev); int mlx4_init_mr_table(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/profile.c b/drivers/net/mlx4/profile.c index 9ca42b2..919fb9e 100644 --- a/drivers/net/mlx4/profile.c +++ b/drivers/net/mlx4/profile.c @@ -107,7 +107,9 @@ u64 mlx4_make_profile(struct 
mlx4_dev *dev, profile[MLX4_RES_AUXC].num = request->num_qp; profile[MLX4_RES_SRQ].num = request->num_srq; profile[MLX4_RES_CQ].num = request->num_cq; - profile[MLX4_RES_EQ].num = MLX4_NUM_EQ + dev_cap->reserved_eqs; + profile[MLX4_RES_EQ].num = min(dev_cap->max_eqs, + dev_cap->reserved_eqs + + num_possible_cpus() + 1); profile[MLX4_RES_DMPT].num = request->num_mpt; profile[MLX4_RES_CMPT].num = MLX4_NUM_CMPTS; profile[MLX4_RES_MTT].num = request->num_mtt; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 371086f..8f659cc 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -206,6 +206,7 @@ struct mlx4_caps { int reserved_cqs; int num_eqs; int reserved_eqs; + int num_comp_vectors; int num_mpts; int num_mtt_segs; int fmr_reserved_mtts; @@ -328,6 +329,7 @@ struct mlx4_cq { int arm_sn; int cqn; + unsigned vector; atomic_t refcount; struct completion free; @@ -437,7 +439,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed); + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); From weiny2 at llnl.gov Mon Dec 22 09:14:27 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 22 Dec 2008 09:14:27 -0800 Subject: [ofa-general] PATCH[2/6] Windows port of libibmad - dump.c In-Reply-To: References: <20081218172201.3a7bddae.weiny2@llnl.gov> <000501c96179$4b7ba020$435a180a@amr.corp.intel.com> <20081218182504.47edc451.weiny2@llnl.gov> Message-ID: <20081222091427.5ae88bbb.weiny2@llnl.gov> On Thu, 18 Dec 2008 18:55:30 -0800 "Davis, Arlin R" wrote: > > >> >Where is xdump used? > >> > >> dump.c, rpc.c, and serv.c call it. 
> >> > >> It looks like it was a call implemented by libibcommon, and > >I think Arlin's > >> patches remove libibcommon from being used by libibmad or the diags. > >> > > > >Did he reimplement IBWARN, IBPANIC, and all the sys_read_* > >functions? I think > >those come from ibcommon and are used by the diags. > > Yes, I did reimplement IBWARN and IBPANIC but the sys_read_* was > not used by anything in libibmad or infiniband_diags so I am > holding off on that. If we just need gid or guid I would rather > just pick that up with verbs for portability. > Ah, I missed it, sorry. Ira From rdreier at cisco.com Mon Dec 22 14:25:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 22 Dec 2008 14:25:00 -0800 Subject: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29 Message-ID: I guess now is probably a good time to send out my plans for the 2.6.29 cycle. I'm not operating at full speed (since I'm kind of on paternity leave and kind of on holiday break), but I'll try to get through as much of my backlog as possible. I should mention that I've changed the way I manage my git tree slightly. I've started using "topic" branches to organize the patches I have queued; so for example I'll put IPoIB changes in my "ipoib" branch and ehca changes in my "ehca" branch. Then I merge everything that I consider ready for the next merge window into my "for-next" branch (which is included as part of Stephen Rothwell's linux-next tree). Any topics that I'm not sure are ready get to stay on their own branch until they are ready (or get dropped); for example, I've had an "xrc" branch for XRC work for a while now. Anyway, here are all the pending things that I'm aware of. As usual, if something isn't already in my tree and isn't listed below, I probably missed it or dropped it by mistake. Please remind me again in that case. 
As usual, when submitting a patch: - Give a good changelog that explains what issue your patch addresses, how you address the issue, how serious the issue is, and any other information that would be useful to someone evaluating your patch or reading it years from now. - Please make sure that you include a "Signed-off-by:" line, and put any extra junk that should not go into the final kernel log *after* the "---" line so that git tools strip it off automatically. Make the subject line be appropriate for inclusion in the kernel log as well once the leading "[PATCH ...]" stuff is stripped off. I waste a lot of time fixing patches by hand that could otherwise be spent doing something productive like watching youtube. - Run your patch through checkpatch.pl so I don't have to nag you to fix trivial issues (or spend time fixing them myself). - Read your patch over so I don't see a memory leak or deadlock as soon as I look at it. - Build your patch with sparse checking ("C=2 CF=-D__CHECK_ENDIAN__") and make sure it doesn't introduce new warnings. (A big bonus in goodwill for sending patches that fix old warnings) - Test your patch on a kernel with things like slab debugging and lockdep turned on. And while you're waiting for me to get to your patch, I sure wouldn't mind if you read and commented on someone else's patch. None of this means you shouldn't remind me about pending patches, since I often lose track of things and drop them accidentally. Core: - Multiple CQ event vector support. I don't see much progress on making this actually usable or useful, but maybe the best way to make progress on getting the correct interface for applications to actually figure out which vector to use is to merge the multiple vector support for some low-level drivers. I have the mlx4 changes merged, and I plan to try and do some analogous changes for mthca. A resurrection of the ehca multiple vector support would be welcome as well. 
- I plan to merge Aleksey's patches for RDMA CM IPv6 support when I have a version that applies to the current kernel. Might be a good idea for cxgb3 and nes guys to look at this and make sure that we at least don't have any kernel crashes caused if someone tries to make an IPv6 connection. ULPs: - I sincerely appreciate everyone who resent IPoIB patches recently. I will review them and apply this week most likely. HW specific: - A bunch of ipath fixes. - A bunch of nes cleanups and fixes. - A few ehca fixes and cleanups. Here are a few topics that I believe will not be ready in time for the 2.6.29 window and will need to wait for 2.6.30 at least: - Jack's XRC patch set. I think we're getting closer to converging here, but I still want to get a clean solution for the "free XRC domain with other objects still associated because multiple contexts share XRCDs" problem. Here all the patches I already have in my for-next branch: Chien Tung (2): RDMA/nes: Add loopback check to make_cm_node() RDMA/nes: Cleanup warnings Dave Olson (4): IB/ipath: Don't count IB symbol and link errors unless link is UP IB/ipath: Only do 1X workaround on rev1 chips IB/ipath: Fix spi_pioindex value IB/ipath: Add locking for interrupt use of ipath_pd contexts vs free David Disseldorp (1): IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs Faisal Latif (6): RDMA/nes: Cleanup cqp_request list usage RDMA/nes: Lock down connected_nodes list while processing it RDMA/nes: Avoid race between MPA request and reset event to rdma_cm RDMA/nes: Forward packets for a new connection with stale APBVT entry RDMA/nes: Fix TCP compliance test failures RDMA/nes: Check cqp_avail_reqs is empty after locking the list Joachim Fenkes (1): IB/ehca: Fix locking for shca_list_lock Julia Lawall (1): IB/ehca: Remove redundant test of vpage Michael Ellerman (1): IB/ipath: Fix pointer-to-pointer thinko in ipath_fs.c Ralph Campbell (3): IB/ipath: Improve UD loopback performance by allocating temp array only once 
IB/ipath: Fix PSN of send WQEs after an RDMA read resend IB/ipath: Check return value of dma_map_single() Roland Dreier (1): mlx4_core: Delete incorrect comment Stefan Roscher (1): IB/ehca: Replace modulus operations in flush error completion path Yevgeny Petrilin (1): mlx4_core: Add support for multiple completion event vectors From sean.hefty at intel.com Mon Dec 22 22:20:02 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 22 Dec 2008 22:20:02 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081221152708.GO25208@sashak.voltaire.com> References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> Message-ID: <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> >> +void >> +print_port(ibnd_node_t *node, ibnd_port_t *port) >> +{ >> + char remote_guid_str[256]; >> + char remote_str[256]; >> + char link_str[256]; >> + char width_msg[256]; >> + char speed_msg[256]; >> + char ext_port_str[256]; >> + >> + if (!port) >> + return; >> + >> + remote_guid_str[0] = '\0'; >> + remote_str[0] = '\0'; >> + link_str[0] = '\0'; >> + width_msg[0] = '\0'; >> + speed_msg[0] = '\0'; >> + >> + if (port->remoteport) { >> + char remote_name_buf[256]; This function ends up using a lot of stack space. - Sean From sean.hefty at intel.com Mon Dec 22 22:44:35 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 22 Dec 2008 22:44:35 -0800 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: <20081221205124.GE28259@sashak.voltaire.com> References: <20081221205124.GE28259@sashak.voltaire.com> Message-ID: <000101c964c9$ebcaa460$c4e0180a@amr.corp.intel.com> >What is the purpose of those patches? RFC? Inclusion to a main stream? The goal is inclusion to the mainline, but giving a more concrete view of the changes involved.
>> diff -aur libibmad-1.2.2/include/infiniband/mad.h >libibmad/include/infiniband/mad.h >> --- libibmad-1.2.2/include/infiniband/mad.h 2008-08-31 07:15:05.000000000 - >0700 >> +++ libibmad/include/infiniband/mad.h 2008-12-17 17:02:54.873046600 -0800 >> @@ -33,8 +33,10 @@ >> #ifndef _MAD_H_ >> #define _MAD_H_ >> >> -#include >> -#include >> +/* use complib for portability */ >> +#include >> +#include >> +#include > >Currently libibmad doesn't depend from complib. It would be really nice >to not new dependencies (normally we build libibmad before complib, >which is part of OpenSM). The management stack does depend on complib, so I'm not sure that we buy much by avoiding this dependency. To avoid it, we need an alternate solution for problems that complib is already solving. (Arlin wrote these patches, so my list may be off or incomplete.) cl_types.h - provides the definitions for uint32_t and similar definitions cl_byteswap.h - provides ntohll typedef cl_debug - added cl_msg_out and CL_ASSERT defs >I wrote in another email. It would be nice to minimize a number of >needed changes and number of #ifdef introduced. Use of complib is probably the best alternative to minimizing #ifdefs. >If we will add "extern" keyword for exported symbols and somewhere in >windows-specific header file it will be redefined as > >#define extern __declspec(dllexport) I don't think we want to get into redefining keywords. >> -void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); >> +void _set_field(void *buf, int base_offs, ib_field_t *f, uint32_t val); > >What is the change here? Maybe whitespaces which were added/stripped by >mailer, but I don't see this. It looks like the whitespace was changed from a tab to a space, but you removed these anyway. >> +MAD_EXPORT uint32_t mad_get_field(void *buf, int base_offs, int field); >> +MAD_EXPORT void mad_set_field(void *buf, int base_offs, int field, uint32_t >val); > >Windows don't like "inline"?
The compiler doesn't allow it in the header file. - Sean From tziporet at mellanox.co.il Mon Dec 22 23:42:15 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 23 Dec 2008 09:42:15 +0200 Subject: [ofa-general] ConnectX FW 2.6.0 is available Message-ID: <5D49E7A8952DC44FB38C38FA0D758EAD0146618E@mtlexch01.mtl.com> People requested that I notify the list about the new ConnectX FW release (2.6.0). URL of firmware downloads homepage: http://www.mellanox.com/content/pages.php?pg=firmware_download Main changes and new features in this release include: - Support at GA-level for VPI - Support for QDR interoperability with InfiniScale IV switch platforms For the full list of features and other details, please see the Release Notes on the firmware page. Note: The following OFED 1.4 features can be activated only with FW 2.6.0: - Use the same device as one port IB and one port Eth. - Fast register MR send queue work requests. - Local DMA L_Key. - Raw Ethertype QP support (one QP per port) -- receive only. Tziporet From ronli.voltaire at gmail.com Mon Dec 22 23:54:44 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Tue, 23 Dec 2008 09:54:44 +0200 Subject: ***SPAM*** Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29 In-Reply-To: References: Message-ID: <3b5e77ad0812222354y5d4effe8xc6750d6620e20c75@mail.gmail.com> Roland, Any chance create_qp_flags will get in? Thanks, Ron On Tue, Dec 23, 2008 at 12:25 AM, Roland Dreier wrote: > I guess now is probably a good time to send out my plans for the > 2.6.29 cycle. I'm not operating at full speed (since I'm kind of on > paternity leave and kind of on holiday break), but I'll try to get > through as much of my backlog as possible. > > I should mention that I've changed the way I manage my git tree > slightly. I've started using "topic" branches to organize the patches > I have queued; so for example I'll put IPoIB changes in my "ipoib" > branch and ehca changes in my "ehca" branch.
Then I merge everything > that I consider ready for the next merge window into my "for-next" > branch (which is included as part of Stephen Rothwell's linux-next > tree). Any topics that I'm not sure are ready get to stay on their > own branch until they are ready (or get dropped); for example, I've > had an "xrc" branch for XRC work for a while now. > > Anyway, here are all the pending things that I'm aware of. As usual, > if something isn't already in my tree and isn't listed below, I > probably missed it or dropped it by mistake. Please remind me again > in that case. > > As usual, when submitting a patch: > > - Give a good changelog that explains what issue your patch > addresses, how you address the issue, how serious the issue is, and > any other information that would be useful to someone evaluating > your patch or reading it years from now. > > - Please make sure that you include a "Signed-off-by:" line, and put > any extra junk that should not go into the final kernel log *after* > the "---" line so that git tools strip it off automatically. Make > the subject line be appropriate for inclusion in the kernel log as > well once the leading "[PATCH ...]" stuff is stripped off. I waste a > lot of time fixing patches by hand that could otherwise be spent > doing something productive like watching youtube. > > - Run your patch through checkpatch.pl so I don't have to nag you to > fix trivial issues (or spend time fixing them myself). > > - Read your patch over so I don't see a memory leak or deadlock as > soon as I look at it. > > - Build your patch with sparse checking ("C=2 CF=-D__CHECK_ENDIAN__") > and make sure it doesn't introduce new warnings. (A big bonus in > goodwill for sending patches that fix old warnings) > > - Test your patch on a kernel with things like slab debugging and > lockdep turned on. > > And while you're waiting for me to get to your patch, I sure wouldn't > mind if you read and commented on someone else's patch. 
None of this > means you shouldn't remind me about pending patches, since I often > lose track of things and drop them accidentally. > > Core: > > - Multiple CQ event vector support. I don't see much progress on > making this actually usable or useful, but maybe the best way to > make progress on getting the correct interface for applications to > actually figure out which vector to use is to merge the multiple > vector support for some low-level drivers. I have the mlx4 changes > merged, and I plan to try and do some analogous changes for mthca. > A resurrection of the ehca multiple vector support would be welcome > as well. > > - I plan to merge Aleksey's patches for RDMA CM IPv6 support when I > have a version that applies to the current kernel. Might be a good > idea for cxgb3 and nes guys to look at this and make sure that we > at least don't have any kernel crashes caused if someone tries to > make an IPv6 connection. > > ULPs: > > - I sincerely appreciate everyone who resent IPoIB patches recently. > I will review them and apply this week most likely. > > HW specific: > > - A bunch of ipath fixes. > > - A bunch of nes cleanups and fixes. > > - A few ehca fixes and cleanups. > > Here are a few topics that I believe will not be ready in time for the > 2.6.29 window and will need to wait for 2.6.30 at least: > > - Jack's XRC patch set. I think we're getting closer to converging > here, but I still want to get a clean solution for the "free XRC > domain with other objects still associated because multiple > contexts share XRCDs" problem. 
> > Here all the patches I already have in my for-next branch: > > Chien Tung (2): > RDMA/nes: Add loopback check to make_cm_node() > RDMA/nes: Cleanup warnings > > Dave Olson (4): > IB/ipath: Don't count IB symbol and link errors unless link is UP > IB/ipath: Only do 1X workaround on rev1 chips > IB/ipath: Fix spi_pioindex value > IB/ipath: Add locking for interrupt use of ipath_pd contexts vs free > > David Disseldorp (1): > IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs > > Faisal Latif (6): > RDMA/nes: Cleanup cqp_request list usage > RDMA/nes: Lock down connected_nodes list while processing it > RDMA/nes: Avoid race between MPA request and reset event to rdma_cm > RDMA/nes: Forward packets for a new connection with stale APBVT entry > RDMA/nes: Fix TCP compliance test failures > RDMA/nes: Check cqp_avail_reqs is empty after locking the list > > Joachim Fenkes (1): > IB/ehca: Fix locking for shca_list_lock > > Julia Lawall (1): > IB/ehca: Remove redundant test of vpage > > Michael Ellerman (1): > IB/ipath: Fix pointer-to-pointer thinko in ipath_fs.c > > Ralph Campbell (3): > IB/ipath: Improve UD loopback performance by allocating temp array only once > IB/ipath: Fix PSN of send WQEs after an RDMA read resend > IB/ipath: Check return value of dma_map_single() > > Roland Dreier (1): > mlx4_core: Delete incorrect comment > > Stefan Roscher (1): > IB/ehca: Replace modulus operations in flush error completion path > > Yevgeny Petrilin (1): > mlx4_core: Add support for multiple completion event vectors > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From vlad at lists.openfabrics.org Tue Dec 23 03:20:47 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 23 Dec 2008 03:20:47 -0800 (PST) Subject: [ofa-general] 
ofa_1_4_kernel 20081223-0200 daily build status Message-ID: <20081223112047.CDE4CE60397@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with 
linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From dorfman.eli at gmail.com Tue Dec 23 04:58:39 2008 From: dorfman.eli at gmail.com (Eli Dorfman (Voltaire)) Date: Tue, 23 Dec 2008 14:58:39 +0200 Subject: [ofa-general] ***SPAM*** [PATCH] infiniband-diags Add support for PortXmitWait counter Message-ID: <4950E07F.6090104@gmail.com> Add support for PortXmitWait counter Show PortCounters::PortXmitWait when this capability is supported by the firmware. If not supported show this counter as 0. Signed-off-by: Eli Dorfman --- infiniband-diags/src/perfquery.c | 10 +++++++++- libibmad/include/infiniband/mad.h | 1 + libibmad/src/fields.c | 1 + 3 files changed, 11 insertions(+), 1 deletions(-) diff --git a/infiniband-diags/src/perfquery.c b/infiniband-diags/src/perfquery.c index 7a53e92..4166fff 100644 --- a/infiniband-diags/src/perfquery.c +++ b/infiniband-diags/src/perfquery.c @@ -68,6 +68,7 @@ struct perf_count { uint32_t rcvdata; uint32_t xmtpkts; uint32_t rcvpkts; + uint32_t xmtwait; }; struct perf_count_ext { @@ -210,6 +211,8 @@ static void aggregate_perfcounters(void) aggregate_32bit(&perf_count.xmtpkts, val); mad_decode_field(pc, IB_PC_RCV_PKTS_F, &val); aggregate_32bit(&perf_count.rcvpkts, val); + mad_decode_field(pc, IB_PC_XMT_WAIT_F, &val); + aggregate_32bit(&perf_count.xmtwait, val); } static void output_aggregate_perfcounters(ib_portid_t *portid) @@ -299,9 +302,14 @@ static void dump_perfcounters(int extended, int timeout, uint16_t cap_mask, ib_p if (extended != 1) { if (!port_performance_query(pc, portid, port, timeout)) IBERROR("perfquery"); + if (!(cap_mask & 0x1000)) { + /* if PortCounters:PortXmitWait not suppported clear this counter */ + perf_count.xmtwait = 0; + mad_encode_field(pc, IB_PC_XMT_WAIT_F, &perf_count.xmtwait); + } if (aggregate) aggregate_perfcounters(); - else + else mad_dump_perfcounters(buf, sizeof buf, pc, sizeof pc); } else { if (!(cap_mask & 0x200)) /* 1.2 errata: bit 9 is extended counter support */ diff --git 
a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h index c2ad148..6c313f9 100644 --- a/libibmad/include/infiniband/mad.h +++ b/libibmad/include/infiniband/mad.h @@ -413,6 +413,7 @@ enum MAD_FIELDS { IB_PC_RCV_BYTES_F, IB_PC_XMT_PKTS_F, IB_PC_RCV_PKTS_F, + IB_PC_XMT_WAIT_F, IB_PC_LAST_F, /* diff --git a/libibmad/src/fields.c b/libibmad/src/fields.c index 6942e85..116e432 100644 --- a/libibmad/src/fields.c +++ b/libibmad/src/fields.c @@ -247,6 +247,7 @@ ib_field_t ib_mad_f [] = { [IB_PC_RCV_BYTES_F] {224, 32, "RcvData", mad_dump_uint}, [IB_PC_XMT_PKTS_F] {256, 32, "XmtPkts", mad_dump_uint}, [IB_PC_RCV_PKTS_F] {288, 32, "RcvPkts", mad_dump_uint}, + [IB_PC_XMT_WAIT_F] {320, 32, "XmtWait", mad_dump_uint}, /* * SMInfo -- 1.5.5 From rdreier at cisco.com Tue Dec 23 07:32:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 Dec 2008 07:32:36 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> (Sean Hefty's message of "Mon, 22 Dec 2008 22:20:02 -0800") References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> Message-ID: > This function ends up using a lot of stack space. This is userspace... are a few KB on the stack really an issue? From rdreier at cisco.com Tue Dec 23 07:33:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 Dec 2008 07:33:06 -0800 Subject: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29 In-Reply-To: <3b5e77ad0812222354y5d4effe8xc6750d6620e20c75@mail.gmail.com> (Ron Livne's message of "Tue, 23 Dec 2008 09:54:44 +0200") References: <3b5e77ad0812222354y5d4effe8xc6750d6620e20c75@mail.gmail.com> Message-ID: > Any chance create_qp_flags will get in? Nope, since that's stuck behind XRC. 
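The capability-mask gating in the PortXmitWait patch above can be sketched in isolation as follows. This is only an illustration, not code from infiniband-diags: the two bit values (0x1000 for PortXmitWait, 0x200 for extended counters) are taken from the patch text, while the macro and function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as they appear in the patch; the macro names are hypothetical. */
#define CAP_EXT_COUNTERS 0x0200u /* bit 9: extended counters supported */
#define CAP_XMIT_WAIT    0x1000u /* bit 12: PortXmitWait supported */

/*
 * Return the PortXmitWait value to report: the raw counter when the
 * capability bit is set, otherwise 0 (the patch clears the field in
 * that case so that stale data is never shown).
 */
uint32_t report_xmtwait(uint16_t cap_mask, uint32_t raw_value)
{
	return (cap_mask & CAP_XMIT_WAIT) ? raw_value : 0;
}

/* Return 1 when the extended counters are advertised, 0 otherwise. */
int has_ext_counters(uint16_t cap_mask)
{
	return (cap_mask & CAP_EXT_COUNTERS) != 0;
}

/* Small self-check that doubles as a usage example. */
void xmtwait_self_check(void)
{
	assert(report_xmtwait(CAP_XMIT_WAIT, 7) == 7);
	assert(report_xmtwait(0, 7) == 0);
	assert(has_ext_counters(CAP_EXT_COUNTERS | CAP_XMIT_WAIT));
}
```

Keeping the mask test in a small helper, rather than open-coding 0x1000 at each call site, would make it easier to apply the same rule in both the aggregate and the per-port dump paths; whether that fits perfquery.c is a judgment for the patch author.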
From ronli.voltaire at gmail.com Tue Dec 23 08:45:48 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Tue, 23 Dec 2008 18:45:48 +0200 Subject: ***SPAM*** Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29 In-Reply-To: References: <3b5e77ad0812222354y5d4effe8xc6750d6620e20c75@mail.gmail.com> Message-ID: <3b5e77ad0812230845w3178e803u416ecf1f68a64725@mail.gmail.com> I know it was stuck there, that's why I've sent a new version that doesn't rely on it. I've implemented the more_ops (in libibverbs) in my patches - that's the only thing I needed from XRC. Ron On Tue, Dec 23, 2008 at 5:33 PM, Roland Dreier wrote: > > Any chance create_qp_flags will get in? > > Nope, since that's stuck behind XRC. > From chu11 at llnl.gov Tue Dec 23 10:29:02 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 23 Dec 2008 10:29:02 -0800 Subject: ***SPAM*** Re: [ofa-general] [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081211162031.0c591f54.weiny2@llnl.gov> References: <20081211162031.0c591f54.weiny2@llnl.gov> Message-ID: <1230056943.23747.21.camel@auk31.llnl.gov> Hey Ira, Tiny comment below: On Thu, 2008-12-11 at 16:20 -0800, Ira Weiny wrote: > >From d615162e547f3a2b2d1acd8c79c24ee691c96c95 Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Wed, 26 Nov 2008 12:54:47 -0800 > Subject: [PATCH] Create a new library libibnetdisc > > This encompasses the functionality of ibnetdiscover in a C library. It returns > a single "ibnd_fabric_t" object which represents the data found during the > scan. The NodeInfo, PortInfo, and SwitchInfo are preserved from the queries > made on the fabric to be used by the calling function as they see fit. > > This greatly benefits some diags like iblinkinfo.pl. This diag in particular > was re-written using this library in C and has shown an 85% speed up on a ~1000 > node cluster.
> > Previous iblinkinfo.pl > real 3m35.876s > user 0m13.210s > sys 1m1.046s > > New iblinkinfotest > real 0m32.869s > user 0m0.067s > sys 0m0.140s > > Signed-off-by: Ira Weiny > --- > infiniband-diags/Makefile.am | 1 + > infiniband-diags/configure.in | 31 +- > infiniband-diags/libibnetdisc/Makefile.am | 66 ++ > .../libibnetdisc/include/infiniband/ibnetdisc.h | 276 ++++++ > infiniband-diags/libibnetdisc/libibnetdisc.ver | 9 + > infiniband-diags/libibnetdisc/man/ibnd_debug.3 | 2 + > .../libibnetdisc/man/ibnd_destroy_fabric.3 | 2 + > .../libibnetdisc/man/ibnd_discover_fabric.3 | 49 ++ > .../libibnetdisc/man/ibnd_find_node_dr.3 | 2 + > .../libibnetdisc/man/ibnd_find_node_guid.3 | 25 + > .../libibnetdisc/man/ibnd_iter_nodes.3 | 24 + > .../libibnetdisc/man/ibnd_iter_nodes_type.3 | 2 + > .../libibnetdisc/man/ibnd_linkspeed_str.3 | 2 + > .../libibnetdisc/man/ibnd_linkstate_str.3 | 2 + > .../libibnetdisc/man/ibnd_linkwidth_str.3 | 26 + > .../libibnetdisc/man/ibnd_node_type_str.3 | 2 + > .../libibnetdisc/man/ibnd_node_type_str_short.3 | 2 + > .../libibnetdisc/man/ibnd_physstate_str.3 | 2 + > .../libibnetdisc/man/ibnd_show_progress.3 | 2 + > .../libibnetdisc/man/ibnd_update_node.3 | 21 + > infiniband-diags/libibnetdisc/src/chassis.c | 818 ++++++++++++++++++ > infiniband-diags/libibnetdisc/src/chassis.h | 85 ++ > infiniband-diags/libibnetdisc/src/ibnetdisc.c | 872 ++++++++++++++++++++ > infiniband-diags/libibnetdisc/src/internal.h | 82 ++ > infiniband-diags/libibnetdisc/src/libibnetdisc.map | 27 + > .../libibnetdisc/test/iblinkinfotest.c | 395 +++++++++ > infiniband-diags/libibnetdisc/test/ibnetdisctest.c | 675 +++++++++++++++ > infiniband-diags/libibnetdisc/test/testleaks.c | 268 ++++++ > 28 files changed, 3769 insertions(+), 1 deletions(-) > create mode 100644 infiniband-diags/libibnetdisc/Makefile.am > create mode 100644 infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > create mode 100644 infiniband-diags/libibnetdisc/libibnetdisc.ver > create mode 100644 
infiniband-diags/libibnetdisc/man/ibnd_debug.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.c > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.h > create mode 100644 infiniband-diags/libibnetdisc/src/ibnetdisc.c > create mode 100644 infiniband-diags/libibnetdisc/src/internal.h > create mode 100644 infiniband-diags/libibnetdisc/src/libibnetdisc.map > create mode 100644 infiniband-diags/libibnetdisc/test/iblinkinfotest.c > create mode 100644 infiniband-diags/libibnetdisc/test/ibnetdisctest.c > create mode 100644 infiniband-diags/libibnetdisc/test/testleaks.c > > diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am > index c22ba5e..8e8c3c1 100644 > --- a/infiniband-diags/Makefile.am > +++ b/infiniband-diags/Makefile.am > @@ -1,3 +1,4 @@ > +SUBDIRS = libibnetdisc > > INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > 
> diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in > index 5509fec..7c346e2 100644 > --- a/infiniband-diags/configure.in > +++ b/infiniband-diags/configure.in > @@ -145,6 +145,34 @@ IBSCRIPTPATH_TMP2="`echo $IBSCRIPTPATH_TMP1 | sed 's/^NONE/$ac_default_prefix/'` > IBSCRIPTPATH="`eval echo $IBSCRIPTPATH_TMP2`" > AC_SUBST(IBSCRIPTPATH) > > +dnl Begin libibnetdisc stuff > +AC_CHECK_HEADERS([stdint.h stdlib.h string.h syslog.h unistd.h]) > +AC_CHECK_FUNCS([strrchr strtoul strtoull]) > + > +ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc/libibnetdisc.ver | sed 's/LIBVERSION=//'` > +if test -z $ibnetdisc_api_version; then > + echo "FAILED to find $srcdir/libibnetdisc/libibnetdisc.ver" > + exit 1 > +fi > +AC_SUBST(ibnetdisc_api_version) > +AC_DEFINE_UNQUOTED(API_VERSION, > + ["$ibnetdisc_api_version"], > + [The API version of this library]) > + > +AC_MSG_CHECKING(for --enable-test-utils) > +AC_ARG_ENABLE(test-utils, > +[ --enable-test-utils build additional test utilities (default=no)], > +[case "${enableval}" in > + yes) tutils=yes ;; > + no) tutils=no ;; > + *) AC_MSG_ERROR(bad value ${enableval} for --enable-test-utils) ;; > +esac],[tutils=no]) > +AM_CONDITIONAL(ENABLE_TEST_UTILS, test x$tutils = xyes) > +AC_MSG_RESULT(${tutils=no}) > + > +dnl End libibnetdisc stuff > + > + > AC_CONFIG_FILES([\ > Makefile \ > infiniband-diags.spec \ > @@ -165,6 +193,7 @@ AC_CONFIG_FILES([\ > scripts/ibhosts \ > scripts/ibnodes \ > scripts/ibswitches \ > - scripts/ibrouters > + scripts/ibrouters \ > + libibnetdisc/Makefile > ]) > AC_OUTPUT > diff --git a/infiniband-diags/libibnetdisc/Makefile.am b/infiniband-diags/libibnetdisc/Makefile.am > new file mode 100644 > index 0000000..7b478b1 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/Makefile.am > @@ -0,0 +1,66 @@ > + > +#SUBDIRS = . 
> + > +INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > + > +lib_LTLIBRARIES = libibnetdisc.la > +sbin_PROGRAMS = > + > +if ENABLE_TEST_UTILS > +sbin_PROGRAMS += test/ibnetdisctest \ > + test/iblinkinfotest \ > + test/testleaks > +endif > + > +DBGFLAGS = -g > + > +if HAVE_LD_VERSION_SCRIPT > +libibnetdisc_version_script = -Wl,--version-script=$(srcdir)/src/libibnetdisc.map > +else > +libibnetdisc_version_script = > +endif > + > +libibnetdisc_la_SOURCES = src/ibnetdisc.c src/chassis.c src/chassis.h > +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) > +libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ > + -export-dynamic $(libibnetdisc_version_script) \ > + -losmcomp -libmad > +libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map > + > +libibnetdiscincludedir = $(includedir)/infiniband > + > +test_ibnetdisctest_SOURCES = test/ibnetdisctest.c > +test_ibnetdisctest_CFLAGS = -Wall $(DBGFLAGS) > +test_ibnetdisctest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc > + > +test_iblinkinfotest_SOURCES = test/iblinkinfotest.c > +test_iblinkinfotest_CFLAGS = -Wall $(DBGFLAGS) > +test_iblinkinfotest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc > + > +test_testleaks_SOURCES = test/testleaks.c > +test_testleaks_CFLAGS = -Wall $(DBGFLAGS) > +test_testleaks_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > + -libcommon -libnetdisc > + > +libibnetdiscinclude_HEADERS = $(srcdir)/include/infiniband/ibnetdisc.h > + > +man_MANS = man/ibnd_debug.3 \ > + man/ibnd_destroy_fabric.3 \ > + man/ibnd_discover_fabric.3 \ > + man/ibnd_find_node_dr.3 \ > + man/ibnd_find_node_guid.3 \ > + man/ibnd_iter_nodes.3 \ > + man/ibnd_iter_nodes_type.3 \ > + man/ibnd_linkspeed_str.3 \ > + man/ibnd_linkstate_str.3 \ > + man/ibnd_linkwidth_str.3 \ > + man/ibnd_node_type_str.3 \ > + man/ibnd_physstate_str.3 \ > + man/ibnd_update_node.3 \ > + man/ibnd_show_progress.3 > + > +EXTRA_DIST = libibnetdisc.spec.in libibnetdisc.spec \ > + 
$(srcdir)/src/libibnetdisc.map libibnetdisc.ver autogen.sh > + > diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > new file mode 100644 > index 0000000..cdee2bd > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > @@ -0,0 +1,276 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +#ifndef _IBNETDISC_H_ > +#define _IBNETDISC_H_ > + > +#include > +#include > + > +#define MAXHOPS 63 > + > +/* HASH table defines */ > +#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) > +#define HTSZ 137 > + > +#define IBND_DEBUG(str, args...) \ > + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) > +#define IBND_ERROR(str, args...) \ > + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args)

I believe the "args..." and "##args" forms are GCC-only extensions. I'm not sure how much this portability issue matters for OFED. Personally, I always use this trick:

    #define MYDEBUG(x) printf x
    MYDEBUG(("lala: %s", somestrvar));

Al

> +/** ========================================================================= > + * ENUM definitions > + */ > +typedef enum { > + IBND_CA_NODE = 1, > + IBND_SWITCH_NODE = 2, > + IBND_ROUTER_NODE = 3 > +} ibnd_node_type_t; > + > +typedef enum { > + IBND_LINK_DOWN = 1, > + IBND_LINK_INIT = 2, > + IBND_LINK_ARMED = 3, > + IBND_LINK_ACTIVE = 4 > +} ibnd_link_state_t; > + > +/** ========================================================================= > + * Node > + */ > +typedef struct switch_info { > + int smaenhsp0; > +} ibnd_switch_info_t; > + > +typedef struct node_info { > + int base_ver; > + int class_ver; > + int type; > + int numports; > + uint64_t sysimgguid; > + uint64_t nodeguid; > + uint64_t nodeportguid; > + uint16_t partition_cap; > + uint32_t devid; > + uint32_t revision; > + int localport; > + uint32_t vendid; > +} ibnd_node_info_t; > + > +struct ib_fabric; /* forward declare */ > +struct chassis; /* forward declare */ > +struct port; /* forward declare */ > + > +typedef struct node { > + struct node *next; /* all node list in fabric */ > + struct ib_fabric *fabric; /* the fabric node belongs to */ > + > + ib_portid_t path_portid; /* path from "from_node" */ > + int dist; /* num of hops from "from_node" */ > + int smalid; > + int smalmc; > + ibnd_switch_info_t 
sw_info; > + ibnd_node_info_t info; > + char nodedesc[64]; > + struct port **ports; /* in order array of port pointers */ > + /* the size of this array is info.numports + 1 */ > + /* items MAY BE NULL! (ie 0 == switches only) */ > + > + /* chassis info */ > + struct node *next_chassis_node; /* next node in ibnd_chassis_t->nodes */ > + struct chassis *chassis; /* if != NULL the chassis this node belongs to */ > + unsigned char ch_type; > + unsigned char ch_anafanum; > + unsigned char ch_slotnum; > + unsigned char ch_slot; > +} ibnd_node_t; > + > +/** ========================================================================= > + * Port > + */ > +typedef struct port_info { > + int lid; > + int smlid; > + int link_speed_supported; > + int link_speed_enabled; > + int link_speed_active; > + int link_state; > + int phys_state; > + int link_down_def_state; > + int mkey_prot_bits; > + int lmc; > + int neighbor_mtu; > + int smsl; > + int init_type; > + int vl_capability; > + int vl_high_limit; > + int vl_arb_high_cap; > + int vl_arb_low_cap; > + int init_reply; > + int mtu_cap; > + int vl_stall_count; > + int hoq_lifetime; > + int oper_vls; > + int partition_enforce_in; > + int partition_enforce_out; > + int filter_raw_in; > + int filter_raw_out; > + int mkey_violations; > + int pkey_violations; > + int qkey_violations; > + int guid_capabilities; > + int client_rereg; > + int subnet_timeout; > + int response_time_val; > + int local_phys_error; > + int overrun_error; > + int max_credit_hint; > + uint32_t link_round_trip; > + int local_port; > + int link_width_supported; > + int link_width_enabled; > + int link_width_active; > + int diag_code; > + int mkey_lease; > + uint32_t capability_mask; > + uint64_t mkey; > + uint64_t gid_prefix; > +} ibnd_port_info_t; > +typedef struct port { > + uint64_t guid; > + int portnum; > + int ext_portnum; /* optional if != 0 external port num */ > + ibnd_node_t *node; /* node this port belongs to */ > + ibnd_port_info_t info; > + struct port 
*remoteport; /* null if SMA, or does not exist */ > +} ibnd_port_t; > + > + > +/** ========================================================================= > + * Chassis data > + */ > +typedef struct chassis { > + struct chassis *next; > + uint64_t chassisguid; > + int chassisnum; > + > + /* generic grouping by SystemImageGUID */ > + int nodecount; > + ibnd_node_t *nodes; > + > + /* specific to voltaire type nodes */ > +#define SPINES_MAX_NUM 12 > +#define LINES_MAX_NUM 36 > + ibnd_node_t *spinenode[SPINES_MAX_NUM + 1]; > + ibnd_node_t *linenode[LINES_MAX_NUM + 1]; > +} ibnd_chassis_t; > + > +/** ========================================================================= > + * Fabric > + * Main fabric object which is returned and represents the data discovered > + */ > +typedef struct ib_fabric { > + /* the node the discovery was initiated from > + * "from" parameter in ibnd_discover_fabric > + * or by default the node you are running on > + */ > + ibnd_node_t *from_node; > + /* NULL terminated list of all nodes in the fabric */ > + ibnd_node_t *nodes; > + /* NULL terminated list of all chassis found in the fabric */ > + ibnd_chassis_t *chassis; > + int maxhops_discovered; > +} ibnd_fabric_t; > + > + > +/** ========================================================================= > + * Initialization (fabric operations) > + */ > +void ibnd_debug(int i); > +void ibnd_show_progress(int i); > + > +ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, > + int timeout_ms, ib_portid_t *from, int hops); > + /** > + * dev_name: (required) local device name to use to access the fabric > + * dev_port: (required) local device port to use to access the fabric > + * timeout_ms: (required) gives the timeout for a _SINGLE_ query on > + * the fabric. So if there are multiple nodes not > + * responding this may result in a lengthy delay. > + * from: (optional) specify the node to start scanning from. > + * If NULL start from the node we are running on. 
> + * hops: (optional) Specify how much of the fabric to traverse. > + * negative value == scan entire fabric > + */ > +void ibnd_destroy_fabric(ibnd_fabric_t *fabric); > + > +/** ========================================================================= > + * Node operations > + */ > +ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid); > +ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str); > +ibnd_node_t *ibnd_update_node(ibnd_node_t *node); > + > +typedef void (*ibnd_iter_node_func_t)(ibnd_node_t *node, void *user_data); > +void ibnd_iter_nodes(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + void *user_data); > +void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + ibnd_node_type_t node_type, > + void *user_data); > + > +/** ========================================================================= > + * Str convert functions > + */ > +char *ibnd_linkwidth_str(int link_width); > +char *ibnd_linkstate_str(int link_state); > +char *ibnd_physstate_str(int phys_state); > +const char *ibnd_node_type_str(ibnd_node_t *node); > +const char *ibnd_node_type_str_short(ibnd_node_t *node); > +char *ibnd_linkspeed_str(int link_speed, int data_rate); > + /* if data_rate == 0 use "SDR", "DDR", etc. */ > + /* if data_rate == 1 use "2.5 Gbps", "5.0 Gbps", etc. 
*/ > + > +/** ========================================================================= > + * Chassis queries > + */ > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum); > +char *ibnd_get_chassis_type(ibnd_node_t *node); > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size); > + > +int ibnd_is_xsigo_guid(uint64_t guid); > +int ibnd_is_xsigo_tca(uint64_t guid); > +int ibnd_is_xsigo_hca(uint64_t guid); > + > +#endif /* _IBNETDISC_H_ */ > diff --git a/infiniband-diags/libibnetdisc/libibnetdisc.ver b/infiniband-diags/libibnetdisc/libibnetdisc.ver > new file mode 100644 > index 0000000..a0a5f3c > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/libibnetdisc.ver > @@ -0,0 +1,9 @@ > +# In this file we track the current API version > +# of the IB net discover interface (and libraries) > +# The version is built of the following > +# tree numbers: > +# API_REV:RUNNING_REV:AGE > +# API_REV - advance on any added API > +# RUNNING_REV - advance any change to the vendor files > +# AGE - number of backward versions the API still supports > +LIBVERSION=1:0:0 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_debug.3 b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > new file mode 100644 > index 0000000..a4076fc > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_DEBUG 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > new file mode 100644 > index 0000000..8fe20ae > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_DESTROY_FABRIC 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 
b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > new file mode 100644 > index 0000000..44d8c65 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > @@ -0,0 +1,49 @@ > +.TH IBND_DISCOVER_FABRIC 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_discover_fabric, ibnd_destroy_fabric, ibnd_debug, ibnd_show_progress \- initialize ibnetdiscover library. > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, ib_portid_t *from, int hops)" > +.BI "void ibnd_destroy_fabric(ibnd_fabric_t *fabric)" > +.BI "void ibnd_debug(int i)" > +.BI "void ibnd_show_progress(int i)" > + > + > +.SH "DESCRIPTION" > +.B ibnd_discover_fabric() > +Discover the fabric connected to the port specified by dev_name and dev_port, using the specified timeout. The "from" and "hops" parameters are optional and allow one to scan part of a fabric by specifying a node "from" and a number of hops away from that node to scan, "hops". This gives the user a "sub-fabric" which is "centered" anywhere they choose. > + > +.B ibnd_destroy_fabric() > +Free all memory and resources associated with the fabric. > + > +.B ibnd_debug() > +Set the debug level to be printed as library operations take place. > + > +.B ibnd_show_progress() > +Indicate that the library should print debug output which shows its progress > +through the fabric. > + > +.SH "RETURN VALUE" > +.B ibnd_discover_fabric() > +Return NULL on failure, otherwise a valid ibnd_fabric_t object. > + > +.B ibnd_destroy_fabric(), ibnd_debug(), ibnd_show_progress() > +NONE > + > +.SH "EXAMPLES" > + > +.B Discover the entire fabric connected to device "mthca0", port 1. > + > + ibnd_discover_fabric("mthca0", 1, 100, NULL, 0); > + > +.B Discover only a single node and those nodes connected to it. 
> + > + str2drpath(&(port_id.drpath), from, 0, 0); > + > + ibnd_discover_fabric("mthca0", 1, 100, &port_id, 1); > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > new file mode 100644 > index 0000000..612e501 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_FIND_NODE_DR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_find_node_guid.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > new file mode 100644 > index 0000000..676b528 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > @@ -0,0 +1,25 @@ > +.TH IBND_FIND_NODE_GUID 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_find_node_guid, ibnd_find_node_dr \- given a fabric object find the node object within it which matches the guid or directed route specified. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid)" > +.BI "ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str)" > + > +.SH "DESCRIPTION" > +.B ibnd_find_node_guid() > +Given a fabric object and a guid, return the ibnd_node_t object with that node guid. > +.B ibnd_find_node_dr() > +Given a fabric object and a directed route, return the ibnd_node_t object with > +that directed route. > + > +.SH "RETURN VALUE" > +.B ibnd_find_node_guid(), ibnd_find_node_dr() > +return NULL on failure, otherwise a valid ibnd_node_t object. 
> + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > new file mode 100644 > index 0000000..7199dfb > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > @@ -0,0 +1,24 @@ > +.TH IBND_ITER_NODES 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_iter_nodes, ibnd_iter_nodes_type \- given a fabric object and a function, iterate over the nodes in the fabric. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "void ibnd_iter_nodes(ibnd_fabric_t *fabric, ibnd_iter_node_func_t func, void *user_data)" > +.BI "void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, ibnd_iter_node_func_t func, ibnd_node_type_t type, void *user_data)" > + > +.SH "DESCRIPTION" > +.B ibnd_iter_nodes() > +Iterate through all the nodes in the fabric and call "func" on each of them. > +.B ibnd_iter_nodes_type() > +The same as ibnd_iter_nodes(), but limits the iteration to the nodes with the specified type. 
> + > +.SH "RETURN VALUE" > +.B ibnd_iter_nodes(), ibnd_iter_nodes_type() > +NONE > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > new file mode 100644 > index 0000000..878547c > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_FIND_NODES_TYPE 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_iter_nodes.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > new file mode 100644 > index 0000000..128cd3e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_LINKSPEED_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > new file mode 100644 > index 0000000..2fa9189 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_LINKSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > new file mode 100644 > index 0000000..2cd4f0a > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > @@ -0,0 +1,26 @@ > +.TH IBND_LINKWIDTH_STR 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_linkwidth_str, ibnd_linkspeed_str, ibnd_linkstate_str, ibnd_physstate_str, ibnd_node_type_str \- pretty string functions. 
> + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI > +.BI "char *ibnd_linkwidth_str(int link_width)" > +.BI "char *ibnd_linkspeed_str(int link_speed, int data_rate)" > +.BI "char *ibnd_linkstate_str(int link_state)" > +.BI "char *ibnd_physstate_str(int phys_state)" > +.BI "const char *ibnd_node_type_str(ibnd_node_t *node)" > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > + > +.SH "DESCRIPTION" > +Return user readable strings for the values given. > + > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > +Returns a shorter abbreviated version of the string. > + > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > new file mode 100644 > index 0000000..77dbf07 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_NODE_TYPE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > new file mode 100644 > index 0000000..62feb6e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_NODE_TYPE_STR_SHORT 3 "Aug 05, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > new file mode 100644 > index 0000000..aeeaeb7 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_PHYSSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_linkwidth_str.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > new file mode 100644 > index 
0000000..280af31 > 
--- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > @@ -0,0 +1,2 @@ > +.\".TH IBND_SHOW_PROGRESS 3 "Nov 26, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.so man3/ibnd_discover_fabric.3 > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > new file mode 100644 > index 0000000..d3aa206 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > @@ -0,0 +1,21 @@ > +.TH IBND_UPDATE_NODE 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > +.SH "NAME" > +ibnd_update_node \- Update the node specified with new data from the fabric. > + > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "ibnd_node_t *ibnd_update_node(ibnd_node_t *node)" > + > +.SH "DESCRIPTION" > +.B ibnd_update_node() > +Update the node info, port info, and node description of the node specified. > + > +.SH "RETURN VALUE" > +.B ibnd_update_node() > +Return NULL on failure, otherwise a valid ibnd_node_t object which is part of the fabric object. > + > +.SH "AUTHORS" > +.TP > +Ira Weiny > diff --git a/infiniband-diags/libibnetdisc/src/chassis.c b/infiniband-diags/libibnetdisc/src/chassis.c > new file mode 100644 > index 0000000..41f325e > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/chassis.c > @@ -0,0 +1,818 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +/*========================================================*/ > +/* FABRIC SCANNER SPECIFIC DATA */ > +/*========================================================*/ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#include > +#include > +#include > + > +#include > +#include > + > +#include "internal.h" > +#include "chassis.h" > + > +static char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; > +static char *ChassisSlotTypeStr[4] = { "", "Line", "Spine", "SRBD" }; > + > +char *ibnd_get_chassis_type(ibnd_node_t *node) > +{ > + /* Currently, only if Voltaire chassis */ > + if (node->info.vendid != VTR_VENDOR_ID) > + return (NULL); > + if (!node->chassis) > + return (NULL); > + if (node->ch_type == UNRESOLVED_CT > + || node->ch_type > ISR2004_CT) > + return (NULL); > + return ChassisTypeStr[node->ch_type]; > +} > + > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size) > +{ > + /* Currently, only if Voltaire chassis */ > + if (node->info.vendid != VTR_VENDOR_ID) > + return (NULL); > + if (!node->chassis) > + return (NULL); > + if (node->ch_slot == UNRESOLVED_CS > + || node->ch_slot > SRBD_CS) > + return (NULL); > + if (!str) > + return (NULL); > + snprintf(str, size, "%s %d Chip %d", > + ChassisSlotTypeStr[node->ch_slot], > + node->ch_slotnum, > + node->ch_anafanum); > + return (str); > +} > + > +static ibnd_chassis_t *find_chassisnum(struct ibnd_fabric *fabric, unsigned char chassisnum) > +{ > + ibnd_chassis_t *current; > + > + for (current = fabric->first_chassis; current; current = current->next) { > + if (current->chassisnum == chassisnum) > + return current; > + } > + > + return NULL; > +} > + > +static uint64_t topspin_chassisguid(uint64_t guid) > +{ > + /* Byte 3 in system image GUID is chassis type, and */ > + /* Byte 4 is location ID (slot) so just mask off byte 4 */ > + return guid & 0xffffffff00ffffffULL; > +} > + > +int ibnd_is_xsigo_guid(uint64_t guid) > +{ > + if 
((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_leafone(uint64_t guid) > +{ > + if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) > + return 1; > + else > + return 0; > +} > + > +int ibnd_is_xsigo_hca(uint64_t guid) > +{ > + /* NodeType 2 is HCA */ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) > + return 1; > + else > + return 0; > +} > + > +int ibnd_is_xsigo_tca(uint64_t guid) > +{ > + /* NodeType 3 is TCA */ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_ca(uint64_t guid) > +{ > + if (ibnd_is_xsigo_hca(guid) || ibnd_is_xsigo_tca(guid)) > + return 1; > + else > + return 0; > +} > + > +static int is_xsigo_switch(uint64_t guid) > +{ > + if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) > + return 1; > + else > + return 0; > +} > + > +static uint64_t xsigo_chassisguid(ibnd_node_t *node) > +{ > + if (!is_xsigo_ca(node->info.sysimgguid)) { > + /* Byte 3 is NodeType and byte 4 is PortType */ > + /* If NodeType is 1 (switch), PortType is masked */ > + if (is_xsigo_switch(node->info.sysimgguid)) > + return node->info.sysimgguid & 0xffffffff00ffffffULL; > + else > + return node->info.sysimgguid; > + } else { > + if (!node->ports || !node->ports[1]) > + return (0); > + > + /* Is there a peer port ? 
*/ > + if (!node->ports[1]->remoteport) > + return node->info.sysimgguid; > + > + /* If peer port is Leaf 1, use its chassis GUID */ > + if (is_xsigo_leafone(node->ports[1]->remoteport->node->info.sysimgguid)) > + return node->ports[1]->remoteport->node->info.sysimgguid & > + 0xffffffff00ffffffULL; > + else > + return node->info.sysimgguid; > + } > +} > + > +static uint64_t get_chassisguid(ibnd_node_t *node) > +{ > + if (node->info.vendid == TS_VENDOR_ID || node->info.vendid == SS_VENDOR_ID) > + return topspin_chassisguid(node->info.sysimgguid); > + else if (node->info.vendid == XS_VENDOR_ID || ibnd_is_xsigo_guid(node->info.sysimgguid)) > + return xsigo_chassisguid(node); > + else > + return node->info.sysimgguid; > +} > + > +static ibnd_chassis_t *find_chassisguid(ibnd_node_t *node) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > + ibnd_chassis_t *current; > + uint64_t chguid; > + > + chguid = get_chassisguid(node); > + for (current = f->first_chassis; current; current = current->next) { > + if (current->chassisguid == chguid) > + return current; > + } > + > + return NULL; > +} > + > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + ibnd_chassis_t *chassis; > + > + chassis = find_chassisnum(f, chassisnum); > + if (chassis) > + return chassis->chassisguid; > + else > + return 0; > +} > + > +static int is_router(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_IB_FC_ROUTER || > + n->node.info.devid == VTR_DEVID_IB_IP_ROUTER); > +} > + > +static int is_spine_9096(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB4 || > + n->node.info.devid == VTR_DEVID_SFB4_DDR); > +} > + > +static int is_spine_9288(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB12 || > + n->node.info.devid == VTR_DEVID_SFB12_DDR); > +} > + > +static int is_spine_2004(struct ibnd_node *n) > +{ > + return 
(n->node.info.devid == VTR_DEVID_SFB2004); > +} > + > +static int is_spine_2012(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SFB2012); > +} > + > +static int is_spine(struct ibnd_node *n) > +{ > + return (is_spine_9096(n) || is_spine_9288(n) || > + is_spine_2004(n) || is_spine_2012(n)); > +} > + > +static int is_line_24(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB24 || > + n->node.info.devid == VTR_DEVID_SLB24_DDR || > + n->node.info.devid == VTR_DEVID_SRB2004); > +} > + > +static int is_line_8(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB8); > +} > + > +static int is_line_2024(struct ibnd_node *n) > +{ > + return (n->node.info.devid == VTR_DEVID_SLB2024); > +} > + > +static int is_line(struct ibnd_node *n) > +{ > + return (is_line_24(n) || is_line_8(n) || is_line_2024(n)); > +} > + > +int is_chassis_switch(struct ibnd_node *n) > +{ > + return (is_spine(n) || is_line(n)); > +} > + > +/* these structs help find Line (Anafa) slot number while using spine portnum */ > +int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; > +int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; > +int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; > +int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; > + > +/* IPR FCR modules connectivity while using sFB4 port as reference */ > +int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; > + > +/* these structs help find Spine (Anafa) slot number while using spine portnum */ > +int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0 }; > +int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > + > +static void get_sfb_slot(struct ibnd_node *node, ibnd_port_t *lineport) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + > + n->ch_slot = SPINE_CS; > + if (is_spine_9096(node)) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > + } else if (is_spine_9288(node)) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > + } else if (is_spine_2012(node)) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > + } else if (is_spine_2004(node)) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + node->node.info.nodeguid); > + } > +} > + > +static void get_router_slot(struct ibnd_node *node, ibnd_port_t *spineport) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + int guessnum = 0; > + > + node->ch_found = 1; > + > + n->ch_slot = SRBD_CS; > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + /* this is a smart guess 
based on nodeguids order on sFB-12 module */ > + guessnum = spineport->node->info.nodeguid % 4; > + /* module 1 <--> remote anafa 3 */ > + /* module 2 <--> remote anafa 2 */ > + /* module 3 <--> remote anafa 1 */ > + n->ch_anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 3 : 2)); > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + /* this is a smart guess based on nodeguids order on sFB-12 module */ > + guessnum = spineport->node->info.nodeguid % 4; > + // module 1 <--> remote anafa 3 > + // module 2 <--> remote anafa 2 > + // module 3 <--> remote anafa 1 > + n->ch_anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 3 : 2)); > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + spineport->node->info.nodeguid); > + } > +} > + > +static void get_slb_slot(ibnd_node_t *n, ibnd_port_t *spineport) > +{ > + n->ch_slot = LINE_CS; > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9096_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR9288_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2012_CT; > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > + n->ch_type = ISR2004_CT; > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > + 
n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > + } else { > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > + spineport->node->info.nodeguid); > + } > +} > + > +/* forward declare this */ > +static void voltaire_portmap(ibnd_port_t *port); > +/* > + This function called for every Voltaire node in fabric > + It could be optimized so, but time overhead is very small > + and its only diag.util > +*/ > +static void fill_voltaire_chassis_record(struct ibnd_node *node) > +{ > + ibnd_node_t *n = (ibnd_node_t *)node; > + int p = 0; > + ibnd_port_t *port; > + struct ibnd_node *remnode = 0; > + > + if (node->ch_found) /* somehow this node has already been passed */ > + return; > + node->ch_found = 1; > + > + /* node is router only in case of using unique lid */ > + /* (which is lid of chassis router port) */ > + /* in such case node->ports is actually a requested port... */ > + if (is_router(node)) { > + /* find the remote node */ > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (port && is_spine(CONV_NODE_INTERNAL(port->remoteport->node))) > + get_router_slot(node, port->remoteport); > + } > + } else if (is_spine(node)) { > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > + if (remnode->node.info.type != IBND_SWITCH_NODE) { > + if (!remnode->ch_found) > + get_router_slot(remnode, port); > + continue; > + } > + if (!n->ch_type) > + /* we assume here that remoteport belongs to line */ > + get_sfb_slot(node, port->remoteport); > + > + /* we could break here, but need to find if more routers connected */ > + } > + > + } else if (is_line(node)) { > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port || port->portnum > 12 || !port->remoteport) > + continue; > + /* we assume here that remoteport belongs to spine */ > + 
get_slb_slot(n, port->remoteport); > + break; > + } > + } > + > + /* for each port of this node, map external ports */ > + for (p = 1; p <= node->node.info.numports; p++) { > + port = node->node.ports[p]; > + if (!port) > + continue; > + voltaire_portmap(port); > + } > + > + return; > +} > + > +static int get_line_index(ibnd_node_t *node) > +{ > + int retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > + > + if (retval > LINES_MAX_NUM || retval < 1) > + IBPANIC("Internal error"); > + return retval; > +} > + > +static int get_spine_index(ibnd_node_t *node) > +{ > + int retval; > + > + if (is_spine_9288(CONV_NODE_INTERNAL(node)) || is_spine_2012(CONV_NODE_INTERNAL(node))) > + retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > + else > + retval = node->ch_slotnum; > + > + if (retval > SPINES_MAX_NUM || retval < 1) > + IBPANIC("Internal error"); > + return retval; > +} > + > +static void insert_line_router(ibnd_node_t *node, ibnd_chassis_t *chassis) > +{ > + int i = get_line_index(node); > + > + if (chassis->linenode[i]) > + return; /* already filled slot */ > + > + chassis->linenode[i] = node; > + node->chassis = chassis; > +} > + > +static void insert_spine(ibnd_node_t *node, ibnd_chassis_t *chassis) > +{ > + int i = get_spine_index(node); > + > + if (chassis->spinenode[i]) > + return; /* already filled slot */ > + > + chassis->spinenode[i] = node; > + node->chassis = chassis; > +} > + > +static void pass_on_lines_catch_spines(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node, *remnode; > + ibnd_port_t *port; > + int i, p; > + > + for (i = 1; i <= LINES_MAX_NUM; i++) { > + node = chassis->linenode[i]; > + > + if (!(node && is_line(CONV_NODE_INTERNAL(node)))) > + continue; /* empty slot or router */ > + > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (!port || port->portnum > 12 || !port->remoteport) > + continue; > + > + remnode = port->remoteport->node; > + > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > + 
continue; /* some error - spine not initialized ? FIXME */ > + insert_spine(remnode, chassis); > + } > + } > +} > + > +static void pass_on_spines_catch_lines(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node, *remnode; > + ibnd_port_t *port; > + int i, p; > + > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > + node = chassis->spinenode[i]; > + if (!node) > + continue; /* empty slot */ > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = port->remoteport->node; > + > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > + continue; /* some error - line/router not initialized ? FIXME */ > + insert_line_router(remnode, chassis); > + } > + } > +} > + > +/* > + Stupid interpolation algorithm... > + But nothing to do - have to be compliant with VoltaireSM/NMS > +*/ > +static void pass_on_spines_interpolate_chguid(ibnd_chassis_t *chassis) > +{ > + ibnd_node_t *node; > + int i; > + > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > + node = chassis->spinenode[i]; > + if (!node) > + continue; /* skip the empty slots */ > + > + /* take first guid minus one to be consistent with SM */ > + chassis->chassisguid = node->info.nodeguid - 1; > + break; > + } > +} > + > +/* > + This function fills chassis structure with all nodes > + in that chassis > + chassis structure = structure of one standalone chassis > +*/ > +static void build_chassis(struct ibnd_node *node, ibnd_chassis_t *chassis) > +{ > + int p = 0; > + struct ibnd_node *remnode = 0; > + ibnd_port_t *port = 0; > + > + /* we get here with node = chassis_spine */ > + insert_spine((ibnd_node_t *)node, chassis); > + > + /* loop: pass on all ports of node */ > + for (p = 1; p <= node->node.info.numports; p++ ) { > + port = node->node.ports[p]; > + if (!port || !port->remoteport) > + continue; > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > + > + if (!remnode->ch_found) > + continue; /* some error - line or router not initialized ? 
FIXME */ > + > + insert_line_router(&(remnode->node), chassis); > + } > + > + pass_on_lines_catch_spines(chassis); > + /* this pass is needed to catch routers, since routers are connected only */ > + /* to spines in slot 1 or 4 and we could miss them the first time */ > + pass_on_spines_catch_lines(chassis); > + > + /* 2 additional passes are needed to overcome the problem of pure "in-chassis" */ > + /* connectivity - an extra pass ensures that all related chips/modules */ > + /* are inserted into the chassis */ > + pass_on_lines_catch_spines(chassis); > + pass_on_spines_catch_lines(chassis); > + pass_on_spines_interpolate_chguid(chassis); > +} > + > +/*========================================================*/ > +/* INTERNAL TO EXTERNAL PORT MAPPING */ > +/*========================================================*/ > + > +/* > +Description : On ISR9288/9096 the external port indexing > + does not match the internal (anafa) port > + indexes. Use this MAP to translate the data you get from > + the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.) 
> + > + > +Module : sLB-24 > + anafa 1 anafa 2 > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > +int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > +int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 > +------------------------------------------------ > + > +Module : sLB-8 > + anafa 1 anafa 2 > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > + > +-----------> > + anafa 1 anafa 2 > +ext port | - - 5 - - 6 | - - 7 - - 8 > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > +ext port | - - 1 - - 2 | - - 3 - - 4 > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > +------------------------------------------------ > + > +Module : sLB-2024 > + > +ext port | 13 14 15 16 17 18 19 20 21 22 23 24 > +A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > +ext port | 1 2 3 4 5 6 7 8 9 10 11 12 > +A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > +--------------------------------------------------- > + > +*/ > + > +int int2ext_map_slb24[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } > + }; > +int int2ext_map_slb8[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } > + }; > +int int2ext_map_slb2024[2][25] = { > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } > + }; > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > + > +/* map internal ports to external ports if appropriate */ > +static void > +voltaire_portmap(ibnd_port_t *port) > +{ 
> + struct ibnd_node *n = CONV_NODE_INTERNAL(port->node); > + int portnum = port->portnum; > + int chipnum = 0; > + ibnd_node_t *node = port->node; > + > + if (!n->ch_found || !is_line(CONV_NODE_INTERNAL(node)) || (portnum < 13 || portnum > 24)) { > + port->ext_portnum = 0; > + return; > + } > + > + if (port->node->ch_anafanum < 1 || port->node->ch_anafanum > 2) { > + port->ext_portnum = 0; > + return; > + } > + > + chipnum = port->node->ch_anafanum - 1; > + > + if (is_line_24(CONV_NODE_INTERNAL(node))) > + port->ext_portnum = int2ext_map_slb24[chipnum][portnum]; > + else if (is_line_2024(CONV_NODE_INTERNAL(node))) > + port->ext_portnum = int2ext_map_slb2024[chipnum][portnum]; > + else > + port->ext_portnum = int2ext_map_slb8[chipnum][portnum]; > +} > + > +static void add_chassis(struct ibnd_fabric *fabric) > +{ > + if (!(fabric->current_chassis = calloc(1, sizeof(ibnd_chassis_t)))) > + IBPANIC("out of mem"); > + > + if (fabric->first_chassis == NULL) { > + fabric->first_chassis = fabric->current_chassis; > + fabric->last_chassis = fabric->current_chassis; > + } else { > + fabric->last_chassis->next = fabric->current_chassis; > + fabric->last_chassis = fabric->current_chassis; > + } > +} > + > +static void > +add_node_to_chassis(ibnd_chassis_t *chassis, ibnd_node_t *node) > +{ > + node->chassis = chassis; > + node->next_chassis_node = chassis->nodes; > + chassis->nodes = node; > +} > + > +/* > + Main grouping function > + Algorithm: > + 1. pass on every Voltaire node > + 2. catch spine chip for every Voltaire node > + 2.1 build/interpolate chassis around this chip > + 2.2 go to 1. > + 3. pass on non Voltaire nodes (SystemImageGUID based grouping) > + 4. now group non Voltaire nodes by SystemImageGUID > + Returns: > + Pointer to the first chassis in a NULL terminated list of chassis in > + the fabric specified. 
> +*/ > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric) > +{ > + struct ibnd_node *node; > + int dist; > + int chassisnum = 0; > + ibnd_chassis_t *chassis; > + > + fabric->first_chassis = NULL; > + fabric->current_chassis = NULL; > + > + /* first pass on switches and build for every Voltaire node */ > + /* an appropriate chassis record (slotnum and position) */ > + /* according to internal connectivity */ > + /* not very efficient but clear code so... */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + fill_voltaire_chassis_record(node); > + } > + } > + > + /* separate every Voltaire chassis from each other and build linked list of them */ > + /* algorithm: catch spine and find all surrounding nodes */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid != VTR_VENDOR_ID) > + continue; > + //if (!node->node.chrecord || node->node.chrecord->chassisnum || !is_spine(node)) > + if (!node->ch_found > + || (node->node.chassis && node->node.chassis->chassisnum) > + || !is_spine(node)) > + continue; > + add_chassis(fabric); > + fabric->current_chassis->chassisnum = ++chassisnum; > + build_chassis(node, fabric->current_chassis); > + } > + } > + > + /* now make pass on nodes for chassis which are not Voltaire */ > + /* grouped by common SystemImageGUID */ > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + continue; > + if (node->node.info.sysimgguid) { > + chassis = find_chassisguid((ibnd_node_t *)node); > + if (chassis) > + chassis->nodecount++; > + else { > + /* Possible new chassis */ > + add_chassis(fabric); > + fabric->current_chassis->chassisguid = > + 
get_chassisguid((ibnd_node_t *)node); > + fabric->current_chassis->nodecount = 1; > + } > + } > + } > + } > + > + /* now, make another pass to see which nodes are part of chassis */ > + /* (defined as chassis->nodecount > 1) */ > + for (dist = 0; dist <= MAXHOPS; ) { > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + if (node->node.info.vendid == VTR_VENDOR_ID) > + continue; > + if (node->node.info.sysimgguid) { > + chassis = find_chassisguid((ibnd_node_t *)node); > + if (chassis && chassis->nodecount > 1) { > + if (!chassis->chassisnum) > + chassis->chassisnum = ++chassisnum; > + if (!node->ch_found) { > + node->ch_found = 1; > + add_node_to_chassis(chassis, (ibnd_node_t *)node); > + } > + } > + } > + } > + if (dist == fabric->fabric.maxhops_discovered) > + dist = MAXHOPS; /* skip to CAs */ > + else > + dist++; > + } > + > + return (fabric->first_chassis); > +} > diff --git a/infiniband-diags/libibnetdisc/src/chassis.h b/infiniband-diags/libibnetdisc/src/chassis.h > new file mode 100644 > index 0000000..16dad49 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/chassis.h > @@ -0,0 +1,85 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. 
> + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#ifndef _CHASSIS_H_ > +#define _CHASSIS_H_ > + > +#include > + > +#include "internal.h" > + > +/*========================================================*/ > +/* CHASSIS RECOGNITION SPECIFIC DATA */ > +/*========================================================*/ > + > +/* Device IDs */ > +#define VTR_DEVID_IB_FC_ROUTER 0x5a00 > +#define VTR_DEVID_IB_IP_ROUTER 0x5a01 > +#define VTR_DEVID_ISR9600_SPINE 0x5a02 > +#define VTR_DEVID_ISR9600_LEAF 0x5a03 > +#define VTR_DEVID_HCA1 0x5a04 > +#define VTR_DEVID_HCA2 0x5a44 > +#define VTR_DEVID_HCA3 0x6278 > +#define VTR_DEVID_SW_6IB4 0x5a05 > +#define VTR_DEVID_ISR9024 0x5a06 > +#define VTR_DEVID_ISR9288 0x5a07 > +#define VTR_DEVID_SLB24 0x5a09 > +#define VTR_DEVID_SFB12 0x5a08 > +#define VTR_DEVID_SFB4 0x5a0b > +#define VTR_DEVID_ISR9024_12 0x5a0c > +#define VTR_DEVID_SLB8 0x5a0d > +#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 > +#define VTR_DEVID_ISR9024_DDR 0x5a31 > +#define VTR_DEVID_SFB12_DDR 0x5a32 > +#define VTR_DEVID_SFB4_DDR 0x5a33 > +#define VTR_DEVID_SLB24_DDR 0x5a34 > +#define VTR_DEVID_SFB2012 0x5a37 > +#define VTR_DEVID_SLB2024 0x5a38 > +#define VTR_DEVID_ISR2012 0x5a39 > +#define VTR_DEVID_SFB2004 0x5a40 > +#define VTR_DEVID_ISR2004 0x5a41 > +#define 
VTR_DEVID_SRB2004 0x5a42 > + > +/* Vendor IDs (for chassis based systems) */ > +#define VTR_VENDOR_ID 0x8f1 /* Voltaire */ > +#define TS_VENDOR_ID 0x5ad /* Cisco */ > +#define SS_VENDOR_ID 0x66a /* InfiniCon */ > +#define XS_VENDOR_ID 0x1397 /* Xsigo */ > + > +enum ibnd_chassis_type { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; > +enum ibnd_chassis_slot_type { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; > + > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric); > + > +#endif /* _CHASSIS_H_ */ > diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > new file mode 100644 > index 0000000..64e4ece > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > @@ -0,0 +1,872 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. 
> + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include > +#include > + > +#include "internal.h" > +#include "chassis.h" > + > +static int timeout_ms = 2000; > +static int show_progress = 0; > + > +static char *linkwidth_str[] = { > + "??", > + "1x", > + "4x", > + "??", > + "8x", > + "??", > + "??", > + "??", > + "12x" > +}; > + > +static char *linkspeed_str[] = { > + "???", > + "SDR", > + "DDR", > + "???", > + "QDR" > +}; > + > +static char *linkspeed_datarate_str[] = { > + "???", > + "2.5 Gbps", > + "5.0 Gbps", > + "???", > + "10.0 Gbps" > +}; > + > +static char *linkstate_str[] = { > + "No State", > + "Down", > + "Init", > + "Armed", > + "Active" > +}; > + > +static char *physstate_str[] = { > + "No State", > + "Sleep", > + "Polling", > + "Disabled", > + "PortConfigTraining", > + "LinkUp", > + "LinkErrorRecovery", > + "Phy Test" > +}; > + > +char * > +ibnd_linkwidth_str(int link_width) > +{ > + if (link_width > 8) > + return linkwidth_str[0]; > + else > + return linkwidth_str[link_width]; > +} > + > +char * > +ibnd_linkspeed_str(int link_speed, int data_rate) > +{ > + if (link_speed > 4) > + return linkspeed_str[0]; > + else if (data_rate) > + return linkspeed_datarate_str[link_speed]; > + else > + return linkspeed_str[link_speed]; > +} > +char 
* > +ibnd_linkstate_str(int link_state) > +{ > + if (link_state > 4) > + return linkstate_str[0]; > + else > + return linkstate_str[link_state]; > +} > + > +char * > +ibnd_physstate_str(int phys_state) > +{ > + if (phys_state > 7) > + return physstate_str[0]; > + else > + return physstate_str[phys_state]; > +} > + > +void > +decode_port_info(void * rcv_buf, ibnd_port_info_t *pi) > +{ > + mad_decode_field(rcv_buf, IB_PORT_LID_F, &pi->lid); > + mad_decode_field(rcv_buf, IB_PORT_SMLID_F, &pi->smlid); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_SUPPORTED_F, &pi->link_speed_supported); > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ENABLED_F, &pi->link_speed_enabled); > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ACTIVE_F, &pi->link_speed_active); > + > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PORT_F, &pi->local_port); > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_SUPPORTED_F, &pi->link_width_supported); > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ENABLED_F, &pi->link_width_enabled); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ACTIVE_F, &pi->link_width_active); > + > + mad_decode_field(rcv_buf, IB_PORT_DIAG_F, &pi->diag_code); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_LEASE_F, &pi->mkey_lease); > + mad_decode_field(rcv_buf, IB_PORT_CAPMASK_F, &pi->capability_mask); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_F, &pi->mkey); > + mad_decode_field(rcv_buf, IB_PORT_GID_PREFIX_F, &pi->gid_prefix); > + > + mad_decode_field(rcv_buf, IB_PORT_STATE_F, &pi->link_state); > + mad_decode_field(rcv_buf, IB_PORT_PHYS_STATE_F, &pi->phys_state); > + > + mad_decode_field(rcv_buf, IB_PORT_LINK_DOWN_DEF_F, &pi->link_down_def_state); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_PROT_BITS_F, &pi->mkey_prot_bits); > + > + mad_decode_field(rcv_buf, IB_PORT_LMC_F, &pi->lmc); > + mad_decode_field(rcv_buf, IB_PORT_NEIGHBOR_MTU_F, &pi->neighbor_mtu); > + mad_decode_field(rcv_buf, IB_PORT_SMSL_F, &pi->smsl); > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_F, 
&pi->init_type); > + > + mad_decode_field(rcv_buf, IB_PORT_VL_CAP_F, &pi->vl_capability); > + mad_decode_field(rcv_buf, IB_PORT_VL_HIGH_LIMIT_F, &pi->vl_high_limit); > + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_HIGH_CAP_F, &pi->vl_arb_high_cap); > + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_LOW_CAP_F, &pi->vl_arb_low_cap); > + > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_REPLY_F, &pi->init_reply); > + mad_decode_field(rcv_buf, IB_PORT_MTU_CAP_F, &pi->mtu_cap); > + mad_decode_field(rcv_buf, IB_PORT_VL_STALL_COUNT_F, &pi->vl_stall_count); > + mad_decode_field(rcv_buf, IB_PORT_HOQ_LIFE_F, &pi->hoq_lifetime); > + mad_decode_field(rcv_buf, IB_PORT_OPER_VLS_F, &pi->oper_vls); > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_INB_F, &pi->partition_enforce_in); > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_OUTB_F, &pi->partition_enforce_out); > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_INB_F, &pi->filter_raw_in); > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_OUTB_F, &pi->filter_raw_out); > + mad_decode_field(rcv_buf, IB_PORT_MKEY_VIOL_F, &pi->mkey_violations); > + mad_decode_field(rcv_buf, IB_PORT_PKEY_VIOL_F, &pi->pkey_violations); > + mad_decode_field(rcv_buf, IB_PORT_QKEY_VIOL_F, &pi->qkey_violations); > + > + mad_decode_field(rcv_buf, IB_PORT_GUID_CAP_F, &pi->guid_capabilities); > + > + mad_decode_field(rcv_buf, IB_PORT_CLIENT_REREG_F, &pi->client_rereg); > + mad_decode_field(rcv_buf, IB_PORT_SUBN_TIMEOUT_F, &pi->subnet_timeout); > + mad_decode_field(rcv_buf, IB_PORT_RESP_TIME_VAL_F, &pi->response_time_val); > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PHYS_ERR_F, &pi->local_phys_error); > + mad_decode_field(rcv_buf, IB_PORT_OVERRUN_ERR_F, &pi->overrun_error); > + mad_decode_field(rcv_buf, IB_PORT_MAX_CREDIT_HINT_F, &pi->max_credit_hint); > + mad_decode_field(rcv_buf, IB_PORT_LINK_ROUND_TRIP_F, &pi->link_round_trip); > +} > + > +static int > +get_port_info(struct ibnd_fabric *fabric, struct ibnd_port *port, > + int portnum, ib_portid_t *portid) > 
+{ > + char portinfo[64]; > + void *pi = portinfo; > + > + port->port.portnum = portnum; > + > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, portnum, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + > + decode_port_info(pi, &port->port.info); > + > + IBND_DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s\n", > + portid2str(portid), portnum, port->port.info.lid, port->port.info.link_state, > + port->port.info.phys_state, ibnd_linkwidth_str(port->port.info.link_width_active), > + ibnd_linkspeed_str(port->port.info.link_speed_active, 0)); > + return 1; > +} > + > +static void > +decode_node_info(void * rcv_buf, ibnd_node_info_t *ni) > +{ > + mad_decode_field(rcv_buf, IB_NODE_BASE_VERS_F, &ni->base_ver); > + mad_decode_field(rcv_buf, IB_NODE_CLASS_VERS_F, &ni->class_ver); > + mad_decode_field(rcv_buf, IB_NODE_TYPE_F, &ni->type); > + mad_decode_field(rcv_buf, IB_NODE_NPORTS_F, &ni->numports); > + mad_decode_field(rcv_buf, IB_NODE_SYSTEM_GUID_F, &ni->sysimgguid); > + mad_decode_field(rcv_buf, IB_NODE_GUID_F, &ni->nodeguid); > + mad_decode_field(rcv_buf, IB_NODE_PORT_GUID_F, &ni->nodeportguid); > + mad_decode_field(rcv_buf, IB_NODE_PARTITION_CAP_F, &ni->partition_cap); > + mad_decode_field(rcv_buf, IB_NODE_DEVID_F, &ni->devid); > + mad_decode_field(rcv_buf, IB_NODE_REVISION_F, &ni->revision); > + mad_decode_field(rcv_buf, IB_NODE_LOCAL_PORT_F, &ni->localport); > + mad_decode_field(rcv_buf, IB_NODE_VENDORID_F, &ni->vendid); > +} > + > +/* > + * Returns -1 if error. > + */ > +static int > +query_node_info(struct ibnd_fabric *fabric, struct ibnd_node *node, ib_portid_t *portid) > +{ > + char nodeinfo[64]; > + void *ni = nodeinfo; > + if (!smp_query_via(ni, portid, IB_ATTR_NODE_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_node_info(ni, &(node->node.info)); > + return (0); > +} > + > +/* > + * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. 
> + */ > +static int > +query_node(struct ibnd_fabric *fabric, struct ibnd_node *inode, > + struct ibnd_port *iport, ib_portid_t *portid) > +{ > + char portinfo[64]; > + void *pi = portinfo; > + char switchinfo[64]; > + void *si = switchinfo; > + ibnd_node_t *node = &(inode->node); > + ibnd_port_t *port = &(iport->port); > + void *nd = inode->node.nodedesc; > + > + if (query_node_info(fabric, inode, portid)) > + return -1; > + > + port->portnum = node->info.localport; > + port->guid = node->info.nodeportguid; > + > + if (!smp_query_via(nd, portid, IB_ATTR_NODE_DESC, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_port_info(pi, &port->info); > + > + if (node->info.type != IBND_SWITCH_NODE) > + return 0; > + > + node->smalid = port->info.lid; > + node->smalmc = port->info.lmc; > + > + /* after we have the sma information find out the real PortInfo for this port */ > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, node->info.localport, timeout_ms, > + fabric->ibmad_port)) > + return -1; > + decode_port_info(pi, &port->info); > + > + if (!smp_query_via(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout_ms, > + fabric->ibmad_port)) > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > + else > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); > + > + IBND_DEBUG("portid %s: got switch node %" PRIx64 " '%s'\n", > + portid2str(portid), node->info.nodeguid, node->nodedesc); > + return 1; > +} > + > +static int > +add_port_to_dpath(ib_dr_path_t *path, int nextport) > +{ > + if (path->cnt+2 >= sizeof(path->p)) > + return -1; > + ++path->cnt; > + path->p[path->cnt] = nextport; > + return path->cnt; > +} > + > +static int > +extend_dpath(struct ibnd_fabric *f, ib_dr_path_t *path, int nextport) > +{ > + int rc = add_port_to_dpath(path, nextport); > + if ((rc != -1) && (path->cnt > f->fabric.maxhops_discovered)) > + 
f->fabric.maxhops_discovered = path->cnt; > + return (rc); > +} > + > +static void > +dump_endnode(ib_portid_t *path, char *prompt, > + struct ibnd_node *node, struct ibnd_port *port) > +{ > + if (!show_progress) > + return; > + > + printf("%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", > + portid2str(path), prompt, > + ibnd_node_type_str((ibnd_node_t *)node), > + node->node.info.nodeguid, > + node->node.info.type == IBND_SWITCH_NODE ? 0 : port->port.portnum, > + port->port.info.lid, port->port.info.lid + (1 << port->port.info.lmc) - 1, > + node->node.nodedesc); > +} > + > +static struct ibnd_node * > +find_existing_node(struct ibnd_fabric *fabric, struct ibnd_node *new) > +{ > + int hash = HASHGUID(new->node.info.nodeguid) % HTSZ; > + struct ibnd_node *node; > + > + for (node = fabric->nodestbl[hash]; node; node = node->htnext) > + if (node->node.info.nodeguid == new->node.info.nodeguid) > + return node; > + > + return NULL; > +} > + > +ibnd_node_t * > +ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int hash = HASHGUID(guid) % HTSZ; > + struct ibnd_node *node; > + > + for (node = f->nodestbl[hash]; node; node = node->htnext) > + if (node->node.info.nodeguid == guid) > + return (ibnd_node_t *)node; > + > + return NULL; > +} > + > +ibnd_node_t * > +ibnd_update_node(ibnd_node_t *node) > +{ > + char portinfo[64]; > + void *pi = portinfo; > + ibnd_port_info_t port0_info; > + char switchinfo[64]; > + void *si = switchinfo; > + void *nd = node->nodedesc; > + int p = 0; > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > + struct ibnd_node *n = CONV_NODE_INTERNAL(node); > + > + if (query_node_info(f, n, &(n->node.path_portid))) > + return (NULL); > + > + if (!smp_query_via(nd, &(n->node.path_portid), IB_ATTR_NODE_DESC, 0, timeout_ms, > + f->ibmad_port)) > + return (NULL); > + > + /* update all the port info's */ > + for (p = 1; p <= n->node.info.numports; p++) { > + 
get_port_info(f, CONV_PORT_INTERNAL(n->node.ports[p]), p, &(n->node.path_portid)); > + } > + > + if (n->node.info.type != IBND_SWITCH_NODE) > + goto done; > + > + if (!smp_query_via(pi, &(n->node.path_portid), IB_ATTR_PORT_INFO, 0, timeout_ms, > + f->ibmad_port)) > + return (NULL); > + decode_port_info(pi, &port0_info); > + > + n->node.smalid = port0_info.lid; > + n->node.smalmc = port0_info.lmc; > + > + if (!smp_query_via(si, &(n->node.path_portid), IB_ATTR_SWITCH_INFO, 0, timeout_ms, > + f->ibmad_port)) > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > + else > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &n->node.sw_info.smaenhsp0); > + > +done: > + return (node); > +} > + > +ibnd_node_t * > +ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int i = 0; > + ibnd_node_t *rc = f->fabric.from_node; > + ib_dr_path_t path; > + > + if (str2drpath(&path, dr_str, 0, 0) == -1) { > + return (NULL); > + } > + > + for (i = 0; i <= path.cnt; i++) { > + ibnd_port_t *remote_port = NULL; > + if (path.p[i] == 0) > + continue; > + if (!rc->ports) > + return (NULL); > + > + remote_port = rc->ports[path.p[i]]->remoteport; > + if (!remote_port) > + return (NULL); > + > + rc = remote_port->node; > + } > + > + return (rc); > +} > + > +static void > +add_to_nodeguid_hash(struct ibnd_node *node, struct ibnd_node *hash[]) > +{ > + int hash_idx = HASHGUID(node->node.info.nodeguid) % HTSZ; > + > + node->htnext = hash[hash_idx]; > + hash[hash_idx] = node; > +} > + > +static void > +add_to_portguid_hash(struct ibnd_port *port, struct ibnd_port *hash[]) > +{ > + int hash_idx = HASHGUID(port->port.guid) % HTSZ; > + > + port->htnext = hash[hash_idx]; > + hash[hash_idx] = port; > +} > + > +static void > +add_to_type_list(struct ibnd_node*node, struct ibnd_fabric *fabric) > +{ > + switch (node->node.info.type) { > + case IBND_CA_NODE: > + node->type_next = fabric->ch_adapters; > + fabric->ch_adapters = node; > + 
break; > + case IBND_SWITCH_NODE: > + node->type_next = fabric->switches; > + fabric->switches = node; > + break; > + case IBND_ROUTER_NODE: > + node->type_next = fabric->routers; > + fabric->routers = node; > + break; > + } > +} > + > +static void > +add_to_nodedist(struct ibnd_node *node, struct ibnd_fabric *fabric) > +{ > + int dist = node->node.dist; > + if (node->node.info.type != IBND_SWITCH_NODE) > + dist = MAXHOPS; /* special Ca list */ > + > + node->dnext = fabric->nodesdist[dist]; > + fabric->nodesdist[dist] = node; > +} > + > + > +static struct ibnd_node * > +create_node(struct ibnd_fabric *fabric, struct ibnd_node *temp, ib_portid_t *path, int dist) > +{ > + struct ibnd_node *node; > + > + node = malloc(sizeof(*node)); > + if (!node) { > + IBPANIC("OOM: node creation failed\n"); > + return NULL; > + } > + > + memcpy(node, temp, sizeof(*node)); > + node->node.dist = dist; > + node->node.path_portid = *path; > + node->node.fabric = (ibnd_fabric_t *)fabric; > + > + add_to_nodeguid_hash(node, fabric->nodestbl); > + > + /* add this to the all nodes list */ > + node->node.next = fabric->fabric.nodes; > + fabric->fabric.nodes = (ibnd_node_t *)node; > + > + add_to_type_list(node, fabric); > + add_to_nodedist(node, fabric); > + > + return node; > +} > + > +static struct ibnd_port * > +find_existing_port_node(struct ibnd_node *node, struct ibnd_port *port) > +{ > + if (port->port.portnum > node->node.info.numports || node->node.ports == NULL ) > + return (NULL); > + > + return (CONV_PORT_INTERNAL(node->node.ports[port->port.portnum])); > +} > + > +static struct ibnd_port * > +add_port_to_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *temp) > +{ > + struct ibnd_port *port; > + > + port = malloc(sizeof(*port)); > + if (!port) > + return NULL; > + > + memcpy(port, temp, sizeof(*port)); > + port->port.node = (ibnd_node_t *)node; > + port->port.ext_portnum = 0; > + > + if (node->node.ports == NULL) { > + node->node.ports = 
calloc(sizeof(*node->node.ports), node->node.info.numports + 1); > + if (!node->node.ports) { > + IBND_ERROR("Failed to allocate the ports array\n"); > + return (NULL); > + } > + } > + > + node->node.ports[temp->port.portnum] = (ibnd_port_t *)port; > + > + add_to_portguid_hash(port, fabric->portstbl); > + return port; > +} > + > +static void > +link_ports(struct ibnd_node *node, struct ibnd_port *port, > + struct ibnd_node *remotenode, struct ibnd_port *remoteport) > +{ > + IBND_DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u\n", > + node->node.info.nodeguid, node, port, port->port.portnum, > + remotenode->node.info.nodeguid, remotenode, > + remoteport, remoteport->port.portnum); > + if (port->port.remoteport) > + port->port.remoteport->remoteport = NULL; > + if (remoteport->port.remoteport) > + remoteport->port.remoteport->remoteport = NULL; > + port->port.remoteport = (ibnd_port_t *)remoteport; > + remoteport->port.remoteport = (ibnd_port_t *)port; > +} > + > +static int > +get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *port, ib_portid_t *path, > + int portnum, int dist) > +{ > + struct ibnd_node node_buf; > + struct ibnd_port port_buf; > + struct ibnd_node *remotenode, *oldnode; > + struct ibnd_port *remoteport, *oldport; > + > + memset(&node_buf, 0, sizeof(node_buf)); > + memset(&port_buf, 0, sizeof(port_buf)); > + > + IBND_DEBUG("handle node %p port %p:%d dist %d\n", node, port, portnum, dist); > + if (port->port.info.phys_state != 5) /* LinkUp */ > + return -1; > + > + if (extend_dpath(fabric, &path->drpath, portnum) < 0) > + return -1; > + > + if (query_node(fabric, &node_buf, &port_buf, path) < 0) { > + IBWARN("NodeInfo on %s failed, skipping port", > + portid2str(path)); > + path->drpath.cnt--; /* restore path */ > + return -1; > + } > + > + oldnode = find_existing_node(fabric, &node_buf); > + if (oldnode) > + remotenode = oldnode; > + else if (!(remotenode = create_node(fabric, &node_buf, path, 
dist + 1))) > + IBPANIC("no memory"); > + > + oldport = find_existing_port_node(remotenode, &port_buf); > + if (oldport) { > + remoteport = oldport; > + } else if (!(remoteport = add_port_to_node(fabric, remotenode, &port_buf))) > + IBPANIC("no memory"); > + > + dump_endnode(path, oldnode ? "known remote" : "new remote", > + remotenode, remoteport); > + > + link_ports(node, port, remotenode, remoteport); > + > + path->drpath.cnt--; /* restore path */ > + return 0; > +} > + > +static void * > +ibnd_init_port(char *dev_name, int dev_port) > +{ > + int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; > + > + /* Crank up the mad lib */ > + return (mad_rpc_open_port(dev_name, dev_port, mgmt_classes, 2)); > +} > + > +ibnd_fabric_t * > +ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, > + ib_portid_t *from, int hops) > +{ > + struct ibnd_fabric *fabric = NULL; > + ib_portid_t my_portid = {0}; > + struct ibnd_node node_buf; > + struct ibnd_port port_buf; > + struct ibnd_node *node; > + struct ibnd_port *port; > + int i; > + int dist = 0; > + ib_portid_t *path; > + int max_hops = MAXHOPS-1; /* default find everything */ > + > + /* if not everything how much? 
*/ > + if (hops >= 0) { > + max_hops = hops; > + } > + > + /* If not specified start from "my" port */ > + if (!from) { > + from = &my_portid; > + } > + > + fabric = malloc(sizeof(*fabric)); > + > + if (!fabric) { > + IBPANIC("OOM: failed to malloc ibnd_fabric_t\n"); > + return (NULL); > + } > + > + memset(fabric, 0, sizeof(*fabric)); > + > + fabric->ibmad_port = ibnd_init_port(dev_name, dev_port); > + if (!fabric->ibmad_port) { > + IBPANIC("failed to open \"%s\" port %d\n", > + dev_name, dev_port); > + goto error; > + } > + > + IBND_DEBUG("from %s\n", portid2str(from)); > + > + memset(&node_buf, 0, sizeof(node_buf)); > + memset(&port_buf, 0, sizeof(port_buf)); > + > + if (query_node(fabric, &node_buf, &port_buf, from) < 0) { > + IBWARN("can't reach node %s\n", portid2str(from)); > + goto error; > + } > + > + node = create_node(fabric, &node_buf, from, 0); > + if (!node) > + goto error; > + > + fabric->fabric.from_node = (ibnd_node_t *)node; > + > + port = add_port_to_node(fabric, node, &port_buf); > + if (!port) > + IBPANIC("out of memory"); > + > + if (node->node.info.type != IBND_SWITCH_NODE && > + get_remote_node(fabric, node, port, from, node->node.info.localport, 0) < 0) > + return ((ibnd_fabric_t *)fabric); > + > + for (dist = 0; dist <= max_hops; dist++) { > + > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > + > + path = &node->node.path_portid; > + > + IBND_DEBUG("dist %d node %p\n", dist, node); > + dump_endnode(path, "processing", node, port); > + > + for (i = 1; i <= node->node.info.numports; i++) { > + if (i == node->node.info.localport) > + continue; > + > + if (get_port_info(fabric, &port_buf, i, path) < 0) { > + IBWARN("can't reach node %s port %d", portid2str(path), i); > + continue; > + } > + > + port = find_existing_port_node(node, &port_buf); > + if (port) > + continue; > + > + port = add_port_to_node(fabric, node, &port_buf); > + if (!port) > + IBPANIC("out of memory"); > + > + /* If switch, set port GUID to node port
GUID */ > + if (node->node.info.type == IBND_SWITCH_NODE) > + port->port.guid = node->node.info.nodeportguid; > + > + get_remote_node(fabric, node, port, path, i, dist); > + } > + } > + } > + > + fabric->fabric.chassis = group_nodes(fabric); > + > + return ((ibnd_fabric_t *)fabric); > +error: > + /* close the MAD port if it was opened before the failure */ > + if (fabric->ibmad_port) > + mad_rpc_close_port(fabric->ibmad_port); > + free(fabric); > + return (NULL); > +} > + > +static void > +destroy_node(struct ibnd_node *node) > +{ > + int p = 0; > + > + /* the ports array is allocated lazily and may still be NULL */ > + if (node->node.ports) { > + for (p = 0; p <= node->node.info.numports; p++) { > + free(node->node.ports[p]); > + } > + free(node->node.ports); > + } > + free(node); > +} > + > +void > +ibnd_destroy_fabric(ibnd_fabric_t *fabric) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + int dist = 0; > + struct ibnd_node *node = NULL; > + struct ibnd_node *next = NULL; > + ibnd_chassis_t *ch, *ch_next; > + > + ch = f->first_chassis; > + while (ch) { > + ch_next = ch->next; > + free(ch); > + ch = ch_next; > + } > + for (dist = 0; dist <= MAXHOPS; dist++) { > + node = f->nodesdist[dist]; > + while (node) { > + next = node->dnext; > + destroy_node(node); > + node = next; > + } > + } > + if (f->ibmad_port) > + mad_rpc_close_port(f->ibmad_port); > + free(f); > +} > + > +void > +ibnd_debug(int i) > +{ > + if (i) { > + ibdebug++; > + madrpc_show_errors(1); > + umad_debug(i); > + } else { > + ibdebug = 0; > + madrpc_show_errors(0); > + umad_debug(0); > + } > +} > + > +void > +ibnd_show_progress(int i) > +{ > + show_progress = i; > +} > + > +const char* > +ibnd_node_type_str(ibnd_node_t *node) > +{ > + switch(node->info.type) { > + case IBND_CA_NODE: return "Ca"; > + case IBND_SWITCH_NODE: return "Switch"; > + case IBND_ROUTER_NODE: return "Router"; > + } > + return "??"; > +} > + > +const char* > +ibnd_node_type_str_short(ibnd_node_t *node) > +{ > + switch(node->info.type) { > + case IBND_SWITCH_NODE: return "SW"; > + case IBND_CA_NODE: return "CA"; > + case IBND_ROUTER_NODE: return "RT"; > + } > + return "??"; > +} > + > + > +void > +ibnd_iter_nodes(ibnd_fabric_t *fabric, > +
ibnd_iter_node_func_t func, > + void *user_data) > +{ > + ibnd_node_t *cur = NULL; > + > + for (cur = fabric->nodes; cur; cur = cur->next) { > + func(cur, user_data); > + } > +} > + > + > +void > +ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > + ibnd_iter_node_func_t func, > + ibnd_node_type_t node_type, > + void *user_data) > +{ > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > + struct ibnd_node *list = NULL; > + struct ibnd_node *cur = NULL; > + > + switch (node_type) { > + case IBND_SWITCH_NODE: > + list = f->switches; > + break; > + case IBND_CA_NODE: > + list = f->ch_adapters; > + break; > + case IBND_ROUTER_NODE: > + list = f->routers; > + break; > + default: > + IBND_DEBUG("Invalid node_type specified %d\n", node_type); > + break; > + } > + > + for (cur = list; cur; cur = cur->type_next) { > + func((ibnd_node_t *)cur, user_data); > + } > +} > + > diff --git a/infiniband-diags/libibnetdisc/src/internal.h b/infiniband-diags/libibnetdisc/src/internal.h > new file mode 100644 > index 0000000..89f238f > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/internal.h > @@ -0,0 +1,82 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. 
> + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +/** ========================================================================= > + * Define the internal data structures. > + */ > + > +#ifndef _INTERNAL_H_ > +#define _INTERNAL_H_ > + > +#include > + > +struct ibnd_node { > + /* This member MUST BE FIRST */ > + ibnd_node_t node; > + > + /* internal use only */ > + unsigned char ch_found; > + struct ibnd_node *htnext; /* hash table list */ > + struct ibnd_node *dnext; /* nodesdist next */ > + struct ibnd_node *type_next; /* next based on type */ > +}; > +#define CONV_NODE_INTERNAL(node) ((struct ibnd_node *)node) > + > +struct ibnd_port { > + /* This member MUST BE FIRST */ > + ibnd_port_t port; > + > + /* internal use only */ > + struct ibnd_port *htnext; > +}; > +#define CONV_PORT_INTERNAL(port) ((struct ibnd_port *)port) > + > +struct ibnd_fabric { > + /* This member MUST BE FIRST */ > + ibnd_fabric_t fabric; > + > + /* internal use only */ > + void *ibmad_port; > + struct ibnd_node *nodestbl[HTSZ]; > + struct ibnd_port *portstbl[HTSZ]; > + struct ibnd_node *nodesdist[MAXHOPS+1]; > + ibnd_chassis_t *first_chassis; > + ibnd_chassis_t *current_chassis; > + ibnd_chassis_t *last_chassis; > + struct ibnd_node *switches; > + struct ibnd_node *ch_adapters; > + 
struct ibnd_node *routers; > +}; > +#define CONV_FABRIC_INTERNAL(fabric) ((struct ibnd_fabric *)fabric) > + > +#endif /* _INTERNAL_H_ */ > diff --git a/infiniband-diags/libibnetdisc/src/libibnetdisc.map b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > new file mode 100644 > index 0000000..5e8c315 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > @@ -0,0 +1,27 @@ > +IBNETDISC_1.0 { > + global: > + ibnd_debug; > + ibnd_show_progress; > + ibnd_discover_fabric; > + ibnd_cache_fabric; > + ibnd_read_fabric; > + ibnd_destroy_fabric; > + ibnd_find_node_guid; > + ibnd_update_node; > + ibnd_find_node_dr; > + ibnd_linkwidth_str; > + ibnd_linkspeed_str; > + ibnd_node_type_str; > + ibnd_node_type_str_short; > + ibnd_is_xsigo_guid; > + ibnd_is_xsigo_tca; > + ibnd_is_xsigo_hca; > + ibnd_get_chassis_guid; > + ibnd_get_chassis_type; > + ibnd_get_chassis_slot_str; > + ibnd_linkstate_str; > + ibnd_physstate_str; > + ibnd_iter_nodes; > + ibnd_iter_nodes_type; > + local: *; > +}; > diff --git a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > new file mode 100644 > index 0000000..6e63f4a > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > @@ -0,0 +1,395 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +char *argv0 = "iblinkinfotest"; > +static FILE *f; > + > +static char *node_name_map_file = NULL; > +static nn_map_t *node_name_map = NULL; > + > +static int timeout_ms = 500; > + > +static int debug = 0; > +#define DEBUG(str, args...) 
\ > + if (debug) fprintf(stderr, str, ##args) > + > +static int down_links_only = 0; > +static int line_mode = 0; > +static int add_sw_settings = 0; > +static int print_port_guids = 0; > + > +static unsigned int > +get_max(unsigned int num) > +{ > + unsigned int v = num; // 32-bit word to find the log base 2 of > + unsigned r = 0; // r will be lg(v) > + > + while (v >>= 1) // unroll for more speed... > + { > + r++; > + } > + > + return (1 << r); > +} > + > +void > +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) > +{ > + int max_speed = 0; > + > + int max_width = get_max(port->info.link_width_supported > + & port->remoteport->info.link_width_supported); > + if ((max_width & port->info.link_width_active) == 0) { > + // we are not at the max supported width > + // print what we could be at. > + snprintf(width_msg, msg_size, "Could be %s", > + ibnd_linkwidth_str(max_width)); > + } > + > + max_speed = get_max(port->info.link_speed_supported > + & port->remoteport->info.link_speed_supported); > + if ((max_speed & port->info.link_speed_active) == 0) { > + // we are not at the max supported speed > + // print what we could be at. 
> + snprintf(speed_msg, msg_size, "Could be %s", > + ibnd_linkspeed_str(max_speed, 1)); > + } > +} > + > +void > +print_port(ibnd_node_t *node, ibnd_port_t *port) > +{ > + char remote_guid_str[256]; > + char remote_str[256]; > + char link_str[256]; > + char width_msg[256]; > + char speed_msg[256]; > + char ext_port_str[256]; > + > + if (!port) > + return; > + > + remote_guid_str[0] = '\0'; > + remote_str[0] = '\0'; > + link_str[0] = '\0'; > + width_msg[0] = '\0'; > + speed_msg[0] = '\0'; > + > + if (port->remoteport) { > + char remote_name_buf[256]; > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > + > + if (port->remoteport->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + get_msg(width_msg, speed_msg, 256, port); > + if (line_mode) { > + if (print_port_guids) { > + snprintf(remote_guid_str, 256, > + "0x%016lx ", > + port->remoteport->guid); > + } else { > + snprintf(remote_guid_str, 256, > + "0x%016lx ", > + port->remoteport->node->info.nodeguid); > + } > + } > + > + snprintf(remote_str, 256, > + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", > + remote_guid_str, > + port->remoteport->info.lid ? 
> + port->remoteport->info.lid : > + port->remoteport->node->smalid, > + port->remoteport->portnum, > + ext_port_str, > + remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + remote_name_buf), > + width_msg, > + speed_msg > + ); > + } else { > + snprintf(remote_str, 256, > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > + } > + > + if (add_sw_settings) { > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state), > + port->info.hoq_lifetime, > + port->info.vl_stall_count > + ); > + } else { > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state) > + ); > + } > + > + if (port->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + if (line_mode) { > + char name_buf[256]; > + strncpy(name_buf, node->nodedesc, 256); > + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", > + node->info.nodeguid, > + remap_node_name(node_name_map, > + node->info.nodeguid, > + name_buf), > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > + } else { > + printf(" %6d %4d[%2s] ==%s==> %s", > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > + } > +} > + > +void > +print_switch(ibnd_node_t *node, void *user_data) > +{ > + int i = 0; > + > + if (!line_mode) { > + char name_buf[256]; > + strncpy(name_buf, node->nodedesc, 256); > + printf("Switch 0x%016lx %s:\n", > + node->info.nodeguid, > + remap_node_name(node_name_map, > + node->info.nodeguid, > + name_buf)); > + } > + > + for (i = 1; i <= node->info.numports; i++) { > + 
ibnd_port_t *port = node->ports[i]; > + if (!port) > + continue; > + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { > + print_port(node, port); > + } > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, > + "Usage: %s [-hclp -S -D -C -P ]\n" > + " Report link speed and connection for each port of each switch which is active\n" > + " -h This help message\n" > + " -S output only the node specified by guid\n" > + " -D print only node specified by \n" > + " -f specify node to start \"from\"\n" > + " -n Number of hops to include away from specified node\n" > + " -d print only down links\n" > + " -l (line mode) print all information for each link on each line\n" > + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" > + > + > + " -t timeout for any single fabric query\n" > + " -s show errors\n" > + " --node-name-map use specified node name map\n" > + > + " -C use selected Channel Adaptor name for queries\n" > + " -P use selected channel adaptor port for queries\n" > + " -g print port guids instead of node guids\n" > + " --debug print debug messages\n" > + , > + argv0); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + char *ca = 0; > + int ca_port = 0; > + ibnd_fabric_t *fabric = NULL; > + uint64_t guid = 0; > + char *dr_path = NULL; > + char *from = NULL; > + int hops = 0; > + ib_portid_t port_id; > + > + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; > + static const struct option long_opts[] = { > + { "S", 1, 0, 'S'}, > + { "D", 1, 0, 'D'}, > + { "num-hops", 1, 0, 'n'}, > + { "down-links-only", 0, 0, 'd'}, > + { "line-mode", 0, 0, 'l'}, > + { "ca-name", 1, 0, 'C'}, > + { "ca-port", 1, 0, 'P'}, > + { "timeout", 1, 0, 't'}, > + { "show", 0, 0, 's'}, > + { "print-port-guids", 0, 0, 'g'}, > + { "print-additional", 0, 0, 'p'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { "node-name-map", 1, 0, 1}, > + { "debug", 0, 0, 2}, > + { "from", 1, 0, 'f'}, > + { } > + }; > + > + f = 
stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 1: > + node_name_map_file = strdup(optarg); > + break; > + case 2: > + debug = 1; > + ibnd_debug(1); > + break; > + case 'f': > + from = strdup(optarg); > + break; > + case 'C': > + ca = strdup(optarg); > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'D': > + dr_path = strdup(optarg); > + break; > + case 'n': > + hops = (int)strtol(optarg, NULL, 0); > + break; > + case 'd': > + down_links_only = 1; > + break; > + case 'l': > + line_mode = 1; > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'g': > + print_port_guids = 1; > + break; > + case 'S': > + guid = (uint64_t)strtoull(optarg, 0, 0); > + break; > + case 'p': > + add_sw_settings = 1; > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + if (argc && !(f = fopen(argv[0], "w"))) > + fprintf(stderr, "can't open file %s for writing\n", argv[0]); > + > + node_name_map = open_node_name_map(node_name_map_file); > + > + if (from) { > + /* only scan part of the fabric */ > + str2drpath(&(port_id.drpath), from, 0, 0); > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + guid = 0; > + } else { > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + } > + > + if (guid) { > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > + /* print_switch() dereferences the node, so guard against a failed lookup */ > + if (sw) > + print_switch(sw, NULL); > + else > + fprintf(stderr, "Failed to find switch by GUID\n"); > + } else if (dr_path) { > + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); > + if (sw) > + print_switch(sw, NULL); > + else > + fprintf(stderr, "Failed to find switch: %s\n", dr_path); > + } else { > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > + } > + > + ibnd_destroy_fabric(fabric); > + > + close_node_name_map(node_name_map); > +
exit(0); > +} > diff --git a/infiniband-diags/libibnetdisc/test/ibnetdisctest.c b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > new file mode 100644 > index 0000000..fc6e234 > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > @@ -0,0 +1,675 @@ > +/* > + * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +static int verbose; > +#define LIST_CA_NODE (1 << IBND_CA_NODE) > +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) > +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) > + > +char *argv0 = "ibnetdiscover"; > +static FILE *f; > + > +static char *node_name_map_file = NULL; > +static nn_map_t *node_name_map = NULL; > + > +static int timeout_ms = 2000; > + > +static int debug = 0; > +#define DEBUG(str, args...) \ > + if (debug) fprintf(stderr, str, ##args) > + > + > +char * > +node_name(ibnd_node_t *node) > +{ > + static char buf[256]; > + > + switch(node->info.type) { > + case IBND_CA_NODE: > + sprintf(buf, "\"%s", "H"); > + break; > + case IBND_SWITCH_NODE: > + sprintf(buf, "\"%s", "S"); > + break; > + case IBND_ROUTER_NODE: > + sprintf(buf, "\"%s", "R"); > + break; > + default: > + sprintf(buf, "\"%s", "?"); > + break; > + } > + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); > + > + return buf; > +} > + > +void > +list_node(ibnd_node_t *node, void *user_data) > +{ > + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, > + node->nodedesc); > + > + fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", > + ibnd_node_type_str(node), > + node->info.nodeguid, node->info.numports, node->info.devid, > + node->info.vendid, > + nodename); > + > + free(nodename); > +} > + > +void > +list_nodes(ibnd_fabric_t *fabric, int list) > +{ > + if (list & LIST_CA_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); > + } > + if (list & LIST_SWITCH_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); > + } > + if (list & LIST_ROUTER_NODE) { > + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); > + } > +} > + > +void > 
+out_ids(ibnd_node_t *node, int group, char *chname) > +{ > + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); > + if (node->info.sysimgguid) > + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); > + if (group > + && node->chassis && node->chassis->chassisnum) { > + fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum); > + if (chname) > + fprintf(f, " (%s)", clean_nodedesc(chname)); > + if (ibnd_is_xsigo_tca(node->info.nodeguid) > + && node->ports[1] > + && node->ports[1]->remoteport) > + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); > + } > + fprintf(f, "\n"); > +} > + > + > +uint64_t > +out_chassis(ibnd_fabric_t *fabric, int chassisnum) > +{ > + uint64_t guid; > + > + fprintf(f, "\nChassis %d", chassisnum); > + guid = ibnd_get_chassis_guid(fabric, chassisnum); > + if (guid) > + fprintf(f, " (guid 0x%" PRIx64 ")", guid); > + fprintf(f, "\n"); > + return guid; > +} > + > +void > +out_switch(ibnd_node_t *node, int group, char *chname) > +{ > + char *str; > + char str2[256]; > + char *nodename = NULL; > + > + out_ids(node, group, chname); > + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); > + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); > + if (group) { > + str = ibnd_get_chassis_type(node); > + if (str) > + fprintf(f, "%s ", str); > + str = ibnd_get_chassis_slot_str(node, str2, 256); > + if (str) > + fprintf(f, "%s", str); > + } > + > + nodename = remap_node_name(node_name_map, node->info.nodeguid, > + node->nodedesc); > + > + fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", > + node->info.numports, node_name(node), > + nodename, > + node->sw_info.smaenhsp0 ? 
"enhanced" : "base", > + node->smalid, node->smalmc); > + > + free(nodename); > +} > + > +void > +out_ca(ibnd_node_t *node, int group, char *chname) > +{ > + char *node_type; > + char *node_type2; > + > + out_ids(node, group, chname); > + switch(node->info.type) { > + case IBND_CA_NODE: > + node_type = "ca"; > + node_type2 = "Ca"; > + break; > + case IBND_ROUTER_NODE: > + node_type = "rt"; > + node_type2 = "Rt"; > + break; > + default: > + node_type = "???"; > + node_type2 = "???"; > + break; > + } > + > + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); > + fprintf(f, "%s\t%d %s\t\t# \"%s\"", > + node_type2, node->info.numports, node_name(node), > + clean_nodedesc(node->nodedesc)); > + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) > + fprintf(f, " (scp)"); > + fprintf(f, "\n"); > +} > + > +#define OUT_BUFFER_SIZE 16 > +static char * > +out_ext_port(ibnd_port_t *port, int group) > +{ > + static char mapping[OUT_BUFFER_SIZE]; > + > + if (group && port->ext_portnum != 0) { > + snprintf(mapping, OUT_BUFFER_SIZE, > + "[ext %d]", port->ext_portnum); > + return (mapping); > + } > + > + return (NULL); > +} > + > +void > +out_switch_port(ibnd_port_t *port, int group) > +{ > + char *ext_port_str = NULL; > + char *rem_nodename = NULL; > + > + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); > + fprintf(f, "[%d]", port->portnum); > + > + ext_port_str = out_ext_port(port, group); > + if (ext_port_str) > + fprintf(f, "%s", ext_port_str); > + > + rem_nodename = remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + port->remoteport->node->nodedesc); > + > + ext_port_str = out_ext_port(port->remoteport, group); > + fprintf(f, "\t%s[%d]%s", > + node_name(port->remoteport->node), > + port->remoteport->portnum, > + ext_port_str ? 
ext_port_str : ""); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); > + fprintf(f, "\t\t# \"%s\" lid %d %s%s", > + rem_nodename, > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + > + if (ibnd_is_xsigo_tca(port->remoteport->guid)) > + fprintf(f, " slot %d", port->portnum); > + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) > + fprintf(f, " (scp)"); > + fprintf(f, "\n"); > + > + free(rem_nodename); > +} > + > +void > +out_ca_port(ibnd_port_t *port, int group) > +{ > + char *str = NULL; > + char *rem_nodename = NULL; > + > + fprintf(f, "[%d]", port->portnum); > + if (port->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, "(%" PRIx64 ") ", port->guid); > + fprintf(f, "\t%s[%d]", > + node_name(port->remoteport->node), > + port->remoteport->portnum); > + str = out_ext_port(port->remoteport, group); > + if (str) > + fprintf(f, "%s", str); > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); > + > + rem_nodename = remap_node_name(node_name_map, > + port->remoteport->node->info.nodeguid, > + port->remoteport->node->nodedesc); > + > + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", > + port->info.lid, port->info.lmc, rem_nodename, > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + > + free(rem_nodename); > +} > + > +struct iter_user_data { > + int group; > + int skip_chassis_nodes; > +}; > + > +static void > +switch_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("SWITCH: node %p\n", node); > + > + /* skip chassis based switches if flagged */ > + if (data->skip_chassis_nodes && node->chassis && node->chassis->chassisnum) > + return; > + > + out_switch(node, data->group, NULL); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, data->group); > + } > +} > + > +static void > +ca_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("CA: node %p\n", node); > + /* Now, skip chassis based CAs */ > + if (data->group && node->chassis && node->chassis->chassisnum) > + return; > + out_ca(node, data->group, NULL); > + > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, data->group); > + } > +} > + > +static void > +router_iter_func(ibnd_node_t *node, void *iter_user_data) > +{ > + ibnd_port_t *port; > + int p = 0; > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > + > + DEBUG("RT: node %p\n", node); > + /* Now, skip chassis based RTs */ > + if (data->group && node->chassis && node->chassis->chassisnum) > + return; > + out_ca(node, data->group, NULL); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, data->group); > + } > +} > + > +int > 
+dump_topology(int group, ibnd_fabric_t *fabric) > +{ > + ibnd_node_t *node; > + ibnd_port_t *port; > + int i = 0, p = 0; > + time_t t = time(0); > + uint64_t chguid; > + char *chname = NULL; > + struct iter_user_data iter_user_data; > + > + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); > + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); > + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", > + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); > + > + /* Make pass on switches */ > + if (group) { > + ibnd_chassis_t *ch = NULL; > + > + /* Chassis based switches first */ > + for (ch = fabric->chassis; ch; ch = ch->next) { > + int n = 0; > + > + if (!ch->chassisnum) > + continue; > + chguid = out_chassis(fabric, ch->chassisnum); > + > + chname = NULL; > +/** > + * Will this work for Xsigo? > + */ > + if (ibnd_is_xsigo_guid(chguid)) { > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > + } > + } > + > +#if 0 > +/** > + * vs. this? > + * I don't want to expose the nodesdist array to the end user. 
> + */ > + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { > + if (!node->chrecord || > + !node->chrecord->chassisnum) > + continue; > + > + if (node->chrecord->chassisnum != ch->chassisnum) > + continue; > + > + if (ibnd_is_xsigo_hca(node->nodeguid)) { > + chname = node->nodedesc; > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > + } > + } > +#endif > + } > + > + fprintf(f, "\n# Spine Nodes"); > + for (n = 1; n <= SPINES_MAX_NUM; n++) { > + if (ch->spinenode[n]) { > + out_switch(ch->spinenode[n], group, chname); > + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { > + port = ch->spinenode[n]->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + fprintf(f, "\n# Line Nodes"); > + for (n = 1; n <= LINES_MAX_NUM; n++) { > + if (ch->linenode[n]) { > + out_switch(ch->linenode[n], group, chname); > + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { > + port = ch->linenode[n]->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + > + fprintf(f, "\n# Chassis Switches"); > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (node->info.type == IBND_SWITCH_NODE) { > + out_switch(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_switch_port(port, group); > + } > + } > + } > + > + fprintf(f, "\n# Chassis CAs"); > + for (node = ch->nodes; node; > + node = node->next_chassis_node) { > + if (node->info.type == IBND_CA_NODE) { > + out_ca(node, group, chname); > + for (p = 1; p <= node->info.numports; p++) { > + port = node->ports[p]; > + if (port && port->remoteport) > + out_ca_port(port, group); > + } > + } > + } > + > + } > + > + } else { /* !group */ > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 0; > + > + ibnd_iter_nodes_type(fabric, switch_iter_func, > + IBND_SWITCH_NODE, &iter_user_data); > + 
} > + > + chname = NULL; > + if (group) { > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 1; > + > + fprintf(f, "\nNon-Chassis Nodes\n"); > + ibnd_iter_nodes_type(fabric, switch_iter_func, > + IBND_SWITCH_NODE, &iter_user_data); > + > + } > + > + iter_user_data.group = group; > + iter_user_data.skip_chassis_nodes = 0; > + > + /* Make pass on CAs */ > + ibnd_iter_nodes_type(fabric, ca_iter_func, IBND_CA_NODE, > + &iter_user_data); > + > + /* make pass on routers */ > + ibnd_iter_nodes_type(fabric, router_iter_func, IBND_ROUTER_NODE, > + &iter_user_data); > + > + return i; > +} > + > + > +void dump_ports_report (ibnd_node_t *node, void *user_data) > +{ > + int p = 0; > + ibnd_port_t *port = NULL; > + > + /* for each port */ > + for (p = node->info.numports, port = node->ports[p]; > + p > 0; > + port = node->ports[--p]) { > + if (port == NULL) > + continue; > + > + fprintf(stdout, > + "%2s %5d %2d 0x%016" PRIx64 " %s %s", > + ibnd_node_type_str_short(node), > + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, > + port->portnum, > + port->guid, > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > + if (port->remoteport) > + fprintf(stdout, > + " - %2s %5d %2d 0x%016" PRIx64 > + " ( '%s' - '%s' )\n", > + ibnd_node_type_str_short(port->remoteport->node), > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
> + port->remoteport->node->smalid : port->remoteport->info.lid, > + port->remoteport->portnum, > + port->remoteport->guid, > + port->node->nodedesc, > + port->remoteport->node->nodedesc); > + else > + fprintf(stdout, "%36s'%s'\n", "", > + port->node->nodedesc); > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " > + "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", > + argv0); > + fprintf(stderr, " --node-name-map specify a node name map file\n"); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + int list = 0; > + char *ca = 0; > + int ca_port = 0; > + int group = 0; > + int ports_report = 0; > + ibnd_fabric_t *fabric = NULL; > + > + static char const str_opts[] = "C:P:t:devslgHSRpVhu"; > + static const struct option long_opts[] = { > + { "C", 1, 0, 'C'}, > + { "P", 1, 0, 'P'}, > + { "debug", 0, 0, 'd'}, > + { "verbose", 0, 0, 'v'}, > + { "show", 0, 0, 's'}, > + { "list", 0, 0, 'l'}, > + { "grouping", 0, 0, 'g'}, > + { "Hca_list", 0, 0, 'H'}, > + { "Switch_list", 0, 0, 'S'}, > + { "Router_list", 0, 0, 'R'}, > + { "timeout", 1, 0, 't'}, > + { "node-name-map", 1, 0, 1}, > + { "ports", 0, 0, 'p'}, > + { "Version", 0, 0, 'V'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { } > + }; > + > + f = stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 1: > + node_name_map_file = strdup(optarg); > + break; > + case 'C': > + ca = optarg; > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'd': > + debug = 1; > + ibnd_debug(1); > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'v': > + verbose++; > + break; > + case 's': > + ibnd_show_progress(1); > + break; > + case 'l': > + list = LIST_CA_NODE | 
LIST_SWITCH_NODE | LIST_ROUTER_NODE; > + break; > + case 'g': > + group = 1; > + break; > + case 'S': > + list |= LIST_SWITCH_NODE; > + break; > + case 'H': > + list |= LIST_CA_NODE; > + break; > + case 'R': > + list |= LIST_ROUTER_NODE; > + break; > + case 'p': > + ports_report = 1; > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + if (argc && !(f = fopen(argv[0], "w"))) > + fprintf(stderr, "can't open file %s for writing", argv[0]); > + > + node_name_map = open_node_name_map(node_name_map_file); > + > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + > + if (ports_report) > + ibnd_iter_nodes(fabric, > + dump_ports_report, > + NULL); > + else if (list) > + list_nodes(fabric, list); > + else > + dump_topology(group, fabric); > + > + ibnd_destroy_fabric(fabric); > + close_node_name_map(node_name_map); > + exit(0); > +} > diff --git a/infiniband-diags/libibnetdisc/test/testleaks.c b/infiniband-diags/libibnetdisc/test/testleaks.c > new file mode 100644 > index 0000000..3fbf7af > --- /dev/null > +++ b/infiniband-diags/libibnetdisc/test/testleaks.c > @@ -0,0 +1,268 @@ > +/* > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. 
You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + */ > + > +#if HAVE_CONFIG_H > +# include > +#endif /* HAVE_CONFIG_H */ > + > +#define _GNU_SOURCE > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +char *argv0 = "iblinkinfotest"; > +static FILE *f; > + > +static int timeout_ms = 500; > + > +void > +print_port(ibnd_node_t *node, ibnd_port_t *port) > +{ > + char remote_guid_str[256]; > + char remote_str[256]; > + char link_str[256]; > + char speed_msg[256]; > + char ext_port_str[256]; > + > + if (!port) > + return; > + > + remote_guid_str[0] = '\0'; > + remote_str[0] = '\0'; > + link_str[0] = '\0'; > + speed_msg[0] = '\0'; > + > + if (port->remoteport) { > + char remote_name_buf[256]; > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > + > + if (port->remoteport->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + snprintf(remote_str, 256, > + "%s%6d %4d[%2s] \"%s\" (%s)\n", > + remote_guid_str, > + port->remoteport->info.lid ? 
> + port->remoteport->info.lid : > + port->remoteport->node->smalid, > + port->remoteport->portnum, > + ext_port_str, > + port->remoteport->node->nodedesc, > + speed_msg > + ); > + } else { > + snprintf(remote_str, 256, > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > + } > + > + snprintf(link_str, 256, > + "(%3s %s %6s/%8s)", > + ibnd_linkwidth_str(port->info.link_width_active), > + ibnd_linkspeed_str(port->info.link_speed_active, 0), > + ibnd_linkstate_str(port->info.link_state), > + ibnd_physstate_str(port->info.phys_state) > + ); > + > + if (port->ext_portnum) > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > + else > + ext_port_str[0] = '\0'; > + > + printf(" %6d %4d[%2s] ==%s==> %s", > + node->smalid, port->portnum, > + ext_port_str, > + link_str, > + remote_str > + ); > +} > + > +void > +print_switch(ibnd_node_t *node, void *user_data) > +{ > + int i = 0; > + > + for (i = 1; i <= node->info.numports; i++) { > + ibnd_port_t *port = node->ports[i]; > + if (!port) > + continue; > + if (port->info.link_state == IBND_LINK_DOWN) { > + print_port(node, port); > + } > + } > +} > + > +void > +usage(void) > +{ > + fprintf(stderr, > + "Usage: %s [-hclp -S -D -C -P ]\n" > + " Report link speed and connection for each port of each switch which is active\n" > + " -h This help message\n" > + " -i Number of iterations to run (default -1 == infinate)\n" > + > + " -S output only the node specified by guid\n" > + " -D print only node specified by \n" > + " -f specify node to start \"from\"\n" > + " -n Number of hops to include away from specified node\n" > + > + " -t timeout for any single fabric query\n" > + " -s show errors\n" > + > + " -C use selected Channel Adaptor name for queries\n" > + " -P use selected channel adaptor port for queries\n" > + " --debug print debug messages\n" > + , > + argv0); > + exit(-1); > +} > + > +int > +main(int argc, char **argv) > +{ > + char *ca = 0; > + int ca_port = 0; > + ibnd_fabric_t *fabric = NULL; > + uint64_t guid = 0; > + char 
*dr_path = NULL; > + char *from = NULL; > + int hops = 0; > + ib_portid_t port_id; > + int iters = -1; > + > + static char const str_opts[] = "S:D:n:C:P:t:shuf:i:"; > + static const struct option long_opts[] = { > + { "S", 1, 0, 'S'}, > + { "D", 1, 0, 'D'}, > + { "num-hops", 1, 0, 'n'}, > + { "ca-name", 1, 0, 'C'}, > + { "ca-port", 1, 0, 'P'}, > + { "timeout", 1, 0, 't'}, > + { "show", 0, 0, 's'}, > + { "help", 0, 0, 'h'}, > + { "usage", 0, 0, 'u'}, > + { "debug", 0, 0, 2}, > + { "from", 1, 0, 'f'}, > + { "iters", 1, 0, 'i'}, > + { } > + }; > + > + f = stdout; > + > + argv0 = argv[0]; > + > + while (1) { > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > + if ( ch == -1 ) > + break; > + switch(ch) { > + case 2: > + ibnd_debug(1); > + break; > + case 'f': > + from = strdup(optarg); > + break; > + case 'C': > + ca = strdup(optarg); > + break; > + case 'P': > + ca_port = strtoul(optarg, 0, 0); > + break; > + case 'D': > + dr_path = strdup(optarg); > + break; > + case 'n': > + hops = (int)strtol(optarg, NULL, 0); > + break; > + case 'i': > + iters = (int)strtol(optarg, NULL, 0); > + break; > + case 't': > + timeout_ms = strtoul(optarg, 0, 0); > + break; > + case 'S': > + guid = (uint64_t)strtoull(optarg, 0, 0); > + break; > + default: > + usage(); > + break; > + } > + } > + argc -= optind; > + argv += optind; > + > + while (iters == -1 || iters-- > 0) { > + if (from) { > + /* only scan part of the fabric */ > + str2drpath(&(port_id.drpath), from, 0, 0); > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + guid = 0; > + } else { > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > + fprintf(stderr, "discover failed\n"); > + exit(1); > + } > + } > + > +#if 0 > + if (guid) { > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > + print_switch(sw, NULL); > + } else if (dr_path) { > + ibnd_node_t *sw = 
ibnd_find_node_dr(fabric, dr_path); > + print_switch(sw, NULL); > + } else { > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > + } > +#endif > + > + ibnd_destroy_fabric(fabric); > + } > + > + exit(0); > +}
--
Albert Chu
chu11 at llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

From jgunthorpe at obsidianresearch.com Tue Dec 23 10:43:31 2008
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 23 Dec 2008 11:43:31 -0700
Subject: ***SPAM*** Re: [ofa-general] [PATCH V2 1/3] Create a new library libibnetdisc
In-Reply-To: <1230056943.23747.21.camel@auk31.llnl.gov>
References: <20081211162031.0c591f54.weiny2@llnl.gov> <1230056943.23747.21.camel@auk31.llnl.gov>
Message-ID: <20081223184331.GL31213@obsidianresearch.com>

On Tue, Dec 23, 2008 at 10:29:02AM -0800, Al Chu wrote:

> > +#define IBND_DEBUG(str, args...) \
> > +	if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args)
> > +#define IBND_ERROR(str, args...) \
> > +	fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args)
>
> I believe the "args ..." and "##args" are only for gcc. Not sure how
> much this portability issue matters for OFED. Personally, I always
> do

Right, that format is an obsolete gcc extension. Ira, it should be

#define debug(format, ...) fprintf (stderr, format, __VA_ARGS__)

Which is how C99 standardized variadic macros.
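As a rough sketch of the C99 form discussed above (all names here, demo_format_at and DEMO_FORMAT, are made up for illustration and are not part of the patch): one thing to watch is that a macro written as (format, ...) with "format, __VA_ARGS__" in the body leaves a trailing comma when it is called with no extra arguments, which strict C99 does not accept. Passing the format string as the first element of __VA_ARGS__ sidesteps that. The helper writes into a caller-supplied buffer only so the result is easy to inspect; a real macro would more likely go straight to stderr.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from libibnetdisc): writes "file:line; "
 * followed by the formatted message into buf.  Returns the number of
 * characters the full message needs (as snprintf does), or a negative
 * value on an output error.
 */
static int demo_format_at(char *buf, size_t len, const char *file, int line,
			  const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL && fmt != NULL);

	n = snprintf(buf, len, "%s:%d; ", file, line);
	if (n < 0 || (size_t)n >= len)
		return n;	/* error, or the prefix alone filled the buffer */

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}

/*
 * C99 variadic macro.  The format string is part of __VA_ARGS__, so a
 * call with no extra arguments still expands to valid code.
 */
#define DEMO_FORMAT(buf, len, ...) \
	demo_format_at((buf), (len), __FILE__, __LINE__, __VA_ARGS__)
```

With this shape, DEMO_FORMAT(buf, sizeof(buf), "discover failed") and DEMO_FORMAT(buf, sizeof(buf), "lid %d port %d", lid, port) both compile without relying on the gcc-only "args..." / "##args" syntax.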
Jason

From jgunthorpe at obsidianresearch.com Tue Dec 23 10:52:14 2008
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 23 Dec 2008 11:52:14 -0700
Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h
In-Reply-To: <000101c964c9$ebcaa460$c4e0180a@amr.corp.intel.com>
References: <20081221205124.GE28259@sashak.voltaire.com> <000101c964c9$ebcaa460$c4e0180a@amr.corp.intel.com>
Message-ID: <20081223185214.GM31213@obsidianresearch.com>

On Mon, Dec 22, 2008 at 10:44:35PM -0800, Sean Hefty wrote:

> >If we will add "extern" keyword for exported symbols and somewhere in
> >windows-specific header file it will be redefined as
> >
> >#define extern __declspec(dllexport)
>
> I don't think we want to get into redefining keywords.

Right, that's evil.

Anyhow, on Linux it is considered a best practice for library authors to do the same as what the above option does on Windows: explicitly mark symbols as exported. This prevents symbol table pollution. To do this you use the -fvisibility=hidden flag and mark exported symbols with __attribute__((visibility("default"))). Everything else will not be available for dynamic linking.

> >> +MAD_EXPORT uint32_t mad_get_field(void *buf, int base_offs, int field);
> >> +MAD_EXPORT void mad_set_field(void *buf, int base_offs, int field, uint32_t
> >val);
> >
> >Windows don't like "inline"?
>
> The compiler doesn't allow it in the header file.

Not even static inline? Inline functions in a header should all be static inline or extern inline to avoid comdefs.
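A small sketch of the export-marking approach described above, under stated assumptions: DEMO_EXPORT, demo_mask and demo_get_bits are hypothetical names chosen for illustration, and this is not how libibmad defines MAD_EXPORT. The macro expands to __declspec(dllexport) on Windows and to the gcc visibility attribute elsewhere, while the header-style helper is static inline so it never becomes an exported symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical export macro; the name and conditions are assumptions. */
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define DEMO_EXPORT __attribute__((visibility("default")))
#else
#  define DEMO_EXPORT
#endif

/*
 * Header-style helper: static inline gives every translation unit its
 * own private copy, so nothing is added to the dynamic symbol table.
 * width must be in the range 0..32.
 */
static inline uint32_t demo_mask(int width)
{
	return width >= 32 ? 0xffffffffu : ((1u << width) - 1u);
}

/*
 * Exported entry point: returns `width` bits of `word`, starting at
 * bit `shift` (0 = least significant bit).
 */
DEMO_EXPORT uint32_t demo_get_bits(uint32_t word, int shift, int width)
{
	assert(shift >= 0 && shift < 32);
	return (word >> shift) & demo_mask(width);
}
```

If the library is built with -fvisibility=hidden, only functions tagged with DEMO_EXPORT remain visible for dynamic linking; everything else stays internal.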
Jason From weiny2 at llnl.gov Tue Dec 23 11:34:49 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 23 Dec 2008 11:34:49 -0800 Subject: [ofa-general] ***SPAM*** Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081221152100.GN25208@sashak.voltaire.com> References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152100.GN25208@sashak.voltaire.com> Message-ID: <20081223113449.7d6c629b.weiny2@llnl.gov> On Sun, 21 Dec 2008 17:21:00 +0200 Sasha Khapyorsky wrote: > Hi Ira, > > Some initial comments... > > On 16:20 Thu 11 Dec , Ira Weiny wrote: > > From d615162e547f3a2b2d1acd8c79c24ee691c96c95 Mon Sep 17 00:00:00 2001 > > From: Ira Weiny > > Date: Wed, 26 Nov 2008 12:54:47 -0800 > > Subject: [PATCH] Create a new library libibnetdisc > > > > This encompasses the functionality of ibnetdiscover in a C library. It returns > > a single "ibnd_fabric_t" object which represents the data found during the > > scan. The NodeInfo, PortInfo, and SwitchInfo are preserved from the queries > > made on the fabric to be used by the calling function as they see fit. > > > > This greatly benefits some diags like iblinkinfo.pl. This diag in particular > > was re-written using this library in C and has shown an 85% speed up on a ~1000 > > node cluster. 
> > > > Previous iblinkinfo.pl > > real 3m35.876s > > user 0m13.210s > > sys 1m1.046s > > > > New iblinkinfotest > > real 0m32.869s > > user 0m0.067s > > sys 0m0.140s > > > > Signed-off-by: Ira Weiny > > --- > > infiniband-diags/Makefile.am | 1 + > > infiniband-diags/configure.in | 31 +- > > infiniband-diags/libibnetdisc/Makefile.am | 66 ++ > > .../libibnetdisc/include/infiniband/ibnetdisc.h | 276 ++++++ > > infiniband-diags/libibnetdisc/libibnetdisc.ver | 9 + > > infiniband-diags/libibnetdisc/man/ibnd_debug.3 | 2 + > > .../libibnetdisc/man/ibnd_destroy_fabric.3 | 2 + > > .../libibnetdisc/man/ibnd_discover_fabric.3 | 49 ++ > > .../libibnetdisc/man/ibnd_find_node_dr.3 | 2 + > > .../libibnetdisc/man/ibnd_find_node_guid.3 | 25 + > > .../libibnetdisc/man/ibnd_iter_nodes.3 | 24 + > > .../libibnetdisc/man/ibnd_iter_nodes_type.3 | 2 + > > .../libibnetdisc/man/ibnd_linkspeed_str.3 | 2 + > > .../libibnetdisc/man/ibnd_linkstate_str.3 | 2 + > > .../libibnetdisc/man/ibnd_linkwidth_str.3 | 26 + > > .../libibnetdisc/man/ibnd_node_type_str.3 | 2 + > > .../libibnetdisc/man/ibnd_node_type_str_short.3 | 2 + > > .../libibnetdisc/man/ibnd_physstate_str.3 | 2 + > > .../libibnetdisc/man/ibnd_show_progress.3 | 2 + > > .../libibnetdisc/man/ibnd_update_node.3 | 21 + > > infiniband-diags/libibnetdisc/src/chassis.c | 818 ++++++++++++++++++ > > infiniband-diags/libibnetdisc/src/chassis.h | 85 ++ > > infiniband-diags/libibnetdisc/src/ibnetdisc.c | 872 ++++++++++++++++++++ > > infiniband-diags/libibnetdisc/src/internal.h | 82 ++ > > infiniband-diags/libibnetdisc/src/libibnetdisc.map | 27 + > > .../libibnetdisc/test/iblinkinfotest.c | 395 +++++++++ > > infiniband-diags/libibnetdisc/test/ibnetdisctest.c | 675 +++++++++++++++ > > infiniband-diags/libibnetdisc/test/testleaks.c | 268 ++++++ > > 28 files changed, 3769 insertions(+), 1 deletions(-) > > create mode 100644 infiniband-diags/libibnetdisc/Makefile.am > > create mode 100644 infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > > 
create mode 100644 infiniband-diags/libibnetdisc/libibnetdisc.ver > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_debug.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > > create mode 100644 infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.c > > create mode 100644 infiniband-diags/libibnetdisc/src/chassis.h > > create mode 100644 infiniband-diags/libibnetdisc/src/ibnetdisc.c > > create mode 100644 infiniband-diags/libibnetdisc/src/internal.h > > create mode 100644 infiniband-diags/libibnetdisc/src/libibnetdisc.map > > create mode 100644 infiniband-diags/libibnetdisc/test/iblinkinfotest.c > > create mode 100644 infiniband-diags/libibnetdisc/test/ibnetdisctest.c > > create mode 100644 infiniband-diags/libibnetdisc/test/testleaks.c > > > > diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am > > index c22ba5e..8e8c3c1 100644 > > --- a/infiniband-diags/Makefile.am > > +++ b/infiniband-diags/Makefile.am > > @@ 
-1,3 +1,4 @@ > > +SUBDIRS = libibnetdisc > > > > INCLUDES = -I$(top_builddir)/include/ -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > > > > diff --git a/infiniband-diags/configure.in b/infiniband-diags/configure.in > > index 5509fec..7c346e2 100644 > > --- a/infiniband-diags/configure.in > > +++ b/infiniband-diags/configure.in > > @@ -145,6 +145,34 @@ IBSCRIPTPATH_TMP2="`echo $IBSCRIPTPATH_TMP1 | sed 's/^NONE/$ac_default_prefix/'` > > IBSCRIPTPATH="`eval echo $IBSCRIPTPATH_TMP2`" > > AC_SUBST(IBSCRIPTPATH) > > > > +dnl Begin libibnetdisc stuff > > +AC_CHECK_HEADERS([stdint.h stdlib.h string.h syslog.h unistd.h]) > > I cannot find where syslog.h is actually used in infiniband-diags. Ok, I will remove it both places. Also I don't think it is necessary to check for the other headers again here. It looks like I have some other duplication with the enable-test-utils option. I have fixed this as well. > > > +AC_CHECK_FUNCS([strrchr strtoul strtoull]) > > + > > +ibnetdisc_api_version=`grep LIBVERSION $srcdir/libibnetdisc/libibnetdisc.ver | sed 's/LIBVERSION=//'` > > +if test -z $ibnetdisc_api_version; then > > + echo "FAILED to find $srcdir/libibnetdisc/libibnetdisc.ver" > > + exit 1 > > +fi > > +AC_SUBST(ibnetdisc_api_version) > > +AC_DEFINE_UNQUOTED(API_VERSION, > > + ["$ibnetdisc_api_version"], > > + [The API version of this library]) > > + > > +AC_MSG_CHECKING(for --enable-test-utils) > > +AC_ARG_ENABLE(test-utils, > > +[ --enable-test-utils build additional test utilities (default=no)], > > +[case "${enableval}" in > > + yes) tutils=yes ;; > > + no) tutils=no ;; > > + *) AC_MSG_ERROR(bad value ${enableval} for --enable-test-utils) ;; > > +esac],[tutils=no]) > > +AM_CONDITIONAL(ENABLE_TEST_UTILS, test x$tutils = xyes) > > +AC_MSG_RESULT(${tutils=no}) > > + > > +dnl End libibnetdisc stuff > > + > > + > > AC_CONFIG_FILES([\ > > Makefile \ > > infiniband-diags.spec \ > > @@ -165,6 +193,7 @@ AC_CONFIG_FILES([\ > > scripts/ibhosts \ > > scripts/ibnodes \ 
> > scripts/ibswitches \ > > - scripts/ibrouters > > + scripts/ibrouters \ > > + libibnetdisc/Makefile > > ]) > > AC_OUTPUT > > diff --git a/infiniband-diags/libibnetdisc/Makefile.am b/infiniband-diags/libibnetdisc/Makefile.am > > new file mode 100644 > > index 0000000..7b478b1 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/Makefile.am > > @@ -0,0 +1,66 @@ > > + > > +#SUBDIRS = . > > + > > +INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > > + > > +lib_LTLIBRARIES = libibnetdisc.la > > +sbin_PROGRAMS = > > + > > +if ENABLE_TEST_UTILS > > +sbin_PROGRAMS += test/ibnetdisctest \ > > + test/iblinkinfotest \ > > + test/testleaks > > +endif > > + > > +DBGFLAGS = -g > > + > > +if HAVE_LD_VERSION_SCRIPT > > +libibnetdisc_version_script = -Wl,--version-script=$(srcdir)/src/libibnetdisc.map > > +else > > +libibnetdisc_version_script = > > +endif > > + > > +libibnetdisc_la_SOURCES = src/ibnetdisc.c src/chassis.c src/chassis.h > > +libibnetdisc_la_CFLAGS = -Wall $(DBGFLAGS) > > +libibnetdisc_la_LDFLAGS = -version-info $(ibnetdisc_api_version) \ > > + -export-dynamic $(libibnetdisc_version_script) \ > > + -losmcomp -libmad > > +libibnetdisc_la_DEPENDENCIES = $(srcdir)/src/libibnetdisc.map > > + > > +libibnetdiscincludedir = $(includedir)/infiniband > > + > > +test_ibnetdisctest_SOURCES = test/ibnetdisctest.c > > +test_ibnetdisctest_CFLAGS = -Wall $(DBGFLAGS) > > +test_ibnetdisctest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > > + -libcommon -libnetdisc > > + > > +test_iblinkinfotest_SOURCES = test/iblinkinfotest.c > > +test_iblinkinfotest_CFLAGS = -Wall $(DBGFLAGS) > > +test_iblinkinfotest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > > + -libcommon -libnetdisc > > + > > +test_testleaks_SOURCES = test/testleaks.c > > +test_testleaks_CFLAGS = -Wall $(DBGFLAGS) > > +test_testleaks_LDFLAGS = -Wl,--rpath -Wl,$(libdir) \ > > + -libcommon -libnetdisc > > Having --rpath in Makefiles is not permitted by FC and RH package review > process. 
I know that it is not something introduced by this patch and > infiniband-diags/Makefile.am has it already, but I think to clean this > up some days. Ok, I removed them. Better to start things off right! > > > + > > +libibnetdiscinclude_HEADERS = $(srcdir)/include/infiniband/ibnetdisc.h > > + > > +man_MANS = man/ibnd_debug.3 \ > > + man/ibnd_destroy_fabric.3 \ > > + man/ibnd_discover_fabric.3 \ > > + man/ibnd_find_node_dr.3 \ > > + man/ibnd_find_node_guid.3 \ > > + man/ibnd_iter_nodes.3 \ > > + man/ibnd_iter_nodes_type.3 \ > > + man/ibnd_linkspeed_str.3 \ > > + man/ibnd_linkstate_str.3 \ > > + man/ibnd_linkwidth_str.3 \ > > + man/ibnd_node_type_str.3 \ > > + man/ibnd_physstate_str.3 \ > > + man/ibnd_update_node.3 \ > > + man/ibnd_show_progress.3 > > + > > +EXTRA_DIST = libibnetdisc.spec.in libibnetdisc.spec \ > > Files *.spec.in and *.spec don't exist anymore and 'make dist' fails. My bad. :-( sorry... > > > + $(srcdir)/src/libibnetdisc.map libibnetdisc.ver autogen.sh > > + > > diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > > new file mode 100644 > > index 0000000..cdee2bd > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h > > @@ -0,0 +1,276 @@ > > +/* > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. > > + * > > + */ > > + > > +#ifndef _IBNETDISC_H_ > > +#define _IBNETDISC_H_ > > + > > +#include > > +#include > > + > > +#define MAXHOPS 63 > > + > > +/* HASH table defines */ > > +#define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) > > +#define HTSZ 137 > > + > > +#define IBND_DEBUG(str, args...) \ > > + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) > > +#define IBND_ERROR(str, args...) 
\ > > + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) > > + > > +/** ========================================================================= > > + * ENUM definitions > > + */ > > +typedef enum { > > + IBND_CA_NODE = 1, > > + IBND_SWITCH_NODE = 2, > > + IBND_ROUTER_NODE = 3 > > +} ibnd_node_type_t; > > + > > +typedef enum { > > + IBND_LINK_DOWN = 1, > > + IBND_LINK_INIT = 2, > > + IBND_LINK_ARMED = 3, > > + IBND_LINK_ACTIVE = 4 > > +} ibnd_link_state_t; > > + > > +/** ========================================================================= > > + * Node > > + */ > > +typedef struct switch_info { > > + int smaenhsp0; > > +} ibnd_switch_info_t; > > + > > +typedef struct node_info { > > + int base_ver; > > + int class_ver; > > + int type; > > + int numports; > > + uint64_t sysimgguid; > > + uint64_t nodeguid; > > + uint64_t nodeportguid; > > + uint16_t partition_cap; > > + uint32_t devid; > > + uint32_t revision; > > + int localport; > > + uint32_t vendid; > > +} ibnd_node_info_t; > > + > > +struct ib_fabric; /* forward declare */ > > +struct chassis; /* forward declare */ > > +struct port; /* forward declare */ > > + > > +typedef struct node { > > + struct node *next; /* all node list in fabric */ > > + struct ib_fabric *fabric; /* the fabric node belongs to */ > > + > > + ib_portid_t path_portid; /* path from "from_node" */ > > + int dist; /* num of hops from "from_node" */ > > + int smalid; > > + int smalmc; > > + ibnd_switch_info_t sw_info; > > + ibnd_node_info_t info; > > + char nodedesc[64]; > > + struct port **ports; /* in order array of port pointers */ > > + /* the size of this array is info.numports + 1 */ > > + /* items MAY BE NULL! 
(ie 0 == switches only) */ > > + > > + /* chassis info */ > > + struct node *next_chassis_node; /* next node in ibnd_chassis_t->nodes */ > > + struct chassis *chassis; /* if != NULL the chassis this node belongs to */ > > + unsigned char ch_type; > > + unsigned char ch_anafanum; > > + unsigned char ch_slotnum; > > + unsigned char ch_slot; > > +} ibnd_node_t; > > + > > +/** ========================================================================= > > + * Port > > + */ > > +typedef struct port_info { > > + int lid; > > + int smlid; > > + int link_speed_supported; > > + int link_speed_enabled; > > + int link_speed_active; > > + int link_state; > > + int phys_state; > > + int link_down_def_state; > > + int mkey_prot_bits; > > + int lmc; > > + int neighbor_mtu; > > + int smsl; > > + int init_type; > > + int vl_capability; > > + int vl_high_limit; > > + int vl_arb_high_cap; > > + int vl_arb_low_cap; > > + int init_reply; > > + int mtu_cap; > > + int vl_stall_count; > > + int hoq_lifetime; > > + int oper_vls; > > + int partition_enforce_in; > > + int partition_enforce_out; > > + int filter_raw_in; > > + int filter_raw_out; > > + int mkey_violations; > > + int pkey_violations; > > + int qkey_violations; > > + int guid_capabilities; > > + int client_rereg; > > + int subnet_timeout; > > + int response_time_val; > > + int local_phys_error; > > + int overrun_error; > > + int max_credit_hint; > > + uint32_t link_round_trip; > > + int local_port; > > + int link_width_supported; > > + int link_width_enabled; > > + int link_width_active; > > + int diag_code; > > + int mkey_lease; > > + uint32_t capability_mask; > > + uint64_t mkey; > > + uint64_t gid_prefix; > > +} ibnd_port_info_t; > > What is the reason to redeclear custom NodeInfo and PortInfo structures? > The original are defined by IBA and there are lot of utilities to work > with them. Wouldn't it be better to use it as is? > [DISCLAIMER] First I want to answer your questions directly. 
However, after writing this information, discussing it with Al, and thinking
it over, I think I see where you are coming from and I _may_ agree with you.
So after reading these responses please read my thoughts regarding libibmad
and this new lib.

I have 3 reasons I did it this way:

   1) This is pretty much the way that ibnetdiscover did things (by using
      mad_decode_field into these single fields).
   2) This makes libibnetdisc dependent only on libibmad rather than the
      OpenSM libs. We have had some people complain that the diags require
      opensm-* things to be installed. (This assumes you want to use
      ib_port_info_t from ib_types.h.)
   3) This structure is in host byte order and calls out each field
      independently, so the user does not need intimate knowledge of the
      PortInfo wire packet.

For example, this is the code used in iblinkinfo with the above structure:

   snprintf(link_str, 256, "(%3s %s %6s/%8s)",
      ibnd_linkwidth_str(port->info.link_width_active),
      ibnd_linkspeed_str(port->info.link_speed_active, 1),
      ibnd_linkstate_str(port->info.link_state),
      ibnd_physstate_str(port->info.phys_state)
   );

Here is the code if you use the ib_port_info_t from ib_types.h:

   snprintf(link_str, 256, "(%3s %s %6s/%8s)",
      ibnd_linkwidth_str(port->info.link_width_active),
      ibnd_linkspeed_str(IB_PORT_LINK_SPEED_ACTIVE_MASK(port->info.link_speed), 1),
      ibnd_linkstate_str(IB_PORT_STATE_MASK(port->info.state_info1)),
      ibnd_physstate_str(IB_PORT_PHYS_STATE_MASK(port->info.state_info2) >> IB_PORT_PHYS_STATE_SHIFT)
      // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      // This is particularly nasty compared to the above.
   );

I no longer agree with reasons 1 and 2. However, I believe reason 3 is enough
justification to declare a new type.

[DISCLAIMER] Item 3 might be a moot point as well if you redefine what
libibnetdisc is supposed to be. See below.
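To make reason 3 concrete, here is a minimal, self-contained sketch of the
"decode once into host-order fields" idea. It is only an illustration and not
code from libibnetdisc, libibmad, or ib_types.h: the ex_* names and masks are
hypothetical, and it assumes the layout implied by the snippet above (port
state in the low 4 bits of state_info1, physical state in the high 4 bits of
state_info2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical masks, named after the idea of the ib_types.h macros. */
#define EX_PORT_STATE_MASK       0x0F
#define EX_PORT_PHYS_STATE_MASK  0xF0
#define EX_PORT_PHYS_STATE_SHIFT 4

/* Hypothetical packed form: two bytes as they might appear on the wire. */
typedef struct {
	uint8_t state_info1;	/* assumed: low 4 bits hold the port state */
	uint8_t state_info2;	/* assumed: high 4 bits hold the physical state */
} ex_wire_port_t;

/* Hypothetical decoded form: one plain int per field. */
typedef struct {
	int link_state;
	int phys_state;
} ex_port_info_t;

/* Do the masking and shifting once, so callers never have to. */
static void ex_decode_port(const ex_wire_port_t *wire, ex_port_info_t *out)
{
	assert(wire != NULL && out != NULL);
	out->link_state = wire->state_info1 & EX_PORT_STATE_MASK;
	out->phys_state = (wire->state_info2 & EX_PORT_PHYS_STATE_MASK)
	    >> EX_PORT_PHYS_STATE_SHIFT;
}
```

After a call such as ex_decode_port(&wire, &info), a caller would read
info.link_state directly instead of repeating the mask and shift at every
print site, which is the trade-off being argued for in reason 3.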
> > > + > > +typedef struct port { > > + uint64_t guid; > > + int portnum; > > + int ext_portnum; /* optional if != 0 external port num */ > > + ibnd_node_t *node; /* node this port belongs to */ > > + ibnd_port_info_t info; > > + struct port *remoteport; /* null if SMA, or does not exist */ > > +} ibnd_port_t; > > + > > + > > +/** ========================================================================= > > + * Chassis data > > + */ > > +typedef struct chassis { > > + struct chassis *next; > > + uint64_t chassisguid; > > + int chassisnum; > > + > > + /* generic grouping by SystemImageGUID */ > > + int nodecount; > > + ibnd_node_t *nodes; > > + > > + /* specific to voltaire type nodes */ > > +#define SPINES_MAX_NUM 12 > > +#define LINES_MAX_NUM 36 > > + ibnd_node_t *spinenode[SPINES_MAX_NUM + 1]; > > + ibnd_node_t *linenode[LINES_MAX_NUM + 1]; > > +} ibnd_chassis_t; > > + > > +/** ========================================================================= > > + * Fabric > > + * Main fabric object which is returned and represents the data discovered > > + */ > > +typedef struct ib_fabric { > > + /* the node the discover was initiated from > > + * "from" parameter in ibnd_discover_fabric > > + * or by default the node you ar running on > > + */ > > + ibnd_node_t *from_node; > > + /* NULL term list of all nodes in the fabric */ > > + ibnd_node_t *nodes; > > + /* NULL terminated list of all chassis found in the fabric */ > > + ibnd_chassis_t *chassis; > > + int maxhops_discovered; > > +} ibnd_fabric_t; > > + > > + > > +/** ========================================================================= > > + * Initialization (fabric operations) > > + */ > > +void ibnd_debug(int i); > > +void ibnd_show_progress(int i); > > + > > +ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, > > + int timeout_ms, ib_portid_t *from, int hops); > > + /** > > + * dev_name: (required) local device name to use to access the fabric > > + * dev_port: (required) local device port to 
use to access the fabric > > + * timeout_ms: (required) gives the timeout for a _SINGLE_ query on > > + * the fabric. So if there are mutiple nodes not > > + * responding this may result in a lengthy delay. > > + * from: (optional) specify the node to start scanning from. > > + * If NULL start from the node we are running on. > > + * hops: (optional) Specify how much of the fabric to traverse. > > + * negative value == scan entire fabric > > + */ > > +void ibnd_destroy_fabric(ibnd_fabric_t *fabric); > > + > > +/** ========================================================================= > > + * Node operations > > + */ > > +ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid); > > +ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str); > > +ibnd_node_t *ibnd_update_node(ibnd_node_t *node); > > + > > +typedef void (*ibnd_iter_node_func_t)(ibnd_node_t *node, void *user_data); > > +void ibnd_iter_nodes(ibnd_fabric_t *fabric, > > + ibnd_iter_node_func_t func, > > + void *user_data); > > +void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > > + ibnd_iter_node_func_t func, > > + ibnd_node_type_t node_type, > > + void *user_data); > > + > > +/** ========================================================================= > > + * Str convert functions > > + */ > > +char *ibnd_linkwidth_str(int link_width); > > +char *ibnd_linkstate_str(int link_state); > > +char *ibnd_physstate_str(int phys_state); > > +const char *ibnd_node_type_str(ibnd_node_t *node); > > +const char *ibnd_node_type_str_short(ibnd_node_t *node); > > +char *ibnd_linkspeed_str(int link_speed, int data_rate); > > + /* if data_rate == 0 use "SDR", "DDR", etc. */ > > + /* if data_rate == 1 use "2.5 Gbps", "5.0 Gbps", etc. */ > > Similar functions exist in libibmad. Why do we need another set? 2 reasons. 1) The strings returned are not compatible with the current output of ibnetdiscover and iblinkinfo... 
I was trying to make sure that the library returned strings which were
backwards compatible. That is actually the reason for the extra "data_rate"
parameter of linkspeed. iblinkinfo and ibnetdiscover print this
differently. :-(

   2) But more importantly this is an ease of use issue.

This:

   snprintf(link_str, 256, "(%3s %s %6s/%8s)",
      ibnd_linkwidth_str(port->info.link_width_active),
      ibnd_linkspeed_str(port->info.link_speed_active, 1),
      ibnd_linkstate_str(port->info.link_state),
      ibnd_physstate_str(port->info.phys_state)
   );

Becomes this:

   /* one buffer per field; a shared buffer would be overwritten by each call */
   char width[64], speed[64], state[64], phys[64];
   ...
   snprintf(link_str, 256, "(%3s %s %6s/%8s)",
      mad_dump_val(IB_PORT_LINK_WIDTH_ACTIVE_F, width, sizeof(width),
                   &port->info.link_width_active),
      mad_dump_val(IB_PORT_LINK_SPEED_ACTIVE_F, speed, sizeof(speed),
                   &port->info.link_speed_active),
      // ^^^^^^^^^^^^^^^^^
      // Not backwards compatible with the current ibnetdiscover
      // as it prints the data as "2.5 Gbps" rather than "SDR"
      mad_dump_val(IB_PORT_STATE_F, state, sizeof(state),
                   &port->info.link_state),
      mad_dump_val(IB_PORT_PHYS_STATE_F, phys, sizeof(phys),
                   &port->info.phys_state)
   );

With the ibnd_*_str functions, users don't need to look up the field enum in
mad.h to print something they already have, such as "link_width_active".

Anyway, I think I am starting to see the difference in what we are
thinking... The ibnd_*_str functions and the ibnd_port_info_t were designed
based on libibnetdisc being a "one stop shop" for this data. I envisioned
this library being a wrapper around lower level libraries which would
abstract away some details, something like this:

   +----------+   +----------+
   |  diag1   |   |  diag2   |
   +----------+   +----------+
         |             |
      +-----------------+
      |  libibnetdisc   |
      +-----------------+
               |
      +-----------------+
      |    libibmad     |
      +-----------------+

I think what you had in mind was something like:

                         +--------+
                        -| diag 1 |-
                       / +--------+ \
   +-----------------+   +--------+  +-----------------+
   |  libibnetdisc   |  -| diag 2 |--|    libibmad     |
   +-----------------+   +--------+  +-----------------+
                \                          /
                 --------------------------

In this case users of libibnetdisc might get back something like:

   typedef struct port {
      uint64_t guid;
      int portnum;
      int ext_portnum;         /* optional if != 0 external port num */
      ibnd_node_t *node;       /* node this port belongs to */
      struct port *remoteport; /* null if SMA, or does not exist */
      void *port_info;         /* or uint8_t port_info[port_info_size] */
   } ibnd_port_t;

and decode port_info like this:

   char buf[256];
   uint32_t lid = mad_get_field(port->port_info, 0, IB_PORT_LID_F);
   mad_dump_val(IB_PORT_LID_F, buf, sizeof(buf), &lid);

Is that what you are thinking?

If this is the case I don't think I object. I think it makes the end user of
libibnetdisc work harder but it does offer some advantages, namely less
redefinition and a bit more flexibility.

That said, I would like to clean up the mad interface at the same time. Just
figuring out the examples to write in this email has taken a lot of time. I
don't think this is a good thing. Here are some examples.

Add something like:

   static inline char *
   mad_snprint_field(uint8_t *buf, int base_offs, int field,
                     char *print_buf, int print_buf_size)

Then the above could be used in a print statement like:

   char tmp[256];
   printf("lid %s\n", mad_snprint_field(port->port_info, 0, IB_PORT_LID_F, tmp, 256));

[Although lid is a bad example since it could be done with "%d"... But you
get what I mean.]

And along those lines the difference between mad_dump_field and mad_dump_val
needs to be made more clear.
They have the same signature but one has a lot of formatting added to it
which I don't think is appropriate at this level.

   "LinkState:.......................Active"
vs.
   "Active"

Also, I don't think that the following declarations need to be public.

   /* dump.c */
   ib_mad_dump_fn
      mad_dump_int, mad_dump_uint, mad_dump_hex, mad_dump_rhex,
      mad_dump_bitfield, mad_dump_array, mad_dump_string,
      mad_dump_linkwidth, mad_dump_linkwidthsup, mad_dump_linkwidthen,
      mad_dump_linkdowndefstate,
      mad_dump_linkspeed, mad_dump_linkspeedsup, mad_dump_linkspeeden,
      mad_dump_portstate, mad_dump_portstates,
      mad_dump_physportstate, mad_dump_portcapmask,
      mad_dump_mtu, mad_dump_vlcap, mad_dump_opervls,
      mad_dump_node_type,
      mad_dump_sltovl, mad_dump_vlarbitration,
      mad_dump_nodedesc, mad_dump_nodeinfo, mad_dump_portinfo,
      mad_dump_switchinfo,
      mad_dump_perfcounters, mad_dump_perfcounters_ext;

   int _mad_dump(ib_mad_dump_fn *fn, char *name, void *val, int valsz);
   char * _mad_dump_field(ib_field_t *f, char *name, char *buf, int bufsz,
                          void *val);
   int _mad_print_field(ib_field_t *f, char *name, void *val, int valsz);
   char * _mad_dump_val(ib_field_t *f, char *buf, int bufsz, void *val);

They confuse the ibmad layer.

If this is what you would like I will rework the library. Perhaps starting
to clean up libibmad along the way?

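For what it is worth, the mad_snprint_field() idea above could be sketched
roughly as follows. This is a self-contained illustration, not libibmad code:
ex_field_t, ex_get_field(), ex_snprint_field() and the two-entry field table
are hypothetical stand-ins for the real field table, and the byte-aligned,
big-endian layout is a simplifying assumption. The point is only that the
helper returns the caller's buffer holding the bare value, so it can be used
directly as a printf argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical field ids; a real table would have one entry per MAD field. */
enum { EX_LID_F, EX_LMC_F, EX_LAST_F };

typedef struct {
	int byte_offs;		/* offset of the field within the attribute */
	int byte_len;		/* 1, 2 or 4; fields assumed byte aligned */
} ex_field_t;

static const ex_field_t ex_fields[EX_LAST_F] = {
	[EX_LID_F] = { 0, 2 },
	[EX_LMC_F] = { 2, 1 },
};

/* Read a big-endian (network order) field and return it in host order. */
static uint32_t ex_get_field(const uint8_t *buf, int base_offs, int field)
{
	const ex_field_t *f;
	uint32_t val = 0;
	int i;

	assert(field >= 0 && field < EX_LAST_F);
	f = &ex_fields[field];
	for (i = 0; i < f->byte_len; i++)
		val = (val << 8) | buf[base_offs + f->byte_offs + i];
	return val;
}

/* Print the value only (no "Name:....." decoration) and return print_buf. */
static char *ex_snprint_field(const uint8_t *buf, int base_offs, int field,
			      char *print_buf, int print_buf_size)
{
	snprintf(print_buf, (size_t)print_buf_size, "%u",
		 (unsigned)ex_get_field(buf, base_offs, field));
	return print_buf;
}
```

A caller would then write something like
printf("lid %s\n", ex_snprint_field(pkt, 0, EX_LID_F, tmp, (int)sizeof(tmp)));
without needing a separate decode step or a temporary variable for the value.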
Ira > > Sasha > > > + > > +/** ========================================================================= > > + * Chassis queries > > + */ > > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum); > > +char *ibnd_get_chassis_type(ibnd_node_t *node); > > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size); > > + > > +int ibnd_is_xsigo_guid(uint64_t guid); > > +int ibnd_is_xsigo_tca(uint64_t guid); > > +int ibnd_is_xsigo_hca(uint64_t guid); > > + > > +#endif /* _IBNETDISC_H_ */ > > diff --git a/infiniband-diags/libibnetdisc/libibnetdisc.ver b/infiniband-diags/libibnetdisc/libibnetdisc.ver > > new file mode 100644 > > index 0000000..a0a5f3c > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/libibnetdisc.ver > > @@ -0,0 +1,9 @@ > > +# In this file we track the current API version > > +# of the IB net discover interface (and libraries) > > +# The version is built of the following > > +# tree numbers: > > +# API_REV:RUNNING_REV:AGE > > +# API_REV - advance on any added API > > +# RUNNING_REV - advance any change to the vendor files > > +# AGE - number of backward versions the API still supports > > +LIBVERSION=1:0:0 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_debug.3 b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > > new file mode 100644 > > index 0000000..a4076fc > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_debug.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_DEBUG 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_discover_fabric.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > > new file mode 100644 > > index 0000000..8fe20ae > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_destroy_fabric.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_DESTROY_FABRIC 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_discover_fabric.3 > > diff --git 
a/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > > new file mode 100644 > > index 0000000..44d8c65 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_discover_fabric.3 > > @@ -0,0 +1,49 @@ > > +.TH IBND_DISCOVER_FABRIC 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.SH "NAME" > > +ibnd_discover_fabric, ibnd_destroy_fabric, ibnd_debug ibnd_show_progress \- initialize ibnetdiscover library. > > +.SH "SYNOPSIS" > > +.nf > > +.B #include > > +.sp > > +.BI "ibnd_fabric_t *ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, ib_portid_t *from, int hops)" > > +.BI "void ibnd_destroy_fabric(ibnd_fabric_t *fabric)" > > +.BI "void ibnd_debug(int i)" > > +.BI "void ibnd_show_progress(int i)" > > + > > + > > +.SH "DESCRIPTION" > > +.B ibnd_discover_fabric() > > +Discover the fabric connected to the port specified by dev_name and dev_port, using a timeout specified. The "from" and "hops" parameters are optional and allow one to scan part of a fabric by specifying a node "from" and a number of hops away from that node to scan, "hops". This gives the user a "sub-fabric" which is "centered" anywhere they chose. > > + > > +.B ibnd_destroy_fabric() > > +free all memory and resources associated with the fabric. > > + > > +.B ibnd_debug() > > +Set the debug level to be printed as library operations take place. > > + > > +.B ibnd_debug() > > +Indicate that the library should print debug output which shows it's progress > > +through the fabric. > > + > > +.SH "RETURN VALUE" > > +.B ibnd_discover_fabric() > > +return NULL on failure, otherwise a valid ibnd_fabric_t object. > > + > > +.B ibnd_destory_fabric(), ibnd_debug() > > +NONE > > + > > +.SH "EXAMPLES" > > + > > +.B Discover the entire fabric connected to device "mthca0", port 1. > > + > > + ibnd_discover_fabric("mthca0", 1, 100, NULL, 0); > > + > > +.B Discover only a single node and those nodes connected to it. 
> > + > > + str2drpath(&(port_id.drpath), from, 0, 0); > > + > > + ibnd_discover_fabric("mthca0", 1, 100, &port_id, 1); > > + > > +.SH "AUTHORS" > > +.TP > > +Ira Weiny > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > > new file mode 100644 > > index 0000000..612e501 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_dr.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_FIND_NODE_DR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_find_node_guid.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > > new file mode 100644 > > index 0000000..676b528 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_find_node_guid.3 > > @@ -0,0 +1,25 @@ > > +.TH IBND_FIND_NODE_GUID 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.SH "NAME" > > +ibnd_find_node_guid, ibnd_find_node_dr \- given a fabric object find the node object within it which matches the guid or directed route specified. > > + > > +.SH "SYNOPSIS" > > +.nf > > +.B #include > > +.sp > > +.BI "ibnd_node_t *ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid)" > > +.BI "ibnd_node_t *ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str)" > > + > > +.SH "DESCRIPTION" > > +.B ibnd_find_node_guid() > > +Given a fabric object and a guid, return the ibnd_node_t object with that node guid. > > +.B ibnd_find_node_dr() > > +Given a fabric object and a directed route, return the ibnd_node_t object with > > +that directed route. > > + > > +.SH "RETURN VALUE" > > +.B ibnd_find_node_guid(), ibnd_find_node_dr() > > +return NULL on failure, otherwise a valid ibnd_node_t object. 
> > + > > +.SH "AUTHORS" > > +.TP > > +Ira Weiny > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > > new file mode 100644 > > index 0000000..7199dfb > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes.3 > > @@ -0,0 +1,24 @@ > > +.TH IBND_ITER_NODES 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.SH "NAME" > > +ibnd_iter_nodes, ibnd_iter_nodes_type \- given a fabric object and a function itterate over the nodes in the fabric. > > + > > +.SH "SYNOPSIS" > > +.nf > > +.B #include > > +.sp > > +.BI "void ibnd_iter_nodes(ibnd_fabric_t *fabric, ibnd_iter_func_t func, void *user_data)" > > +.BI "void ibnd_iter_nodes_type(ibnd_fabric_t *fabric, ibnd_iter_func_t func, ibnd_node_type_t type, void *user_data)" > > + > > +.SH "DESCRIPTION" > > +.B ibnd_iter_nodes() > > +Itterate through all the nodes in the fabric and call "func" on them. > > +.B ibnd_iter_nodes_type() > > +The same as ibnd_iter_nodes except to limit the iteration to the nodes with the specified type. 
> > + > > +.SH "RETURN VALUE" > > +.B ibnd_iter_nodes(), ibnd_iter_nodes_type() > > +NONE > > + > > +.SH "AUTHORS" > > +.TP > > +Ira Weiny > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > > new file mode 100644 > > index 0000000..878547c > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_iter_nodes_type.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_FIND_NODES_TYPE 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_find_nodes.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > > new file mode 100644 > > index 0000000..128cd3e > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkspeed_str.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_LINKSPEED_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_linkwidth_str.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > > new file mode 100644 > > index 0000000..2fa9189 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkstate_str.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_LINKSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_linkwidth_str.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > > new file mode 100644 > > index 0000000..2cd4f0a > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_linkwidth_str.3 > > @@ -0,0 +1,26 @@ > > +.TH IBND_LINKWIDTH_STR 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.SH "NAME" > > +ibnd_linkwidth_str, ibnd_linkspeed_str, ibnd_linkstate_str, ibnd_physstate_str, ibnd_node_type_str \- prety string functions. 
> > + > > +.SH "SYNOPSIS" > > +.nf > > +.B #include > > +.sp > > +.BI > > +.BI "char *ibnd_linkwidth_str(int link_width)" > > +.BI "char *ibnd_linkspeed_str(int link_speed)" > > +.BI "char *ibnd_linkstate_str(int link_state)" > > +.BI "char *ibnd_physstate_str(int phys_state)" > > +.BI "const char *ibnd_node_type_str(ibnd_node_t *node)" > > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > > + > > +.SH "DESCRIPTION" > > +Return user readable strings for the values given. > > + > > +.BI "const char *ibnd_node_type_str_short(ibnd_node_t *node)" > > +Returns a shorter abbreviated version of the string. > > + > > + > > +.SH "AUTHORS" > > +.TP > > +Ira Weiny > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > > new file mode 100644 > > index 0000000..77dbf07 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_NODE_TYPE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_linkwidth_str.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > > new file mode 100644 > > index 0000000..62feb6e > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_node_type_str_short.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_NODE_TYPE_STR_SHORT 3 "Aug 05, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_linkwidth_str.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > > new file mode 100644 > > index 0000000..aeeaeb7 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_physstate_str.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_PHYSSTATE_STR 3 "Aug 04, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_physstate_str.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 
b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > > new file mode 100644 > > index 0000000..280af31 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_show_progress.3 > > @@ -0,0 +1,2 @@ > > +.\".TH IBND_SHOW_PROGRESS 3 "Nov 26, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.so man3/ibnd_discover_fabric.3 > > diff --git a/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > > new file mode 100644 > > index 0000000..d3aa206 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/man/ibnd_update_node.3 > > @@ -0,0 +1,21 @@ > > +.TH IBND_UPDATE_NODE 3 "July 25, 2008" "OpenIB" "OpenIB Programmer's Manual" > > +.SH "NAME" > > +ibnd_update_node \- Update the node specified with new data from the fabric. > > + > > +.SH "SYNOPSIS" > > +.nf > > +.B #include > > +.sp > > +.BI "ibnd_node_t *ibnd_update_node(ibnd_node_t *node)" > > + > > +.SH "DESCRIPTION" > > +.B ibnd_update_node() > > +Update the node info, port info, and node description of the node specified. > > + > > +.SH "RETURN VALUE" > > +.B ibnd_update_node() > > +Return NULL on failure, otherwise a valid ibnd_node_t object which is part of the fabric object. > > + > > +.SH "AUTHORS" > > +.TP > > +Ira Weiny > > diff --git a/infiniband-diags/libibnetdisc/src/chassis.c b/infiniband-diags/libibnetdisc/src/chassis.c > > new file mode 100644 > > index 0000000..41f325e > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/src/chassis.c > > @@ -0,0 +1,818 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. 
> > + * > > + */ > > + > > +/*========================================================*/ > > +/* FABRIC SCANNER SPECIFIC DATA */ > > +/*========================================================*/ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > + > > +#include "internal.h" > > +#include "chassis.h" > > + > > +static char *ChassisTypeStr[5] = { "", "ISR9288", "ISR9096", "ISR2012", "ISR2004" }; > > +static char *ChassisSlotTypeStr[4] = { "", "Line", "Spine", "SRBD" }; > > + > > +char *ibnd_get_chassis_type(ibnd_node_t *node) > > +{ > > + /* Currently, only if Voltaire chassis */ > > + if (node->info.vendid != VTR_VENDOR_ID) > > + return (NULL); > > + if (!node->chassis) > > + return (NULL); > > + if (node->ch_type == UNRESOLVED_CT > > + || node->ch_type > ISR2004_CT) > > + return (NULL); > > + return ChassisTypeStr[node->ch_type]; > > +} > > + > > +char *ibnd_get_chassis_slot_str(ibnd_node_t *node, char *str, size_t size) > > +{ > > + /* Currently, only if Voltaire chassis */ > > + if (node->info.vendid != VTR_VENDOR_ID) > > + return (NULL); > > + if (!node->chassis) > > + return (NULL); > > + if (node->ch_slot == UNRESOLVED_CS > > + || node->ch_slot > SRBD_CS) > > + return (NULL); > > + if (!str) > > + return (NULL); > > + snprintf(str, size, "%s %d Chip %d", > > + ChassisSlotTypeStr[node->ch_slot], > > + node->ch_slotnum, > > + node->ch_anafanum); > > + return (str); > > +} > > + > > +static ibnd_chassis_t *find_chassisnum(struct ibnd_fabric *fabric, unsigned char chassisnum) > > +{ > > + ibnd_chassis_t *current; > > + > > + for (current = fabric->first_chassis; current; current = current->next) { > > + if (current->chassisnum == chassisnum) > > + return current; > > + } > > + > > + return NULL; > > +} > > + > > +static uint64_t topspin_chassisguid(uint64_t guid) > > +{ > > + /* Byte 3 in system image GUID is chassis type, and */ > > + /* Byte 4 is 
location ID (slot) so just mask off byte 4 */ > > + return guid & 0xffffffff00ffffffULL; > > +} > > + > > +int ibnd_is_xsigo_guid(uint64_t guid) > > +{ > > + if ((guid & 0xffffff0000000000ULL) == 0x0013970000000000ULL) > > + return 1; > > + else > > + return 0; > > +} > > + > > +static int is_xsigo_leafone(uint64_t guid) > > +{ > > + if ((guid & 0xffffffffff000000ULL) == 0x0013970102000000ULL) > > + return 1; > > + else > > + return 0; > > +} > > + > > +int ibnd_is_xsigo_hca(uint64_t guid) > > +{ > > + /* NodeType 2 is HCA */ > > + if ((guid & 0xffffffff00000000ULL) == 0x0013970200000000ULL) > > + return 1; > > + else > > + return 0; > > +} > > + > > +int ibnd_is_xsigo_tca(uint64_t guid) > > +{ > > + /* NodeType 3 is TCA */ > > + if ((guid & 0xffffffff00000000ULL) == 0x0013970300000000ULL) > > + return 1; > > + else > > + return 0; > > +} > > + > > +static int is_xsigo_ca(uint64_t guid) > > +{ > > + if (ibnd_is_xsigo_hca(guid) || ibnd_is_xsigo_tca(guid)) > > + return 1; > > + else > > + return 0; > > +} > > + > > +static int is_xsigo_switch(uint64_t guid) > > +{ > > + if ((guid & 0xffffffff00000000ULL) == 0x0013970100000000ULL) > > + return 1; > > + else > > + return 0; > > +} > > + > > +static uint64_t xsigo_chassisguid(ibnd_node_t *node) > > +{ > > + if (!is_xsigo_ca(node->info.sysimgguid)) { > > + /* Byte 3 is NodeType and byte 4 is PortType */ > > + /* If NodeType is 1 (switch), PortType is masked */ > > + if (is_xsigo_switch(node->info.sysimgguid)) > > + return node->info.sysimgguid & 0xffffffff00ffffffULL; > > + else > > + return node->info.sysimgguid; > > + } else { > > + if (!node->ports || !node->ports[1]) > > + return (0); > > + > > + /* Is there a peer port ? 
*/ > > + if (!node->ports[1]->remoteport) > > + return node->info.sysimgguid; > > + > > + /* If peer port is Leaf 1, use its chassis GUID */ > > + if (is_xsigo_leafone(node->ports[1]->remoteport->node->info.sysimgguid)) > > + return node->ports[1]->remoteport->node->info.sysimgguid & > > + 0xffffffff00ffffffULL; > > + else > > + return node->info.sysimgguid; > > + } > > +} > > + > > +static uint64_t get_chassisguid(ibnd_node_t *node) > > +{ > > + if (node->info.vendid == TS_VENDOR_ID || node->info.vendid == SS_VENDOR_ID) > > + return topspin_chassisguid(node->info.sysimgguid); > > + else if (node->info.vendid == XS_VENDOR_ID || ibnd_is_xsigo_guid(node->info.sysimgguid)) > > + return xsigo_chassisguid(node); > > + else > > + return node->info.sysimgguid; > > +} > > + > > +static ibnd_chassis_t *find_chassisguid(ibnd_node_t *node) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > > + ibnd_chassis_t *current; > > + uint64_t chguid; > > + > > + chguid = get_chassisguid(node); > > + for (current = f->first_chassis; current; current = current->next) { > > + if (current->chassisguid == chguid) > > + return current; > > + } > > + > > + return NULL; > > +} > > + > > +uint64_t ibnd_get_chassis_guid(ibnd_fabric_t *fabric, unsigned char chassisnum) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > > + ibnd_chassis_t *chassis; > > + > > + chassis = find_chassisnum(f, chassisnum); > > + if (chassis) > > + return chassis->chassisguid; > > + else > > + return 0; > > +} > > + > > +static int is_router(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_IB_FC_ROUTER || > > + n->node.info.devid == VTR_DEVID_IB_IP_ROUTER); > > +} > > + > > +static int is_spine_9096(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SFB4 || > > + n->node.info.devid == VTR_DEVID_SFB4_DDR); > > +} > > + > > +static int is_spine_9288(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SFB12 || > 
> + n->node.info.devid == VTR_DEVID_SFB12_DDR); > > +} > > + > > +static int is_spine_2004(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SFB2004); > > +} > > + > > +static int is_spine_2012(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SFB2012); > > +} > > + > > +static int is_spine(struct ibnd_node *n) > > +{ > > + return (is_spine_9096(n) || is_spine_9288(n) || > > + is_spine_2004(n) || is_spine_2012(n)); > > +} > > + > > +static int is_line_24(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SLB24 || > > + n->node.info.devid == VTR_DEVID_SLB24_DDR || > > + n->node.info.devid == VTR_DEVID_SRB2004); > > +} > > + > > +static int is_line_8(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SLB8); > > +} > > + > > +static int is_line_2024(struct ibnd_node *n) > > +{ > > + return (n->node.info.devid == VTR_DEVID_SLB2024); > > +} > > + > > +static int is_line(struct ibnd_node *n) > > +{ > > + return (is_line_24(n) || is_line_8(n) || is_line_2024(n)); > > +} > > + > > +int is_chassis_switch(struct ibnd_node *n) > > +{ > > + return (is_spine(n) || is_line(n)); > > +} > > + > > +/* these structs help find Line (Anafa) slot number while using spine portnum */ > > +int line_slot_2_sfb4[25] = { 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4 }; > > +int anafa_line_slot_2_sfb4[25] = { 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2 }; > > +int line_slot_2_sfb12[25] = { 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9,10, 10, 11, 11, 12, 12 }; > > +int anafa_line_slot_2_sfb12[25] = { 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2 }; > > + > > +/* IPR FCR modules connectivity while using sFB4 port as reference */ > > +int ipr_slot_2_sfb4_port[25] = { 0, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2, 1 }; > > + > > +/* these structs help find Spine (Anafa) slot number 
while using spine portnum */ > > +int spine12_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > +int anafa_spine12_slot_2_slb[25]= { 0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > +int spine4_slot_2_slb[25] = { 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > +int anafa_spine4_slot_2_slb[25] = { 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; > > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > > + > > +static void get_sfb_slot(struct ibnd_node *node, ibnd_port_t *lineport) > > +{ > > + ibnd_node_t *n = (ibnd_node_t *)node; > > + > > + n->ch_slot = SPINE_CS; > > + if (is_spine_9096(node)) { > > + n->ch_type = ISR9096_CT; > > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > > + } else if (is_spine_9288(node)) { > > + n->ch_type = ISR9288_CT; > > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > > + } else if (is_spine_2012(node)) { > > + n->ch_type = ISR2012_CT; > > + n->ch_slotnum = spine12_slot_2_slb[lineport->portnum]; > > + n->ch_anafanum = anafa_spine12_slot_2_slb[lineport->portnum]; > > + } else if (is_spine_2004(node)) { > > + n->ch_type = ISR2004_CT; > > + n->ch_slotnum = spine4_slot_2_slb[lineport->portnum]; > > + n->ch_anafanum = anafa_spine4_slot_2_slb[lineport->portnum]; > > + } else { > > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > > + node->node.info.nodeguid); > > + } > > +} > > + > > +static void get_router_slot(struct ibnd_node *node, ibnd_port_t *spineport) > > +{ > > + ibnd_node_t *n = (ibnd_node_t *)node; > > + int guessnum = 0; > > + > > + node->ch_found = 1; > > + > > + n->ch_slot = SRBD_CS; > > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = 
ISR9096_CT; > > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR9288_CT; > > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > > + /* this is a smart guess based on nodeguids order on sFB-12 module */ > > + guessnum = spineport->node->info.nodeguid % 4; > > + /* module 1 <--> remote anafa 3 */ > > + /* module 2 <--> remote anafa 2 */ > > + /* module 3 <--> remote anafa 1 */ > > + n->ch_anafanum = (guessnum == 3 ? 1 : (guessnum == 1 ? 3 : 2)); > > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR2012_CT; > > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > > + /* this is a smart guess based on nodeguids order on sFB-12 module */ > > + guessnum = spineport->node->info.nodeguid % 4; > > + // module 1 <--> remote anafa 3 > > + // module 2 <--> remote anafa 2 > > + // module 3 <--> remote anafa 1 > > + n->ch_anafanum = (guessnum == 3? 1 : (guessnum == 1 ? 
3 : 2)); > > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR2004_CT; > > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > > + n->ch_anafanum = ipr_slot_2_sfb4_port[spineport->portnum]; > > + } else { > > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > > + spineport->node->info.nodeguid); > > + } > > +} > > + > > +static void get_slb_slot(ibnd_node_t *n, ibnd_port_t *spineport) > > +{ > > + n->ch_slot = LINE_CS; > > + if (is_spine_9096(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR9096_CT; > > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > > + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > > + } else if (is_spine_9288(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR9288_CT; > > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > > + } else if (is_spine_2012(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR2012_CT; > > + n->ch_slotnum = line_slot_2_sfb12[spineport->portnum]; > > + n->ch_anafanum = anafa_line_slot_2_sfb12[spineport->portnum]; > > + } else if (is_spine_2004(CONV_NODE_INTERNAL(spineport->node))) { > > + n->ch_type = ISR2004_CT; > > + n->ch_slotnum = line_slot_2_sfb4[spineport->portnum]; > > + n->ch_anafanum = anafa_line_slot_2_sfb4[spineport->portnum]; > > + } else { > > + IBPANIC("Unexpected node found: guid 0x%016" PRIx64, > > + spineport->node->info.nodeguid); > > + } > > +} > > + > > +/* forward declare this */ > > +static void voltaire_portmap(ibnd_port_t *port); > > +/* > > + This function called for every Voltaire node in fabric > > + It could be optimized so, but time overhead is very small > > + and its only diag.util > > +*/ > > +static void fill_voltaire_chassis_record(struct ibnd_node *node) > > +{ > > + ibnd_node_t *n = (ibnd_node_t *)node; > > + int p = 0; > > + ibnd_port_t *port; > > + struct ibnd_node *remnode = 0; > > + > > + if 
(node->ch_found) /* somehow this node has already been passed */ > > + return; > > + node->ch_found = 1; > > + > > + /* node is router only in case of using unique lid */ > > + /* (which is lid of chassis router port) */ > > + /* in such case node->ports is actually a requested port... */ > > + if (is_router(node)) { > > + /* find the remote node */ > > + for (p = 1; p <= node->node.info.numports; p++) { > > + port = node->node.ports[p]; > > + /* skip unconnected ports: remoteport is NULL when nothing is attached */ > > + if (port && port->remoteport && is_spine(CONV_NODE_INTERNAL(port->remoteport->node))) > > + get_router_slot(node, port->remoteport); > > + } > > + } else if (is_spine(node)) { > > + for (p = 1; p <= node->node.info.numports; p++) { > > + port = node->node.ports[p]; > > + if (!port || !port->remoteport) > > + continue; > > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > > + if (remnode->node.info.type != IBND_SWITCH_NODE) { > > + if (!remnode->ch_found) > > + get_router_slot(remnode, port); > > + continue; > > + } > > + if (!n->ch_type) > > + /* we assume here that remoteport belongs to line */ > > + get_sfb_slot(node, port->remoteport); > > + > > + /* we could break here, but need to find if more routers connected */ > > + } > > + > > + } else if (is_line(node)) { > > + for (p = 1; p <= node->node.info.numports; p++) { > > + port = node->node.ports[p]; > > + if (!port || port->portnum > 12 || !port->remoteport) > > + continue; > > + /* we assume here that remoteport belongs to spine */ > > + get_slb_slot(n, port->remoteport); > > + break; > > + } > > + } > > + > > + /* for each port of this node, map external ports */ > > + for (p = 1; p <= node->node.info.numports; p++) { > > + port = node->node.ports[p]; > > + if (!port) > > + continue; > > + voltaire_portmap(port); > > + } > > + > > + return; > > +} > > + > > +static int get_line_index(ibnd_node_t *node) > > +{ > > + int retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > > + > > + if (retval > LINES_MAX_NUM || retval < 1) > > + IBPANIC("Internal error"); > > + return retval; 
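[Inline note on the index arithmetic in get_line_index(): the quoted code packs a (slot, anafa) pair into a single 1-based array index by reserving three entries per slot. The sketch below reproduces only that arithmetic so the mapping can be checked by hand. It is an illustration, not part of the patch: example_line_index and EXAMPLE_LINES_MAX are hypothetical names, the limit of 36 is an assumption (the definition of LINES_MAX_NUM is not shown in this part of the patch), and the sketch returns -1 where the quoted code calls IBPANIC.]

```c
/* Hypothetical stand-in for LINES_MAX_NUM; the real value is defined
 * elsewhere in the patch and 36 is only assumed here. */
#define EXAMPLE_LINES_MAX 36

/*
 * Map a 1-based slot number and a 1-based anafa (chip) number to a
 * 1-based index, reserving three consecutive entries per slot.
 * Returns -1 for an out-of-range result instead of aborting.
 */
static int example_line_index(int slotnum, int anafanum)
{
    int idx = 3 * (slotnum - 1) + anafanum;

    if (idx < 1 || idx > EXAMPLE_LINES_MAX)
        return -1;
    return idx;
}
```

[With these assumptions, slot 1 / anafa 1 maps to index 1, slot 2 / anafa 2 to index 5, and slot 12 / anafa 2 to index 35. A slot number of 0 gives a non-positive index and is rejected. Since line modules carry at most two anafas in the tables above, every third index would simply stay unused.]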
> > +} > > + > > +static int get_spine_index(ibnd_node_t *node) > > +{ > > + int retval; > > + > > + if (is_spine_9288(CONV_NODE_INTERNAL(node)) || is_spine_2012(CONV_NODE_INTERNAL(node))) > > + retval = 3 * (node->ch_slotnum - 1) + node->ch_anafanum; > > + else > > + retval = node->ch_slotnum; > > + > > + if (retval > SPINES_MAX_NUM || retval < 1) > > + IBPANIC("Internal error"); > > + return retval; > > +} > > + > > +static void insert_line_router(ibnd_node_t *node, ibnd_chassis_t *chassis) > > +{ > > + int i = get_line_index(node); > > + > > + if (chassis->linenode[i]) > > + return; /* already filled slot */ > > + > > + chassis->linenode[i] = node; > > + node->chassis = chassis; > > +} > > + > > +static void insert_spine(ibnd_node_t *node, ibnd_chassis_t *chassis) > > +{ > > + int i = get_spine_index(node); > > + > > + if (chassis->spinenode[i]) > > + return; /* already filled slot */ > > + > > + chassis->spinenode[i] = node; > > + node->chassis = chassis; > > +} > > + > > +static void pass_on_lines_catch_spines(ibnd_chassis_t *chassis) > > +{ > > + ibnd_node_t *node, *remnode; > > + ibnd_port_t *port; > > + int i, p; > > + > > + for (i = 1; i <= LINES_MAX_NUM; i++) { > > + node = chassis->linenode[i]; > > + > > + if (!(node && is_line(CONV_NODE_INTERNAL(node)))) > > + continue; /* empty slot or router */ > > + > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (!port || port->portnum > 12 || !port->remoteport) > > + continue; > > + > > + remnode = port->remoteport->node; > > + > > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > > + continue; /* some error - spine not initialized ? 
FIXME */ > > + insert_spine(remnode, chassis); > > + } > > + } > > +} > > + > > +static void pass_on_spines_catch_lines(ibnd_chassis_t *chassis) > > +{ > > + ibnd_node_t *node, *remnode; > > + ibnd_port_t *port; > > + int i, p; > > + > > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > > + node = chassis->spinenode[i]; > > + if (!node) > > + continue; /* empty slot */ > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (!port || !port->remoteport) > > + continue; > > + remnode = port->remoteport->node; > > + > > + if (!CONV_NODE_INTERNAL(remnode)->ch_found) > > + continue; /* some error - line/router not initialized ? FIXME */ > > + insert_line_router(remnode, chassis); > > + } > > + } > > +} > > + > > +/* > > + Stupid interpolation algorithm... > > + But nothing to do - have to be compliant with VoltaireSM/NMS > > +*/ > > +static void pass_on_spines_interpolate_chguid(ibnd_chassis_t *chassis) > > +{ > > + ibnd_node_t *node; > > + int i; > > + > > + for (i = 1; i <= SPINES_MAX_NUM; i++) { > > + node = chassis->spinenode[i]; > > + if (!node) > > + continue; /* skip the empty slots */ > > + > > + /* take first guid minus one to be consistent with SM */ > > + chassis->chassisguid = node->info.nodeguid - 1; > > + break; > > + } > > +} > > + > > +/* > > + This function fills chassis structure with all nodes > > + in that chassis > > + chassis structure = structure of one standalone chassis > > +*/ > > +static void build_chassis(struct ibnd_node *node, ibnd_chassis_t *chassis) > > +{ > > + int p = 0; > > + struct ibnd_node *remnode = 0; > > + ibnd_port_t *port = 0; > > + > > + /* we get here with node = chassis_spine */ > > + insert_spine((ibnd_node_t *)node, chassis); > > + > > + /* loop: pass on all ports of node */ > > + for (p = 1; p <= node->node.info.numports; p++ ) { > > + port = node->node.ports[p]; > > + if (!port || !port->remoteport) > > + continue; > > + remnode = CONV_NODE_INTERNAL(port->remoteport->node); > > + > > + if 
(!remnode->ch_found) > > + continue; /* some error - line or router not initialized ? FIXME */ > > + > > + insert_line_router(&(remnode->node), chassis); > > + } > > + > > + pass_on_lines_catch_spines(chassis); > > + /* this pass needed for to catch routers, since routers connected only */ > > + /* to spines in slot 1 or 4 and we could miss them first time */ > > + pass_on_spines_catch_lines(chassis); > > + > > + /* additional 2 passes needed for to overcome a problem of pure "in-chassis" */ > > + /* connectivity - extra pass to ensure that all related chips/modules */ > > + /* inserted into the chassis */ > > + pass_on_lines_catch_spines(chassis); > > + pass_on_spines_catch_lines(chassis); > > + pass_on_spines_interpolate_chguid(chassis); > > +} > > + > > +/*========================================================*/ > > +/* INTERNAL TO EXTERNAL PORT MAPPING */ > > +/*========================================================*/ > > + > > +/* > > +Description : On ISR9288/9096 external ports indexing > > + is not matching the internal ( anafa ) port > > + indexes. Use this MAP to translate the data you get from > > + the OpenIB diagnostics (smpquery, ibroute, ibtracert, etc.) 
> > + > > + > > +Module : sLB-24 > > + anafa 1 anafa 2 > > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > > +int port | 22 23 24 18 17 16 | 22 23 24 18 17 16 > > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > > +int port | 19 20 21 15 14 13 | 19 20 21 15 14 13 > > +------------------------------------------------ > > + > > +Module : sLB-8 > > + anafa 1 anafa 2 > > +ext port | 13 14 15 16 17 18 | 19 20 21 22 23 24 > > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > > +ext port | 1 2 3 4 5 6 | 7 8 9 10 11 12 > > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > > + > > +-----------> > > + anafa 1 anafa 2 > > +ext port | - - 5 - - 6 | - - 7 - - 8 > > +int port | 24 23 22 18 17 16 | 24 23 22 18 17 16 > > +ext port | - - 1 - - 2 | - - 3 - - 4 > > +int port | 21 20 19 15 14 13 | 21 20 19 15 14 13 > > +------------------------------------------------ > > + > > +Module : sLB-2024 > > + > > +ext port | 13 14 15 16 17 18 19 20 21 22 23 24 > > +A1 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > > +ext port | 1 2 3 4 5 6 7 8 9 10 11 12 > > +A2 int port| 13 14 15 16 17 18 19 20 21 22 23 24 > > +--------------------------------------------------- > > + > > +*/ > > + > > +int int2ext_map_slb24[2][25] = { > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 18, 17, 16, 1, 2, 3, 13, 14, 15 }, > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 11, 10, 24, 23, 22, 7, 8, 9, 19, 20, 21 } > > + }; > > +int int2ext_map_slb8[2][25] = { > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 6, 6, 6, 1, 1, 1, 5, 5, 5 }, > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8, 3, 3, 3, 7, 7, 7 } > > + }; > > +int int2ext_map_slb2024[2][25] = { > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }, > > + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 } > > + }; > > +/* reference { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 }; */ > > + > > +/* map internal 
ports to external ports if appropriate */ > > +static void > > +voltaire_portmap(ibnd_port_t *port) > > +{ > > + struct ibnd_node *n = CONV_NODE_INTERNAL(port->node); > > + int portnum = port->portnum; > > + int chipnum = 0; > > + ibnd_node_t *node = port->node; > > + > > + if (!n->ch_found || !is_line(CONV_NODE_INTERNAL(node)) || (portnum < 13 || portnum > 24)) { > > + port->ext_portnum = 0; > > + return; > > + } > > + > > + if (port->node->ch_anafanum < 1 || port->node->ch_anafanum > 2) { > > + port->ext_portnum = 0; > > + return; > > + } > > + > > + chipnum = port->node->ch_anafanum - 1; > > + > > + if (is_line_24(CONV_NODE_INTERNAL(node))) > > + port->ext_portnum = int2ext_map_slb24[chipnum][portnum]; > > + else if (is_line_2024(CONV_NODE_INTERNAL(node))) > > + port->ext_portnum = int2ext_map_slb2024[chipnum][portnum]; > > + else > > + port->ext_portnum = int2ext_map_slb8[chipnum][portnum]; > > +} > > + > > +static void add_chassis(struct ibnd_fabric *fabric) > > +{ > > + if (!(fabric->current_chassis = calloc(1, sizeof(ibnd_chassis_t)))) > > + IBPANIC("out of mem"); > > + > > + if (fabric->first_chassis == NULL) { > > + fabric->first_chassis = fabric->current_chassis; > > + fabric->last_chassis = fabric->current_chassis; > > + } else { > > + fabric->last_chassis->next = fabric->current_chassis; > > + fabric->last_chassis = fabric->current_chassis; > > + } > > +} > > + > > +static void > > +add_node_to_chassis(ibnd_chassis_t *chassis, ibnd_node_t *node) > > +{ > > + node->chassis = chassis; > > + node->next_chassis_node = chassis->nodes; > > + chassis->nodes = node; > > +} > > + > > +/* > > + Main grouping function > > + Algorithm: > > + 1. pass on every Voltaire node > > + 2. catch spine chip for every Voltaire node > > + 2.1 build/interpolate chassis around this chip > > + 2.2 go to 1. > > + 3. pass on non Voltaire nodes (SystemImageGUID based grouping) > > + 4. 
now group non Voltaire nodes by SystemImageGUID > > + Returns: > > + Pointer to the first chassis in a NULL terminated list of chassis in > > + the fabric specified. > > +*/ > > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric) > > +{ > > + struct ibnd_node *node; > > + int dist; > > + int chassisnum = 0; > > + ibnd_chassis_t *chassis; > > + > > + fabric->first_chassis = NULL; > > + fabric->current_chassis = NULL; > > + > > + /* first pass on switches and build for every Voltaire node */ > > + /* an appropriate chassis record (slotnum and position) */ > > + /* according to internal connectivity */ > > + /* not very efficient but clear code so... */ > > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > > + if (node->node.info.vendid == VTR_VENDOR_ID) > > + fill_voltaire_chassis_record(node); > > + } > > + } > > + > > + /* separate every Voltaire chassis from each other and build linked list of them */ > > + /* algorithm: catch spine and find all surrounding nodes */ > > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > > + if (node->node.info.vendid != VTR_VENDOR_ID) > > + continue; > > + //if (!node->node.chrecord || node->node.chrecord->chassisnum || !is_spine(node)) > > + if (!node->ch_found > > + || (node->node.chassis && node->node.chassis->chassisnum) > > + || !is_spine(node)) > > + continue; > > + add_chassis(fabric); > > + fabric->current_chassis->chassisnum = ++chassisnum; > > + build_chassis(node, fabric->current_chassis); > > + } > > + } > > + > > + /* now make pass on nodes for chassis which are not Voltaire */ > > + /* grouped by common SystemImageGUID */ > > + for (dist = 0; dist <= fabric->fabric.maxhops_discovered; dist++) { > > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > > + if (node->node.info.vendid == VTR_VENDOR_ID) > > + continue; 
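[Inline note on the SystemImageGUID grouping: the two non-Voltaire passes in group_nodes() amount to "count nodes per GUID, then number only the groups that have more than one node". The sketch below shows that idea on a plain array of GUIDs. It is a simplified illustration under stated assumptions, not the patch's code: struct ex_chassis, ex_find, ex_count and ex_number are hypothetical names, new records are prepended rather than appended as add_chassis() does, and allocation failure just stops counting instead of calling IBPANIC.]

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for ibnd_chassis_t. */
struct ex_chassis {
    struct ex_chassis *next;
    uint64_t guid;      /* grouping key (SystemImageGUID in the patch) */
    int nodecount;      /* nodes seen with this GUID */
    int chassisnum;     /* 0 until the group is numbered */
};

/* Return the record for guid, or NULL if there is none yet. */
static struct ex_chassis *ex_find(struct ex_chassis *head, uint64_t guid)
{
    for (; head; head = head->next)
        if (head->guid == guid)
            return head;
    return NULL;
}

/* Pass 1: count nodes per non-zero GUID, creating a record on first sight.
 * Returns the (possibly new) list head. */
static struct ex_chassis *ex_count(struct ex_chassis *head,
                                   const uint64_t *guids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct ex_chassis *c;

        if (!guids[i])
            continue;               /* no GUID, cannot be grouped */
        c = ex_find(head, guids[i]);
        if (c) {
            c->nodecount++;
            continue;
        }
        c = calloc(1, sizeof(*c));
        if (!c)
            return head;            /* sketch only: stop on allocation failure */
        c->guid = guids[i];
        c->nodecount = 1;
        c->next = head;             /* prepend for brevity */
        head = c;
    }
    return head;
}

/* Pass 2: give a number to each group with more than one node, in the
 * order the nodes are visited. Returns the last number handed out. */
static int ex_number(struct ex_chassis *head, const uint64_t *guids,
                     size_t n, int last)
{
    for (size_t i = 0; i < n; i++) {
        struct ex_chassis *c = guids[i] ? ex_find(head, guids[i]) : NULL;

        if (c && c->nodecount > 1 && !c->chassisnum)
            c->chassisnum = ++last;
    }
    return last;
}
```

[For the GUID sequence A, B, A, 0, C, B, A this would count three nodes for A, two for B and one for C. The second pass then numbers A as 1 and B as 2 and leaves C unnumbered, which matches the "nodecount > 1" definition of a chassis used in the quoted comments.]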
> > + if (node->node.info.sysimgguid) { > > + chassis = find_chassisguid((ibnd_node_t *)node); > > + if (chassis) > > + chassis->nodecount++; > > + else { > > + /* Possible new chassis */ > > + add_chassis(fabric); > > + fabric->current_chassis->chassisguid = > > + get_chassisguid((ibnd_node_t *)node); > > + fabric->current_chassis->nodecount = 1; > > + } > > + } > > + } > > + } > > + > > + /* now, make another pass to see which nodes are part of chassis */ > > + /* (defined as chassis->nodecount > 1) */ > > + for (dist = 0; dist <= MAXHOPS; ) { > > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > > + if (node->node.info.vendid == VTR_VENDOR_ID) > > + continue; > > + if (node->node.info.sysimgguid) { > > + chassis = find_chassisguid((ibnd_node_t *)node); > > + if (chassis && chassis->nodecount > 1) { > > + if (!chassis->chassisnum) > > + chassis->chassisnum = ++chassisnum; > > + if (!node->ch_found) { > > + node->ch_found = 1; > > + add_node_to_chassis(chassis, (ibnd_node_t *)node); > > + } > > + } > > + } > > + } > > + if (dist == fabric->fabric.maxhops_discovered) > > + dist = MAXHOPS; /* skip to CAs */ > > + else > > + dist++; > > + } > > + > > + return (fabric->first_chassis); > > +} > > diff --git a/infiniband-diags/libibnetdisc/src/chassis.h b/infiniband-diags/libibnetdisc/src/chassis.h > > new file mode 100644 > > index 0000000..16dad49 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/src/chassis.h > > @@ -0,0 +1,85 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. 
> > + * > > + */ > > + > > +#ifndef _CHASSIS_H_ > > +#define _CHASSIS_H_ > > + > > +#include > > + > > +#include "internal.h" > > + > > +/*========================================================*/ > > +/* CHASSIS RECOGNITION SPECIFIC DATA */ > > +/*========================================================*/ > > + > > +/* Device IDs */ > > +#define VTR_DEVID_IB_FC_ROUTER 0x5a00 > > +#define VTR_DEVID_IB_IP_ROUTER 0x5a01 > > +#define VTR_DEVID_ISR9600_SPINE 0x5a02 > > +#define VTR_DEVID_ISR9600_LEAF 0x5a03 > > +#define VTR_DEVID_HCA1 0x5a04 > > +#define VTR_DEVID_HCA2 0x5a44 > > +#define VTR_DEVID_HCA3 0x6278 > > +#define VTR_DEVID_SW_6IB4 0x5a05 > > +#define VTR_DEVID_ISR9024 0x5a06 > > +#define VTR_DEVID_ISR9288 0x5a07 > > +#define VTR_DEVID_SLB24 0x5a09 > > +#define VTR_DEVID_SFB12 0x5a08 > > +#define VTR_DEVID_SFB4 0x5a0b > > +#define VTR_DEVID_ISR9024_12 0x5a0c > > +#define VTR_DEVID_SLB8 0x5a0d > > +#define VTR_DEVID_RLX_SWITCH_BLADE 0x5a20 > > +#define VTR_DEVID_ISR9024_DDR 0x5a31 > > +#define VTR_DEVID_SFB12_DDR 0x5a32 > > +#define VTR_DEVID_SFB4_DDR 0x5a33 > > +#define VTR_DEVID_SLB24_DDR 0x5a34 > > +#define VTR_DEVID_SFB2012 0x5a37 > > +#define VTR_DEVID_SLB2024 0x5a38 > > +#define VTR_DEVID_ISR2012 0x5a39 > > +#define VTR_DEVID_SFB2004 0x5a40 > > +#define VTR_DEVID_ISR2004 0x5a41 > > +#define VTR_DEVID_SRB2004 0x5a42 > > + > > +/* Vendor IDs (for chassis based systems) */ > > +#define VTR_VENDOR_ID 0x8f1 /* Voltaire */ > > +#define TS_VENDOR_ID 0x5ad /* Cisco */ > > +#define SS_VENDOR_ID 0x66a /* InfiniCon */ > > +#define XS_VENDOR_ID 0x1397 /* Xsigo */ > > + > > +enum ibnd_chassis_type { UNRESOLVED_CT, ISR9288_CT, ISR9096_CT, ISR2012_CT, ISR2004_CT }; > > +enum ibnd_chassis_slot_type { UNRESOLVED_CS, LINE_CS, SPINE_CS, SRBD_CS }; > > + > > +ibnd_chassis_t *group_nodes(struct ibnd_fabric *fabric); > > + > > +#endif /* _CHASSIS_H_ */ > > diff --git a/infiniband-diags/libibnetdisc/src/ibnetdisc.c b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > > new file 
mode 100644 > > index 0000000..64e4ece > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/src/ibnetdisc.c > > @@ -0,0 +1,872 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. 
> > + * > > + */ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#define _GNU_SOURCE > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > + > > +#include "internal.h" > > +#include "chassis.h" > > + > > +static int timeout_ms = 2000; > > +static int show_progress = 0; > > + > > +static char *linkwidth_str[] = { > > + "??", > > + "1x", > > + "4x", > > + "??", > > + "8x", > > + "??", > > + "??", > > + "??", > > + "12x" > > +}; > > + > > +static char *linkspeed_str[] = { > > + "???", > > + "SDR", > > + "DDR", > > + "???", > > + "QDR" > > +}; > > + > > +static char *linkspeed_datarate_str[] = { > > + "???", > > + "2.5 Gbps", > > + "5.0 Gbps", > > + "???", > > + "10.0 Gbps" > > +}; > > + > > +static char *linkstate_str[] = { > > + "No State", > > + "Down", > > + "Init", > > + "Armed", > > + "Active" > > +}; > > + > > +static char *physstate_str[] = { > > + "No State", > > + "Sleep", > > + "Polling", > > + "Disabled", > > + "PortConfigTraining", > > + "LinkUp", > > + "LinkErrorRecovery", > > + "Phy Test" > > +}; > > + > > +char * > > +ibnd_linkwidth_str(int link_width) > > +{ > > + if (link_width > 8) > > + return linkwidth_str[0]; > > + else > > + return linkwidth_str[link_width]; > > +} > > + > > +char * > > +ibnd_linkspeed_str(int link_speed, int data_rate) > > +{ > > + if (link_speed > 4) > > + return linkspeed_str[0]; > > + else if (data_rate) > > + return linkspeed_datarate_str[link_speed]; > > + else > > + return linkspeed_str[link_speed]; > > +} > > +char * > > +ibnd_linkstate_str(int link_state) > > +{ > > + if (link_state > 4) > > + return linkstate_str[0]; > > + else > > + return linkstate_str[link_state]; > > +} > > + > > +char * > > +ibnd_physstate_str(int phys_state) > > +{ > > + if (phys_state > 7) > > + return physstate_str[0]; > > + else > > + return 
physstate_str[phys_state]; > > +} > > + > > +void > > +decode_port_info(void * rcv_buf, ibnd_port_info_t *pi) > > +{ > > + mad_decode_field(rcv_buf, IB_PORT_LID_F, &pi->lid); > > + mad_decode_field(rcv_buf, IB_PORT_SMLID_F, &pi->smlid); > > + > > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_SUPPORTED_F, &pi->link_speed_supported); > > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ENABLED_F, &pi->link_speed_enabled); > > + mad_decode_field(rcv_buf, IB_PORT_LINK_SPEED_ACTIVE_F, &pi->link_speed_active); > > + > > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PORT_F, &pi->local_port); > > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_SUPPORTED_F, &pi->link_width_supported); > > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ENABLED_F, &pi->link_width_enabled); > > + > > + mad_decode_field(rcv_buf, IB_PORT_LINK_WIDTH_ACTIVE_F, &pi->link_width_active); > > + > > + mad_decode_field(rcv_buf, IB_PORT_DIAG_F, &pi->diag_code); > > + mad_decode_field(rcv_buf, IB_PORT_MKEY_LEASE_F, &pi->mkey_lease); > > + mad_decode_field(rcv_buf, IB_PORT_CAPMASK_F, &pi->capability_mask); > > + mad_decode_field(rcv_buf, IB_PORT_MKEY_F, &pi->mkey); > > + mad_decode_field(rcv_buf, IB_PORT_GID_PREFIX_F, &pi->gid_prefix); > > + > > + mad_decode_field(rcv_buf, IB_PORT_STATE_F, &pi->link_state); > > + mad_decode_field(rcv_buf, IB_PORT_PHYS_STATE_F, &pi->phys_state); > > + > > + mad_decode_field(rcv_buf, IB_PORT_LINK_DOWN_DEF_F, &pi->link_down_def_state); > > + mad_decode_field(rcv_buf, IB_PORT_MKEY_PROT_BITS_F, &pi->mkey_prot_bits); > > + > > + mad_decode_field(rcv_buf, IB_PORT_LMC_F, &pi->lmc); > > + mad_decode_field(rcv_buf, IB_PORT_NEIGHBOR_MTU_F, &pi->neighbor_mtu); > > + mad_decode_field(rcv_buf, IB_PORT_SMSL_F, &pi->smsl); > > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_F, &pi->init_type); > > + > > + mad_decode_field(rcv_buf, IB_PORT_VL_CAP_F, &pi->vl_capability); > > + mad_decode_field(rcv_buf, IB_PORT_VL_HIGH_LIMIT_F, &pi->vl_high_limit); > > + mad_decode_field(rcv_buf, 
IB_PORT_VL_ARBITRATION_HIGH_CAP_F, &pi->vl_arb_high_cap); > > + mad_decode_field(rcv_buf, IB_PORT_VL_ARBITRATION_LOW_CAP_F, &pi->vl_arb_low_cap); > > + > > + mad_decode_field(rcv_buf, IB_PORT_INIT_TYPE_REPLY_F, &pi->init_reply); > > + mad_decode_field(rcv_buf, IB_PORT_MTU_CAP_F, &pi->mtu_cap); > > + mad_decode_field(rcv_buf, IB_PORT_VL_STALL_COUNT_F, &pi->vl_stall_count); > > + mad_decode_field(rcv_buf, IB_PORT_HOQ_LIFE_F, &pi->hoq_lifetime); > > + mad_decode_field(rcv_buf, IB_PORT_OPER_VLS_F, &pi->oper_vls); > > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_INB_F, &pi->partition_enforce_in); > > + mad_decode_field(rcv_buf, IB_PORT_PART_EN_OUTB_F, &pi->partition_enforce_out); > > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_INB_F, &pi->filter_raw_in); > > + mad_decode_field(rcv_buf, IB_PORT_FILTER_RAW_OUTB_F, &pi->filter_raw_out); > > + mad_decode_field(rcv_buf, IB_PORT_MKEY_VIOL_F, &pi->mkey_violations); > > + mad_decode_field(rcv_buf, IB_PORT_PKEY_VIOL_F, &pi->pkey_violations); > > + mad_decode_field(rcv_buf, IB_PORT_QKEY_VIOL_F, &pi->qkey_violations); > > + > > + mad_decode_field(rcv_buf, IB_PORT_GUID_CAP_F, &pi->guid_capabilities); > > + > > + mad_decode_field(rcv_buf, IB_PORT_CLIENT_REREG_F, &pi->client_rereg); > > + mad_decode_field(rcv_buf, IB_PORT_SUBN_TIMEOUT_F, &pi->subnet_timeout); > > + mad_decode_field(rcv_buf, IB_PORT_RESP_TIME_VAL_F, &pi->response_time_val); > > + mad_decode_field(rcv_buf, IB_PORT_LOCAL_PHYS_ERR_F, &pi->local_phys_error); > > + mad_decode_field(rcv_buf, IB_PORT_OVERRUN_ERR_F, &pi->overrun_error); > > + mad_decode_field(rcv_buf, IB_PORT_MAX_CREDIT_HINT_F, &pi->max_credit_hint); > > + mad_decode_field(rcv_buf, IB_PORT_LINK_ROUND_TRIP_F, &pi->link_round_trip); > > +} > > + > > +static int > > +get_port_info(struct ibnd_fabric *fabric, struct ibnd_port *port, > > + int portnum, ib_portid_t *portid) > > +{ > > + char portinfo[64]; > > + void *pi = portinfo; > > + > > + port->port.portnum = portnum; > > + > > + if (!smp_query_via(pi, 
portid, IB_ATTR_PORT_INFO, portnum, timeout_ms, > > + fabric->ibmad_port)) > > + return -1; > > + > > + decode_port_info(pi, &port->port.info); > > + > > + IBND_DEBUG("portid %s portnum %d: lid %d state %d physstate %d %s %s\n", > > + portid2str(portid), portnum, port->port.info.lid, port->port.info.link_state, > > + port->port.info.phys_state, ibnd_linkwidth_str(port->port.info.link_width_active), > > + ibnd_linkspeed_str(port->port.info.link_speed_active, 0)); > > + return 1; > > +} > > + > > +static void > > +decode_node_info(void * rcv_buf, ibnd_node_info_t *ni) > > +{ > > + mad_decode_field(rcv_buf, IB_NODE_BASE_VERS_F, &ni->base_ver); > > + mad_decode_field(rcv_buf, IB_NODE_CLASS_VERS_F, &ni->class_ver); > > + mad_decode_field(rcv_buf, IB_NODE_TYPE_F, &ni->type); > > + mad_decode_field(rcv_buf, IB_NODE_NPORTS_F, &ni->numports); > > + mad_decode_field(rcv_buf, IB_NODE_SYSTEM_GUID_F, &ni->sysimgguid); > > + mad_decode_field(rcv_buf, IB_NODE_GUID_F, &ni->nodeguid); > > + mad_decode_field(rcv_buf, IB_NODE_PORT_GUID_F, &ni->nodeportguid); > > + mad_decode_field(rcv_buf, IB_NODE_PARTITION_CAP_F, &ni->partition_cap); > > + mad_decode_field(rcv_buf, IB_NODE_DEVID_F, &ni->devid); > > + mad_decode_field(rcv_buf, IB_NODE_REVISION_F, &ni->revision); > > + mad_decode_field(rcv_buf, IB_NODE_LOCAL_PORT_F, &ni->localport); > > + mad_decode_field(rcv_buf, IB_NODE_VENDORID_F, &ni->vendid); > > +} > > + > > +/* > > + * Returns -1 if error. > > + */ > > +static int > > +query_node_info(struct ibnd_fabric *fabric, struct ibnd_node *node, ib_portid_t *portid) > > +{ > > + char nodeinfo[64]; > > + void *ni = nodeinfo; > > + if (!smp_query_via(ni, portid, IB_ATTR_NODE_INFO, 0, timeout_ms, > > + fabric->ibmad_port)) > > + return -1; > > + decode_node_info(ni, &(node->node.info)); > > + return (0); > > +} > > + > > +/* > > + * Returns 0 if non switch node is found, 1 if switch is found, -1 if error. 
> > + */ > > +static int > > +query_node(struct ibnd_fabric *fabric, struct ibnd_node *inode, > > + struct ibnd_port *iport, ib_portid_t *portid) > > +{ > > + char portinfo[64]; > > + void *pi = portinfo; > > + char switchinfo[64]; > > + void *si = switchinfo; > > + ibnd_node_t *node = &(inode->node); > > + ibnd_port_t *port = &(iport->port); > > + void *nd = inode->node.nodedesc; > > + > > + if (query_node_info(fabric, inode, portid)) > > + return -1; > > + > > + port->portnum = node->info.localport; > > + port->guid = node->info.nodeportguid; > > + > > + if (!smp_query_via(nd, portid, IB_ATTR_NODE_DESC, 0, timeout_ms, > > + fabric->ibmad_port)) > > + return -1; > > + > > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, 0, timeout_ms, > > + fabric->ibmad_port)) > > + return -1; > > + decode_port_info(pi, &port->info); > > + > > + if (node->info.type != IBND_SWITCH_NODE) > > + return 0; > > + > > + node->smalid = port->info.lid; > > + node->smalmc = port->info.lmc; > > + > > + /* after we have the sma information find out the real PortInfo for this port */ > > + if (!smp_query_via(pi, portid, IB_ATTR_PORT_INFO, node->info.localport, timeout_ms, > > + fabric->ibmad_port)) > > + return -1; > > + decode_port_info(pi, &port->info); > > + > > + if (!smp_query_via(si, portid, IB_ATTR_SWITCH_INFO, 0, timeout_ms, > > + fabric->ibmad_port)) > > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > > + else > > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &node->sw_info.smaenhsp0); > > + > > + IBND_DEBUG("portid %s: got switch node %" PRIx64 " '%s'\n", > > + portid2str(portid), node->info.nodeguid, node->nodedesc); > > + return 1; > > +} > > + > > +static int > > +add_port_to_dpath(ib_dr_path_t *path, int nextport) > > +{ > > + if (path->cnt+2 >= sizeof(path->p)) > > + return -1; > > + ++path->cnt; > > + path->p[path->cnt] = nextport; > > + return path->cnt; > > +} > > + > > +static int > > +extend_dpath(struct ibnd_fabric *f, ib_dr_path_t *path, int nextport) > > +{ > 
> + int rc = add_port_to_dpath(path, nextport); > > + if ((rc != -1) && (path->cnt > f->fabric.maxhops_discovered)) > > + f->fabric.maxhops_discovered = path->cnt; > > + return (rc); > > +} > > + > > +static void > > +dump_endnode(ib_portid_t *path, char *prompt, > > + struct ibnd_node *node, struct ibnd_port *port) > > +{ > > + if (!show_progress) > > + return; > > + > > + printf("%s -> %s %s {%016" PRIx64 "} portnum %d lid %d-%d\"%s\"\n", > > + portid2str(path), prompt, > > + ibnd_node_type_str((ibnd_node_t *)node), > > + node->node.info.nodeguid, > > + node->node.info.type == IBND_SWITCH_NODE ? 0 : port->port.portnum, > > + port->port.info.lid, port->port.info.lid + (1 << port->port.info.lmc) - 1, > > + node->node.nodedesc); > > +} > > + > > +static struct ibnd_node * > > +find_existing_node(struct ibnd_fabric *fabric, struct ibnd_node *new) > > +{ > > + int hash = HASHGUID(new->node.info.nodeguid) % HTSZ; > > + struct ibnd_node *node; > > + > > + for (node = fabric->nodestbl[hash]; node; node = node->htnext) > > + if (node->node.info.nodeguid == new->node.info.nodeguid) > > + return node; > > + > > + return NULL; > > +} > > + > > +ibnd_node_t * > > +ibnd_find_node_guid(ibnd_fabric_t *fabric, uint64_t guid) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > > + int hash = HASHGUID(guid) % HTSZ; > > + struct ibnd_node *node; > > + > > + for (node = f->nodestbl[hash]; node; node = node->htnext) > > + if (node->node.info.nodeguid == guid) > > + return (ibnd_node_t *)node; > > + > > + return NULL; > > +} > > + > > +ibnd_node_t * > > +ibnd_update_node(ibnd_node_t *node) > > +{ > > + char portinfo[64]; > > + void *pi = portinfo; > > + ibnd_port_info_t port0_info; > > + char switchinfo[64]; > > + void *si = switchinfo; > > + void *nd = node->nodedesc; > > + int p = 0; > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(node->fabric); > > + struct ibnd_node *n = CONV_NODE_INTERNAL(node); > > + > > + if (query_node_info(f, n, &(n->node.path_portid))) > > 
+ return (NULL); > > + > > + if (!smp_query_via(nd, &(n->node.path_portid), IB_ATTR_NODE_DESC, 0, timeout_ms, > > + f->ibmad_port)) > > + return (NULL); > > + > > + /* update all the port info's */ > > + for (p = 1; p <= n->node.info.numports; p++) { > > + get_port_info(f, CONV_PORT_INTERNAL(n->node.ports[p]), p, &(n->node.path_portid)); > > + } > > + > > + if (n->node.info.type != IBND_SWITCH_NODE) > > + goto done; > > + > > + if (!smp_query_via(pi, &(n->node.path_portid), IB_ATTR_PORT_INFO, 0, timeout_ms, > > + f->ibmad_port)) > > + return (NULL); > > + decode_port_info(pi, &port0_info); > > + > > + n->node.smalid = port0_info.lid; > > + n->node.smalmc = port0_info.lmc; > > + > > + if (!smp_query_via(si, &(n->node.path_portid), IB_ATTR_SWITCH_INFO, 0, timeout_ms, > > + f->ibmad_port)) > > + node->sw_info.smaenhsp0 = 0; /* assume base SP0 */ > > + else > > + mad_decode_field(si, IB_SW_ENHANCED_PORT0_F, &n->node.sw_info.smaenhsp0); > > + > > +done: > > + return (node); > > +} > > + > > +ibnd_node_t * > > +ibnd_find_node_dr(ibnd_fabric_t *fabric, char *dr_str) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > > + int i = 0; > > + ibnd_node_t *rc = f->fabric.from_node; > > + ib_dr_path_t path; > > + > > + if (str2drpath(&path, dr_str, 0, 0) == -1) { > > + return (NULL); > > + } > > + > > + for (i = 0; i <= path.cnt; i++) { > > + ibnd_port_t *remote_port = NULL; > > + if (path.p[i] == 0) > > + continue; > > + if (!rc->ports) > > + return (NULL); > > + > > + remote_port = rc->ports[path.p[i]]->remoteport; > > + if (!remote_port) > > + return (NULL); > > + > > + rc = remote_port->node; > > + } > > + > > + return (rc); > > +} > > + > > +static void > > +add_to_nodeguid_hash(struct ibnd_node *node, struct ibnd_node *hash[]) > > +{ > > + int hash_idx = HASHGUID(node->node.info.nodeguid) % HTSZ; > > + > > + node->htnext = hash[hash_idx]; > > + hash[hash_idx] = node; > > +} > > + > > +static void > > +add_to_portguid_hash(struct ibnd_port *port, struct 
ibnd_port *hash[]) > > +{ > > + int hash_idx = HASHGUID(port->port.guid) % HTSZ; > > + > > + port->htnext = hash[hash_idx]; > > + hash[hash_idx] = port; > > +} > > + > > +static void > > +add_to_type_list(struct ibnd_node*node, struct ibnd_fabric *fabric) > > +{ > > + switch (node->node.info.type) { > > + case IBND_CA_NODE: > > + node->type_next = fabric->ch_adapters; > > + fabric->ch_adapters = node; > > + break; > > + case IBND_SWITCH_NODE: > > + node->type_next = fabric->switches; > > + fabric->switches = node; > > + break; > > + case IBND_ROUTER_NODE: > > + node->type_next = fabric->routers; > > + fabric->routers = node; > > + break; > > + } > > +} > > + > > +static void > > +add_to_nodedist(struct ibnd_node *node, struct ibnd_fabric *fabric) > > +{ > > + int dist = node->node.dist; > > + if (node->node.info.type != IBND_SWITCH_NODE) > > + dist = MAXHOPS; /* special Ca list */ > > + > > + node->dnext = fabric->nodesdist[dist]; > > + fabric->nodesdist[dist] = node; > > +} > > + > > + > > +static struct ibnd_node * > > +create_node(struct ibnd_fabric *fabric, struct ibnd_node *temp, ib_portid_t *path, int dist) > > +{ > > + struct ibnd_node *node; > > + > > + node = malloc(sizeof(*node)); > > + if (!node) { > > + IBPANIC("OOM: node creation failed\n"); > > + return NULL; > > + } > > + > > + memcpy(node, temp, sizeof(*node)); > > + node->node.dist = dist; > > + node->node.path_portid = *path; > > + node->node.fabric = (ibnd_fabric_t *)fabric; > > + > > + add_to_nodeguid_hash(node, fabric->nodestbl); > > + > > + /* add this to the all nodes list */ > > + node->node.next = fabric->fabric.nodes; > > + fabric->fabric.nodes = (ibnd_node_t *)node; > > + > > + add_to_type_list(node, fabric); > > + add_to_nodedist(node, fabric); > > + > > + return node; > > +} > > + > > +static struct ibnd_port * > > +find_existing_port_node(struct ibnd_node *node, struct ibnd_port *port) > > +{ > > + if (port->port.portnum > node->node.info.numports || node->node.ports == NULL ) > > + 
return (NULL); > > + > > + return (CONV_PORT_INTERNAL(node->node.ports[port->port.portnum])); > > +} > > + > > +static struct ibnd_port * > > +add_port_to_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *temp) > > +{ > > + struct ibnd_port *port; > > + > > + port = malloc(sizeof(*port)); > > + if (!port) > > + return NULL; > > + > > + memcpy(port, temp, sizeof(*port)); > > + port->port.node = (ibnd_node_t *)node; > > + port->port.ext_portnum = 0; > > + > > + if (node->node.ports == NULL) { > > + node->node.ports = calloc(sizeof(*node->node.ports), node->node.info.numports + 1); > > + if (!node->node.ports) { > > + IBND_ERROR("Failed to allocate the ports array\n"); > > + return (NULL); > > + } > > + } > > + > > + node->node.ports[temp->port.portnum] = (ibnd_port_t *)port; > > + > > + add_to_portguid_hash(port, fabric->portstbl); > > + return port; > > +} > > + > > +static void > > +link_ports(struct ibnd_node *node, struct ibnd_port *port, > > + struct ibnd_node *remotenode, struct ibnd_port *remoteport) > > +{ > > + IBND_DEBUG("linking: 0x%" PRIx64 " %p->%p:%u and 0x%" PRIx64 " %p->%p:%u\n", > > + node->node.info.nodeguid, node, port, port->port.portnum, > > + remotenode->node.info.nodeguid, remotenode, > > + remoteport, remoteport->port.portnum); > > + if (port->port.remoteport) > > + port->port.remoteport->remoteport = NULL; > > + if (remoteport->port.remoteport) > > + remoteport->port.remoteport->remoteport = NULL; > > + port->port.remoteport = (ibnd_port_t *)remoteport; > > + remoteport->port.remoteport = (ibnd_port_t *)port; > > +} > > + > > +static int > > +get_remote_node(struct ibnd_fabric *fabric, struct ibnd_node *node, struct ibnd_port *port, ib_portid_t *path, > > + int portnum, int dist) > > +{ > > + struct ibnd_node node_buf; > > + struct ibnd_port port_buf; > > + struct ibnd_node *remotenode, *oldnode; > > + struct ibnd_port *remoteport, *oldport; > > + > > + memset(&node_buf, 0, sizeof(node_buf)); > > + memset(&port_buf, 
0, sizeof(port_buf)); > > + > > + IBND_DEBUG("handle node %p port %p:%d dist %d\n", node, port, portnum, dist); > > + if (port->port.info.phys_state != 5) /* LinkUp */ > > + return -1; > > + > > + if (extend_dpath(fabric, &path->drpath, portnum) < 0) > > + return -1; > > + > > + if (query_node(fabric, &node_buf, &port_buf, path) < 0) { > > + IBWARN("NodeInfo on %s failed, skipping port", > > + portid2str(path)); > > + path->drpath.cnt--; /* restore path */ > > + return -1; > > + } > > + > > + oldnode = find_existing_node(fabric, &node_buf); > > + if (oldnode) > > + remotenode = oldnode; > > + else if (!(remotenode = create_node(fabric, &node_buf, path, dist + 1))) > > + IBPANIC("no memory"); > > + > > + oldport = find_existing_port_node(remotenode, &port_buf); > > + if (oldport) { > > + remoteport = oldport; > > + } else if (!(remoteport = add_port_to_node(fabric, remotenode, &port_buf))) > > + IBPANIC("no memory"); > > + > > + dump_endnode(path, oldnode ? "known remote" : "new remote", > > + remotenode, remoteport); > > + > > + link_ports(node, port, remotenode, remoteport); > > + > > + path->drpath.cnt--; /* restore path */ > > + return 0; > > +} > > + > > +static void * > > +ibnd_init_port(char *dev_name, int dev_port) > > +{ > > + int mgmt_classes[2] = {IB_SMI_CLASS, IB_SMI_DIRECT_CLASS}; > > + > > + /* Crank up the mad lib */ > > + return (mad_rpc_open_port(dev_name, dev_port, mgmt_classes, 2)); > > +} > > + > > +ibnd_fabric_t * > > +ibnd_discover_fabric(char *dev_name, int dev_port, int timeout_ms, > > + ib_portid_t *from, int hops) > > +{ > > + struct ibnd_fabric *fabric = NULL; > > + ib_portid_t my_portid = {0}; > > + struct ibnd_node node_buf; > > + struct ibnd_port port_buf; > > + struct ibnd_node *node; > > + struct ibnd_port *port; > > + int i; > > + int dist = 0; > > + ib_portid_t *path; > > + int max_hops = MAXHOPS-1; /* default find everything */ > > + > > + /* if not everything how much? 
*/ > > + if (hops >= 0) { > > + max_hops = hops; > > + } > > + > > + /* If not specified start from "my" port */ > > + if (!from) { > > + from = &my_portid; > > + } > > + > > + fabric = malloc(sizeof(*fabric)); > > + > > + if (!fabric) { > > + IBPANIC("OOM: failed to malloc ibnd_fabric_t\n"); > > + return (NULL); > > + } > > + > > + memset(fabric, 0, sizeof(*fabric)); > > + > > + fabric->ibmad_port = ibnd_init_port(dev_name, dev_port); > > + if (!fabric->ibmad_port) { > > + IBPANIC("OOM: failed to open \"%s\" port %d\n", > > + dev_name, dev_port); > > + goto error; > > + } > > + > > + IBND_DEBUG("from %s\n", portid2str(from)); > > + > > + memset(&node_buf, 0, sizeof(node_buf)); > > + memset(&port_buf, 0, sizeof(port_buf)); > > + > > + if (query_node(fabric, &node_buf, &port_buf, from) < 0) { > > + IBWARN("can't reach node %s\n", portid2str(from)); > > + goto error; > > + } > > + > > + node = create_node(fabric, &node_buf, from, 0); > > + if (!node) > > + goto error; > > + > > + fabric->fabric.from_node = (ibnd_node_t *)node; > > + > > + port = add_port_to_node(fabric, node, &port_buf); > > + if (!port) > > + IBPANIC("out of memory"); > > + > > + if (node->node.info.type != IBND_SWITCH_NODE && > > + get_remote_node(fabric, node, port, from, node->node.info.localport, 0) < 0) > > + return ((ibnd_fabric_t *)fabric); > > + > > + for (dist = 0; dist <= max_hops; dist++) { > > + > > + for (node = fabric->nodesdist[dist]; node; node = node->dnext) { > > + > > + path = &node->node.path_portid; > > + > > + IBND_DEBUG("dist %d node %p\n", dist, node); > > + dump_endnode(path, "processing", node, port); > > + > > + for (i = 1; i <= node->node.info.numports; i++) { > > + if (i == node->node.info.localport) > > + continue; > > + > > + if (get_port_info(fabric, &port_buf, i, path) < 0) { > > + IBWARN("can't reach node %s port %d", portid2str(path), i); > > + continue; > > + } > > + > > + port = find_existing_port_node(node, &port_buf); > > + if (port) > > + continue; > > + > > + 
port = add_port_to_node(fabric, node, &port_buf); > > + if (!port) > > + IBPANIC("out of memory"); > > + > > + /* If switch, set port GUID to node port GUID */ > > + if (node->node.info.type == IBND_SWITCH_NODE) > > + port->port.guid = node->node.info.nodeportguid; > > + > > + get_remote_node(fabric, node, port, path, i, dist); > > + } > > + } > > + } > > + > > + fabric->fabric.chassis = group_nodes(fabric); > > + > > + return ((ibnd_fabric_t *)fabric); > > +error: > > + free(fabric); > > + return (NULL); > > +} > > + > > +static void > > +destroy_node(struct ibnd_node *node) > > +{ > > + int p = 0; > > + > > + for (p = 0; p <= node->node.info.numports; p++) { > > + free(node->node.ports[p]); > > + } > > + free(node->node.ports); > > + free(node); > > +} > > + > > +void > > +ibnd_destroy_fabric(ibnd_fabric_t *fabric) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > > + int dist = 0; > > + struct ibnd_node *node = NULL; > > + struct ibnd_node *next = NULL; > > + ibnd_chassis_t *ch, *ch_next; > > + > > + ch = f->first_chassis; > > + while (ch) { > > + ch_next = ch->next; > > + free(ch); > > + ch = ch_next; > > + } > > + for (dist = 0; dist <= MAXHOPS; dist++) { > > + node = f->nodesdist[dist]; > > + while (node) { > > + next = node->dnext; > > + destroy_node(node); > > + node = next; > > + } > > + } > > + if (f->ibmad_port) > > + mad_rpc_close_port(f->ibmad_port); > > + free(f); > > +} > > + > > +void > > +ibnd_debug(int i) > > +{ > > + if (i) { > > + ibdebug++; > > + madrpc_show_errors(1); > > + umad_debug(i); > > + } else { > > + ibdebug = 0; > > + madrpc_show_errors(0); > > + umad_debug(0); > > + } > > +} > > + > > +void > > +ibnd_show_progress(int i) > > +{ > > + show_progress = i; > > +} > > + > > +const char* > > +ibnd_node_type_str(ibnd_node_t *node) > > +{ > > + switch(node->info.type) { > > + case IBND_CA_NODE: return "Ca"; > > + case IBND_SWITCH_NODE: return "Switch"; > > + case IBND_ROUTER_NODE: return "Router"; > > + } > > + return 
"??"; > > +} > > + > > +const char* > > +ibnd_node_type_str_short(ibnd_node_t *node) > > +{ > > + switch(node->info.type) { > > + case IBND_SWITCH_NODE: return "SW"; > > + case IBND_CA_NODE: return "CA"; > > + case IBND_ROUTER_NODE: return "RT"; > > + } > > + return "??"; > > +} > > + > > + > > +void > > +ibnd_iter_nodes(ibnd_fabric_t *fabric, > > + ibnd_iter_node_func_t func, > > + void *user_data) > > +{ > > + ibnd_node_t *cur = NULL; > > + > > + for (cur = fabric->nodes; cur; cur = cur->next) { > > + func(cur, user_data); > > + } > > +} > > + > > + > > +void > > +ibnd_iter_nodes_type(ibnd_fabric_t *fabric, > > + ibnd_iter_node_func_t func, > > + ibnd_node_type_t node_type, > > + void *user_data) > > +{ > > + struct ibnd_fabric *f = CONV_FABRIC_INTERNAL(fabric); > > + struct ibnd_node *list = NULL; > > + struct ibnd_node *cur = NULL; > > + > > + switch (node_type) { > > + case IBND_SWITCH_NODE: > > + list = f->switches; > > + break; > > + case IBND_CA_NODE: > > + list = f->ch_adapters; > > + break; > > + case IBND_ROUTER_NODE: > > + list = f->routers; > > + break; > > + default: > > + IBND_DEBUG("Invalid node_type specified %d\n", node_type); > > + break; > > + } > > + > > + for (cur = list; cur; cur = cur->type_next) { > > + func((ibnd_node_t *)cur, user_data); > > + } > > +} > > + > > diff --git a/infiniband-diags/libibnetdisc/src/internal.h b/infiniband-diags/libibnetdisc/src/internal.h > > new file mode 100644 > > index 0000000..89f238f > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/src/internal.h > > @@ -0,0 +1,82 @@ > > +/* > > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. > > + * > > + */ > > + > > +/** ========================================================================= > > + * Define the internal data structures. 
> > + */ > > + > > +#ifndef _INTERNAL_H_ > > +#define _INTERNAL_H_ > > + > > +#include > > + > > +struct ibnd_node { > > + /* This member MUST BE FIRST */ > > + ibnd_node_t node; > > + > > + /* internal use only */ > > + unsigned char ch_found; > > + struct ibnd_node *htnext; /* hash table list */ > > + struct ibnd_node *dnext; /* nodesdist next */ > > + struct ibnd_node *type_next; /* next based on type */ > > +}; > > +#define CONV_NODE_INTERNAL(node) ((struct ibnd_node *)node) > > + > > +struct ibnd_port { > > + /* This member MUST BE FIRST */ > > + ibnd_port_t port; > > + > > + /* internal use only */ > > + struct ibnd_port *htnext; > > +}; > > +#define CONV_PORT_INTERNAL(port) ((struct ibnd_port *)port) > > + > > +struct ibnd_fabric { > > + /* This member MUST BE FIRST */ > > + ibnd_fabric_t fabric; > > + > > + /* internal use only */ > > + void *ibmad_port; > > + struct ibnd_node *nodestbl[HTSZ]; > > + struct ibnd_port *portstbl[HTSZ]; > > + struct ibnd_node *nodesdist[MAXHOPS+1]; > > + ibnd_chassis_t *first_chassis; > > + ibnd_chassis_t *current_chassis; > > + ibnd_chassis_t *last_chassis; > > + struct ibnd_node *switches; > > + struct ibnd_node *ch_adapters; > > + struct ibnd_node *routers; > > +}; > > +#define CONV_FABRIC_INTERNAL(fabric) ((struct ibnd_fabric *)fabric) > > + > > +#endif /* _INTERNAL_H_ */ > > diff --git a/infiniband-diags/libibnetdisc/src/libibnetdisc.map b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > > new file mode 100644 > > index 0000000..5e8c315 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/src/libibnetdisc.map > > @@ -0,0 +1,27 @@ > > +IBNETDISC_1.0 { > > + global: > > + ibnd_debug; > > + ibnd_show_progress; > > + ibnd_discover_fabric; > > + ibnd_cache_fabric; > > + ibnd_read_fabric; > > + ibnd_destroy_fabric; > > + ibnd_find_node_guid; > > + ibnd_update_node; > > + ibnd_find_node_dr; > > + ibnd_linkwidth_str; > > + ibnd_linkspeed_str; > > + ibnd_node_type_str; > > + ibnd_node_type_str_short; > > + 
ibnd_is_xsigo_guid; > > + ibnd_is_xsigo_tca; > > + ibnd_is_xsigo_hca; > > + ibnd_get_chassis_guid; > > + ibnd_get_chassis_type; > > + ibnd_get_chassis_slot_str; > > + ibnd_linkstate_str; > > + ibnd_physstate_str; > > + ibnd_iter_nodes; > > + ibnd_iter_nodes_type; > > + local: *; > > +}; > > diff --git a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > > new file mode 100644 > > index 0000000..6e63f4a > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > > @@ -0,0 +1,395 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. > > + * > > + */ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#define _GNU_SOURCE > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > + > > +char *argv0 = "iblinkinfotest"; > > +static FILE *f; > > + > > +static char *node_name_map_file = NULL; > > +static nn_map_t *node_name_map = NULL; > > + > > +static int timeout_ms = 500; > > + > > +static int debug = 0; > > +#define DEBUG(str, args...) \ > > + if (debug) fprintf(stderr, str, ##args) > > + > > +static int down_links_only = 0; > > +static int line_mode = 0; > > +static int add_sw_settings = 0; > > +static int print_port_guids = 0; > > + > > +static unsigned int > > +get_max(unsigned int num) > > +{ > > + unsigned int v = num; // 32-bit word to find the log base 2 of > > + unsigned r = 0; // r will be lg(v) > > + > > + while (v >>= 1) // unroll for more speed... > > + { > > + r++; > > + } > > + > > + return (1 << r); > > +} > > + > > +void > > +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) > > +{ > > + int max_speed = 0; > > + > > + int max_width = get_max(port->info.link_width_supported > > + & port->remoteport->info.link_width_supported); > > + if ((max_width & port->info.link_width_active) == 0) { > > + // we are not at the max supported width > > + // print what we could be at. 
> > + snprintf(width_msg, msg_size, "Could be %s", > > + ibnd_linkwidth_str(max_width)); > > + } > > + > > + max_speed = get_max(port->info.link_speed_supported > > + & port->remoteport->info.link_speed_supported); > > + if ((max_speed & port->info.link_speed_active) == 0) { > > + // we are not at the max supported speed > > + // print what we could be at. > > + snprintf(speed_msg, msg_size, "Could be %s", > > + ibnd_linkspeed_str(max_speed, 1)); > > + } > > +} > > + > > +void > > +print_port(ibnd_node_t *node, ibnd_port_t *port) > > +{ > > + char remote_guid_str[256]; > > + char remote_str[256]; > > + char link_str[256]; > > + char width_msg[256]; > > + char speed_msg[256]; > > + char ext_port_str[256]; > > + > > + if (!port) > > + return; > > + > > + remote_guid_str[0] = '\0'; > > + remote_str[0] = '\0'; > > + link_str[0] = '\0'; > > + width_msg[0] = '\0'; > > + speed_msg[0] = '\0'; > > + > > + if (port->remoteport) { > > + char remote_name_buf[256]; > > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > > + > > + if (port->remoteport->ext_portnum) > > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > > + else > > + ext_port_str[0] = '\0'; > > + > > + get_msg(width_msg, speed_msg, 256, port); > > + if (line_mode) { > > + if (print_port_guids) { > > + snprintf(remote_guid_str, 256, > > + "0x%016lx ", > > + port->remoteport->guid); > > + } else { > > + snprintf(remote_guid_str, 256, > > + "0x%016lx ", > > + port->remoteport->node->info.nodeguid); > > + } > > + } > > + > > + snprintf(remote_str, 256, > > + "%s%6d %4d[%2s] \"%s\" (%s %s)\n", > > + remote_guid_str, > > + port->remoteport->info.lid ? 
> > + port->remoteport->info.lid : > > + port->remoteport->node->smalid, > > + port->remoteport->portnum, > > + ext_port_str, > > + remap_node_name(node_name_map, > > + port->remoteport->node->info.nodeguid, > > + remote_name_buf), > > + width_msg, > > + speed_msg > > + ); > > + } else { > > + snprintf(remote_str, 256, > > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > > + } > > + > > + if (add_sw_settings) { > > + snprintf(link_str, 256, > > + "(%3s %s %6s/%8s) (HOQ:%d VL_Stall:%d)", > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > > + ibnd_linkstate_str(port->info.link_state), > > + ibnd_physstate_str(port->info.phys_state), > > + port->info.hoq_lifetime, > > + port->info.vl_stall_count > > + ); > > + } else { > > + snprintf(link_str, 256, > > + "(%3s %s %6s/%8s)", > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 1), > > + ibnd_linkstate_str(port->info.link_state), > > + ibnd_physstate_str(port->info.phys_state) > > + ); > > + } > > + > > + if (port->ext_portnum) > > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > > + else > > + ext_port_str[0] = '\0'; > > + > > + if (line_mode) { > > + char name_buf[256]; > > + strncpy(name_buf, node->nodedesc, 256); > > + printf("0x%016lx \"%30s\" %6d %4d[%2s] ==%s==> %s", > > + node->info.nodeguid, > > + remap_node_name(node_name_map, > > + node->info.nodeguid, > > + name_buf), > > + node->smalid, port->portnum, > > + ext_port_str, > > + link_str, > > + remote_str > > + ); > > + } else { > > + printf(" %6d %4d[%2s] ==%s==> %s", > > + node->smalid, port->portnum, > > + ext_port_str, > > + link_str, > > + remote_str > > + ); > > + } > > +} > > + > > +void > > +print_switch(ibnd_node_t *node, void *user_data) > > +{ > > + int i = 0; > > + > > + if (!line_mode) { > > + char name_buf[256]; > > + strncpy(name_buf, node->nodedesc, 256); > > + printf("Switch 0x%016lx %s:\n", > > + 
node->info.nodeguid, > > + remap_node_name(node_name_map, > > + node->info.nodeguid, > > + name_buf)); > > + } > > + > > + for (i = 1; i <= node->info.numports; i++) { > > + ibnd_port_t *port = node->ports[i]; > > + if (!port) > > + continue; > > + if (!down_links_only || port->info.link_state == IBND_LINK_DOWN) { > > + print_port(node, port); > > + } > > + } > > +} > > + > > +void > > +usage(void) > > +{ > > + fprintf(stderr, > > + "Usage: %s [-hclp -S -D -C -P ]\n" > > + " Report link speed and connection for each port of each switch which is active\n" > > + " -h This help message\n" > > + " -S output only the node specified by guid\n" > > + " -D print only node specified by \n" > > + " -f specify node to start \"from\"\n" > > + " -n Number of hops to include away from specified node\n" > > + " -d print only down links\n" > > + " -l (line mode) print all information for each link on each line\n" > > + " -p print additional switch settings (PktLifeTime,HoqLife,VLStallCount)\n" > > + > > + > > + " -t timeout for any single fabric query\n" > > + " -s show errors\n" > > + " --node-name-map use specified node name map\n" > > + > > + " -C use selected Channel Adaptor name for queries\n" > > + " -P use selected channel adaptor port for queries\n" > > + " -g print port guids instead of node guids\n" > > + " --debug print debug messages\n" > > + , > > + argv0); > > + exit(-1); > > +} > > + > > +int > > +main(int argc, char **argv) > > +{ > > + char *ca = 0; > > + int ca_port = 0; > > + ibnd_fabric_t *fabric = NULL; > > + uint64_t guid = 0; > > + char *dr_path = NULL; > > + char *from = NULL; > > + int hops = 0; > > + ib_portid_t port_id; > > + > > + static char const str_opts[] = "S:D:n:C:P:t:sldgphuf:"; > > + static const struct option long_opts[] = { > > + { "S", 1, 0, 'S'}, > > + { "D", 1, 0, 'D'}, > > + { "num-hops", 1, 0, 'n'}, > > + { "down-links-only", 0, 0, 'd'}, > > + { "line-mode", 0, 0, 'l'}, > > + { "ca-name", 1, 0, 'C'}, > > + { "ca-port", 1, 0, 'P'}, > > + { 
"timeout", 1, 0, 't'}, > > + { "show", 0, 0, 's'}, > > + { "print-port-guids", 0, 0, 'g'}, > > + { "print-additional", 0, 0, 'p'}, > > + { "help", 0, 0, 'h'}, > > + { "usage", 0, 0, 'u'}, > > + { "node-name-map", 1, 0, 1}, > > + { "debug", 0, 0, 2}, > > + { "from", 1, 0, 'f'}, > > + { } > > + }; > > + > > + f = stdout; > > + > > + argv0 = argv[0]; > > + > > + while (1) { > > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > > + if ( ch == -1 ) > > + break; > > + switch(ch) { > > + case 1: > > + node_name_map_file = strdup(optarg); > > + break; > > + case 2: > > + debug = 1; > > + ibnd_debug(1); > > + break; > > + case 'f': > > + from = strdup(optarg); > > + break; > > + case 'C': > > + ca = strdup(optarg); > > + break; > > + case 'P': > > + ca_port = strtoul(optarg, 0, 0); > > + break; > > + case 'D': > > + dr_path = strdup(optarg); > > + break; > > + case 'n': > > + hops = (int)strtol(optarg, NULL, 0); > > + break; > > + case 'd': > > + down_links_only = 1; > > + break; > > + case 'l': > > + line_mode = 1; > > + break; > > + case 't': > > + timeout_ms = strtoul(optarg, 0, 0); > > + break; > > + case 'g': > > + print_port_guids = 1; > > + break; > > + case 'S': > > + guid = (uint64_t)strtoull(optarg, 0, 0); > > + break; > > + case 'p': > > + add_sw_settings = 1; > > + break; > > + default: > > + usage(); > > + break; > > + } > > + } > > + argc -= optind; > > + argv += optind; > > + > > + if (argc && !(f = fopen(argv[0], "w"))) > > + fprintf(stderr, "can't open file %s for writing", argv[0]); > > + > > + node_name_map = open_node_name_map(node_name_map_file); > > + > > + if (from) { > > + /* only scan part of the fabric */ > > + str2drpath(&(port_id.drpath), from, 0, 0); > > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > > + fprintf(stderr, "discover failed\n"); > > + exit(1); > > + } > > + guid = 0; > > + } else { > > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > > + 
fprintf(stderr, "discover failed\n"); > > + exit(1); > > + } > > + } > > + > > + if (guid) { > > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > > + print_switch(sw, NULL); > > + } else if (dr_path) { > > + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); > > + print_switch(sw, NULL); > > + } else { > > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > > + } > > + > > + ibnd_destroy_fabric(fabric); > > + > > + close_node_name_map(node_name_map); > > + exit(0); > > +} > > diff --git a/infiniband-diags/libibnetdisc/test/ibnetdisctest.c b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > > new file mode 100644 > > index 0000000..fc6e234 > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/test/ibnetdisctest.c > > @@ -0,0 +1,675 @@ > > +/* > > + * Copyright (c) 2004-2008 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. 
> > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. > > + * > > + */ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#define _GNU_SOURCE > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > + > > +static int verbose; > > +#define LIST_CA_NODE (1 << IBND_CA_NODE) > > +#define LIST_SWITCH_NODE (1 << IBND_SWITCH_NODE) > > +#define LIST_ROUTER_NODE (1 << IBND_ROUTER_NODE) > > + > > +char *argv0 = "ibnetdiscover"; > > +static FILE *f; > > + > > +static char *node_name_map_file = NULL; > > +static nn_map_t *node_name_map = NULL; > > + > > +static int timeout_ms = 2000; > > + > > +static int debug = 0; > > +#define DEBUG(str, args...) 
\ > > + if (debug) fprintf(stderr, str, ##args) > > + > > + > > +char * > > +node_name(ibnd_node_t *node) > > +{ > > + static char buf[256]; > > + > > + switch(node->info.type) { > > + case IBND_CA_NODE: > > + sprintf(buf, "\"%s", "H"); > > + break; > > + case IBND_SWITCH_NODE: > > + sprintf(buf, "\"%s", "S"); > > + break; > > + case IBND_ROUTER_NODE: > > + sprintf(buf, "\"%s", "R"); > > + break; > > + default: > > + sprintf(buf, "\"%s", "?"); > > + break; > > + } > > + sprintf(buf+2, "-%016" PRIx64 "\"", node->info.nodeguid); > > + > > + return buf; > > +} > > + > > +void > > +list_node(ibnd_node_t *node, void *user_data) > > +{ > > + char *nodename = remap_node_name(node_name_map, node->info.nodeguid, > > + node->nodedesc); > > + > > + fprintf(f, "%s\t : 0x%016" PRIx64 " ports %d devid 0x%x vendid 0x%x \"%s\"\n", > > + ibnd_node_type_str(node), > > + node->info.nodeguid, node->info.numports, node->info.devid, > > + node->info.vendid, > > + nodename); > > + > > + free(nodename); > > +} > > + > > +void > > +list_nodes(ibnd_fabric_t *fabric, int list) > > +{ > > + if (list & LIST_CA_NODE) { > > + ibnd_iter_nodes_type(fabric, list_node, IBND_CA_NODE, NULL); > > + } > > + if (list & LIST_SWITCH_NODE) { > > + ibnd_iter_nodes_type(fabric, list_node, IBND_SWITCH_NODE, NULL); > > + } > > + if (list & LIST_ROUTER_NODE) { > > + ibnd_iter_nodes_type(fabric, list_node, IBND_ROUTER_NODE, NULL); > > + } > > +} > > + > > +void > > +out_ids(ibnd_node_t *node, int group, char *chname) > > +{ > > + fprintf(f, "\nvendid=0x%x\ndevid=0x%x\n", node->info.vendid, node->info.devid); > > + if (node->info.sysimgguid) > > + fprintf(f, "sysimgguid=0x%" PRIx64, node->info.sysimgguid); > > + if (group > > + && node->chassis && node->chassis->chassisnum) { > > + fprintf(f, "\t\t# Chassis %d", node->chassis->chassisnum); > > + if (chname) > > + fprintf(f, " (%s)", clean_nodedesc(chname)); > > + if (ibnd_is_xsigo_tca(node->info.nodeguid) > > + && node->ports[1] > > + && 
node->ports[1]->remoteport) > > + fprintf(f, " slot %d", node->ports[1]->remoteport->portnum); > > + } > > + fprintf(f, "\n"); > > +} > > + > > + > > +uint64_t > > +out_chassis(ibnd_fabric_t *fabric, int chassisnum) > > +{ > > + uint64_t guid; > > + > > + fprintf(f, "\nChassis %d", chassisnum); > > + guid = ibnd_get_chassis_guid(fabric, chassisnum); > > + if (guid) > > + fprintf(f, " (guid 0x%" PRIx64 ")", guid); > > + fprintf(f, "\n"); > > + return guid; > > +} > > + > > +void > > +out_switch(ibnd_node_t *node, int group, char *chname) > > +{ > > + char *str; > > + char str2[256]; > > + char *nodename = NULL; > > + > > + out_ids(node, group, chname); > > + fprintf(f, "switchguid=0x%" PRIx64, node->info.nodeguid); > > + fprintf(f, "(%" PRIx64 ")", node->info.nodeportguid); > > + if (group) { > > + str = ibnd_get_chassis_type(node); > > + if (str) > > + fprintf(f, "%s ", str); > > + str = ibnd_get_chassis_slot_str(node, str2, 256); > > + if (str) > > + fprintf(f, "%s", str); > > + } > > + > > + nodename = remap_node_name(node_name_map, node->info.nodeguid, > > + node->nodedesc); > > + > > + fprintf(f, "\nSwitch\t%d %s\t\t# \"%s\" %s port 0 lid %d lmc %d\n", > > + node->info.numports, node_name(node), > > + nodename, > > + node->sw_info.smaenhsp0 ? 
"enhanced" : "base", > > + node->smalid, node->smalmc); > > + > > + free(nodename); > > +} > > + > > +void > > +out_ca(ibnd_node_t *node, int group, char *chname) > > +{ > > + char *node_type; > > + char *node_type2; > > + > > + out_ids(node, group, chname); > > + switch(node->info.type) { > > + case IBND_CA_NODE: > > + node_type = "ca"; > > + node_type2 = "Ca"; > > + break; > > + case IBND_ROUTER_NODE: > > + node_type = "rt"; > > + node_type2 = "Rt"; > > + break; > > + default: > > + node_type = "???"; > > + node_type2 = "???"; > > + break; > > + } > > + > > + fprintf(f, "%sguid=0x%" PRIx64 "\n", node_type, node->info.nodeguid); > > + fprintf(f, "%s\t%d %s\t\t# \"%s\"", > > + node_type2, node->info.numports, node_name(node), > > + clean_nodedesc(node->nodedesc)); > > + if (group && ibnd_is_xsigo_hca(node->info.nodeguid)) > > + fprintf(f, " (scp)"); > > + fprintf(f, "\n"); > > +} > > + > > +#define OUT_BUFFER_SIZE 16 > > +static char * > > +out_ext_port(ibnd_port_t *port, int group) > > +{ > > + static char mapping[OUT_BUFFER_SIZE]; > > + > > + if (group && port->ext_portnum != 0) { > > + snprintf(mapping, OUT_BUFFER_SIZE, > > + "[ext %d]", port->ext_portnum); > > + return (mapping); > > + } > > + > > + return (NULL); > > +} > > + > > +void > > +out_switch_port(ibnd_port_t *port, int group) > > +{ > > + char *ext_port_str = NULL; > > + char *rem_nodename = NULL; > > + > > + DEBUG("port %p:%d remoteport %p\n", port, port->portnum, port->remoteport); > > + fprintf(f, "[%d]", port->portnum); > > + > > + ext_port_str = out_ext_port(port, group); > > + if (ext_port_str) > > + fprintf(f, "%s", ext_port_str); > > + > > + rem_nodename = remap_node_name(node_name_map, > > + port->remoteport->node->info.nodeguid, > > + port->remoteport->node->nodedesc); > > + > > + ext_port_str = out_ext_port(port->remoteport, group); > > + fprintf(f, "\t%s[%d]%s", > > + node_name(port->remoteport->node), > > + port->remoteport->portnum, > > + ext_port_str ? 
ext_port_str : ""); > > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > > + fprintf(f, "(%" PRIx64 ") ", port->remoteport->guid); > > + fprintf(f, "\t\t# \"%s\" lid %d %s%s", > > + rem_nodename, > > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? port->remoteport->node->smalid : port->remoteport->info.lid, > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > > + > > + if (ibnd_is_xsigo_tca(port->remoteport->guid)) > > + fprintf(f, " slot %d", port->portnum); > > + else if (ibnd_is_xsigo_hca(port->remoteport->guid)) > > + fprintf(f, " (scp)"); > > + fprintf(f, "\n"); > > + > > + free(rem_nodename); > > +} > > + > > +void > > +out_ca_port(ibnd_port_t *port, int group) > > +{ > > + char *str = NULL; > > + char *rem_nodename = NULL; > > + > > + fprintf(f, "[%d]", port->portnum); > > + if (port->node->info.type != IBND_SWITCH_NODE) > > + fprintf(f, "(%" PRIx64 ") ", port->guid); > > + fprintf(f, "\t%s[%d]", > > + node_name(port->remoteport->node), > > + port->remoteport->portnum); > > + str = out_ext_port(port->remoteport, group); > > + if (str) > > + fprintf(f, "%s", str); > > + if (port->remoteport->node->info.type != IBND_SWITCH_NODE) > > + fprintf(f, " (%" PRIx64 ") ", port->remoteport->guid); > > + > > + rem_nodename = remap_node_name(node_name_map, > > + port->remoteport->node->info.nodeguid, > > + port->remoteport->node->nodedesc); > > + > > + fprintf(f, "\t\t# lid %d lmc %d \"%s\" lid %d %s%s\n", > > + port->info.lid, port->info.lmc, rem_nodename, > > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
port->remoteport->node->smalid : port->remoteport->info.lid, > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > > + > > + free(rem_nodename); > > +} > > + > > +struct iter_user_data { > > + int group; > > + int skip_chassis_nodes; > > +}; > > + > > +static void > > +switch_iter_func(ibnd_node_t *node, void *iter_user_data) > > +{ > > + ibnd_port_t *port; > > + int p = 0; > > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > > + > > + DEBUG("SWITCH: node %p\n", node); > > + > > + /* skip chassis based switches if flagged */ > > + if (data->skip_chassis_nodes && node->chassis && node->chassis->chassisnum) > > + return; > > + > > + out_switch(node, data->group, NULL); > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (port && port->remoteport) > > + out_switch_port(port, data->group); > > + } > > +} > > + > > +static void > > +ca_iter_func(ibnd_node_t *node, void *iter_user_data) > > +{ > > + ibnd_port_t *port; > > + int p = 0; > > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > > + > > + DEBUG("CA: node %p\n", node); > > + /* Now, skip chassis based CAs */ > > + if (data->group && node->chassis && node->chassis->chassisnum) > > + return; > > + out_ca(node, data->group, NULL); > > + > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (port && port->remoteport) > > + out_ca_port(port, data->group); > > + } > > +} > > + > > +static void > > +router_iter_func(ibnd_node_t *node, void *iter_user_data) > > +{ > > + ibnd_port_t *port; > > + int p = 0; > > + struct iter_user_data *data = (struct iter_user_data *)iter_user_data; > > + > > + DEBUG("RT: node %p\n", node); > > + /* Now, skip chassis based RTs */ > > + if (data->group && node->chassis && node->chassis->chassisnum) > > + return; > > + out_ca(node, data->group, NULL); > > + for (p = 1; p <= node->info.numports; p++) { > > + 
port = node->ports[p]; > > + if (port && port->remoteport) > > + out_ca_port(port, data->group); > > + } > > +} > > + > > +int > > +dump_topology(int group, ibnd_fabric_t *fabric) > > +{ > > + ibnd_node_t *node; > > + ibnd_port_t *port; > > + int i = 0, p = 0; > > + time_t t = time(0); > > + uint64_t chguid; > > + char *chname = NULL; > > + struct iter_user_data iter_user_data; > > + > > + fprintf(f, "#\n# Topology file: generated on %s#\n", ctime(&t)); > > + fprintf(f, "# Max of %d hops discovered\n", fabric->maxhops_discovered); > > + fprintf(f, "# Initiated from node %016" PRIx64 " port %016" PRIx64 "\n", > > + fabric->from_node->info.nodeguid, fabric->from_node->info.nodeportguid); > > + > > + /* Make pass on switches */ > > + if (group) { > > + ibnd_chassis_t *ch = NULL; > > + > > + /* Chassis based switches first */ > > + for (ch = fabric->chassis; ch; ch = ch->next) { > > + int n = 0; > > + > > + if (!ch->chassisnum) > > + continue; > > + chguid = out_chassis(fabric, ch->chassisnum); > > + > > + chname = NULL; > > +/** > > + * Will this work for Xsigo? > > + */ > > + if (ibnd_is_xsigo_guid(chguid)) { > > + for (node = ch->nodes; node; > > + node = node->next_chassis_node) { > > + if (ibnd_is_xsigo_hca(node->info.nodeguid)) { > > + chname = node->nodedesc; > > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > > + } > > + } > > + > > +#if 0 > > +/** > > + * vs. this? > > + * I don't want to expose the nodesdist array to the end user. 
> > + */ > > + for (node = fabric->nodesdist[MAXHOPS]; node; node = node->dnext) { > > + if (!node->chrecord || > > + !node->chrecord->chassisnum) > > + continue; > > + > > + if (node->chrecord->chassisnum != ch->chassisnum) > > + continue; > > + > > + if (ibnd_is_xsigo_hca(node->nodeguid)) { > > + chname = node->nodedesc; > > + fprintf(f, "Hostname: %s\n", clean_nodedesc(node->nodedesc)); > > + } > > + } > > +#endif > > + } > > + > > + fprintf(f, "\n# Spine Nodes"); > > + for (n = 1; n <= SPINES_MAX_NUM; n++) { > > + if (ch->spinenode[n]) { > > + out_switch(ch->spinenode[n], group, chname); > > + for (p = 1; p <= ch->spinenode[n]->info.numports; p++) { > > + port = ch->spinenode[n]->ports[p]; > > + if (port && port->remoteport) > > + out_switch_port(port, group); > > + } > > + } > > + } > > + fprintf(f, "\n# Line Nodes"); > > + for (n = 1; n <= LINES_MAX_NUM; n++) { > > + if (ch->linenode[n]) { > > + out_switch(ch->linenode[n], group, chname); > > + for (p = 1; p <= ch->linenode[n]->info.numports; p++) { > > + port = ch->linenode[n]->ports[p]; > > + if (port && port->remoteport) > > + out_switch_port(port, group); > > + } > > + } > > + } > > + > > + fprintf(f, "\n# Chassis Switches"); > > + for (node = ch->nodes; node; > > + node = node->next_chassis_node) { > > + if (node->info.type == IBND_SWITCH_NODE) { > > + out_switch(node, group, chname); > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (port && port->remoteport) > > + out_switch_port(port, group); > > + } > > + } > > + } > > + > > + fprintf(f, "\n# Chassis CAs"); > > + for (node = ch->nodes; node; > > + node = node->next_chassis_node) { > > + if (node->info.type == IBND_CA_NODE) { > > + out_ca(node, group, chname); > > + for (p = 1; p <= node->info.numports; p++) { > > + port = node->ports[p]; > > + if (port && port->remoteport) > > + out_ca_port(port, group); > > + } > > + } > > + } > > + > > + } > > + > > + } else { /* !group */ > > + iter_user_data.group = group; > 
> + iter_user_data.skip_chassis_nodes = 0; > > + > > + ibnd_iter_nodes_type(fabric, switch_iter_func, > > + IBND_SWITCH_NODE, &iter_user_data); > > + } > > + > > + chname = NULL; > > + if (group) { > > + iter_user_data.group = group; > > + iter_user_data.skip_chassis_nodes = 1; > > + > > + fprintf(f, "\nNon-Chassis Nodes\n"); > > + ibnd_iter_nodes_type(fabric, switch_iter_func, > > + IBND_SWITCH_NODE, &iter_user_data); > > + > > + } > > + > > + iter_user_data.group = group; > > + iter_user_data.skip_chassis_nodes = 0; > > + > > + /* Make pass on CAs */ > > + ibnd_iter_nodes_type(fabric, ca_iter_func, IBND_CA_NODE, > > + &iter_user_data); > > + > > + /* make pass on routers */ > > + ibnd_iter_nodes_type(fabric, router_iter_func, IBND_ROUTER_NODE, > > + &iter_user_data); > > + > > + return i; > > +} > > + > > + > > +void dump_ports_report (ibnd_node_t *node, void *user_data) > > +{ > > + int p = 0; > > + ibnd_port_t *port = NULL; > > + > > + /* for each port */ > > + for (p = node->info.numports, port = node->ports[p]; > > + p > 0; > > + port = node->ports[--p]) { > > + if (port == NULL) > > + continue; > > + > > + fprintf(stdout, > > + "%2s %5d %2d 0x%016" PRIx64 " %s %s", > > + ibnd_node_type_str_short(node), > > + node->info.type == IBND_SWITCH_NODE ? node->smalid : port->info.lid, > > + port->portnum, > > + port->guid, > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 0)); > > + if (port->remoteport) > > + fprintf(stdout, > > + " - %2s %5d %2d 0x%016" PRIx64 > > + " ( '%s' - '%s' )\n", > > + ibnd_node_type_str_short(port->remoteport->node), > > + port->remoteport->node->info.type == IBND_SWITCH_NODE ? 
> > + port->remoteport->node->smalid : port->remoteport->info.lid, > > + port->remoteport->portnum, > > + port->remoteport->guid, > > + port->node->nodedesc, > > + port->remoteport->node->nodedesc); > > + else > > + fprintf(stdout, "%36s'%s'\n", "", > > + port->node->nodedesc); > > + } > > +} > > + > > +void > > +usage(void) > > +{ > > + fprintf(stderr, "Usage: %s [-d(ebug)] -s(how) -l(ist) -g(rouping) -H(ca_list) -S(witch_list) -R(outer_list) -V(ersion) -C ca_name -P ca_port " > > + "-t(imeout) timeout_ms --node-name-map node-name-map] -p(orts) []\n", > > + argv0); > > + fprintf(stderr, " --node-name-map specify a node name map file\n"); > > + exit(-1); > > +} > > + > > +int > > +main(int argc, char **argv) > > +{ > > + int list = 0; > > + char *ca = 0; > > + int ca_port = 0; > > + int group = 0; > > + int ports_report = 0; > > + ibnd_fabric_t *fabric = NULL; > > + > > + static char const str_opts[] = "C:P:t:devslgHSRpVhu"; > > + static const struct option long_opts[] = { > > + { "C", 1, 0, 'C'}, > > + { "P", 1, 0, 'P'}, > > + { "debug", 0, 0, 'd'}, > > + { "verbose", 0, 0, 'v'}, > > + { "show", 0, 0, 's'}, > > + { "list", 0, 0, 'l'}, > > + { "grouping", 0, 0, 'g'}, > > + { "Hca_list", 0, 0, 'H'}, > > + { "Switch_list", 0, 0, 'S'}, > > + { "Router_list", 0, 0, 'R'}, > > + { "timeout", 1, 0, 't'}, > > + { "node-name-map", 1, 0, 1}, > > + { "ports", 0, 0, 'p'}, > > + { "Version", 0, 0, 'V'}, > > + { "help", 0, 0, 'h'}, > > + { "usage", 0, 0, 'u'}, > > + { } > > + }; > > + > > + f = stdout; > > + > > + argv0 = argv[0]; > > + > > + while (1) { > > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > > + if ( ch == -1 ) > > + break; > > + switch(ch) { > > + case 1: > > + node_name_map_file = strdup(optarg); > > + break; > > + case 'C': > > + ca = optarg; > > + break; > > + case 'P': > > + ca_port = strtoul(optarg, 0, 0); > > + break; > > + case 'd': > > + debug = 1; > > + ibnd_debug(1); > > + break; > > + case 't': > > + timeout_ms = strtoul(optarg, 0, 0); 
> > + break; > > + case 'v': > > + verbose++; > > + break; > > + case 's': > > + ibnd_show_progress(1); > > + break; > > + case 'l': > > + list = LIST_CA_NODE | LIST_SWITCH_NODE | LIST_ROUTER_NODE; > > + break; > > + case 'g': > > + group = 1; > > + break; > > + case 'S': > > + list |= LIST_SWITCH_NODE; > > + break; > > + case 'H': > > + list |= LIST_CA_NODE; > > + break; > > + case 'R': > > + list |= LIST_ROUTER_NODE; > > + break; > > + case 'p': > > + ports_report = 1; > > + break; > > + default: > > + usage(); > > + break; > > + } > > + } > > + argc -= optind; > > + argv += optind; > > + > > + if (argc && !(f = fopen(argv[0], "w"))) > > + fprintf(stderr, "can't open file %s for writing", argv[0]); > > + > > + node_name_map = open_node_name_map(node_name_map_file); > > + > > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > > + fprintf(stderr, "discover failed\n"); > > + exit(1); > > + } > > + > > + if (ports_report) > > + ibnd_iter_nodes(fabric, > > + dump_ports_report, > > + NULL); > > + else if (list) > > + list_nodes(fabric, list); > > + else > > + dump_topology(group, fabric); > > + > > + ibnd_destroy_fabric(fabric); > > + close_node_name_map(node_name_map); > > + exit(0); > > +} > > diff --git a/infiniband-diags/libibnetdisc/test/testleaks.c b/infiniband-diags/libibnetdisc/test/testleaks.c > > new file mode 100644 > > index 0000000..3fbf7af > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/test/testleaks.c > > @@ -0,0 +1,268 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. 
> > + * > > + */ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#define _GNU_SOURCE > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > + > > +char *argv0 = "iblinkinfotest"; > > +static FILE *f; > > + > > +static int timeout_ms = 500; > > + > > +void > > +print_port(ibnd_node_t *node, ibnd_port_t *port) > > +{ > > + char remote_guid_str[256]; > > + char remote_str[256]; > > + char link_str[256]; > > + char speed_msg[256]; > > + char ext_port_str[256]; > > + > > + if (!port) > > + return; > > + > > + remote_guid_str[0] = '\0'; > > + remote_str[0] = '\0'; > > + link_str[0] = '\0'; > > + speed_msg[0] = '\0'; > > + > > + if (port->remoteport) { > > + char remote_name_buf[256]; > > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > > + > > + if (port->remoteport->ext_portnum) > > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > > + else > > + ext_port_str[0] = '\0'; > > + > > + snprintf(remote_str, 256, > > + "%s%6d %4d[%2s] \"%s\" (%s)\n", > > + remote_guid_str, > > + port->remoteport->info.lid ? 
> > + port->remoteport->info.lid : > > + port->remoteport->node->smalid, > > + port->remoteport->portnum, > > + ext_port_str, > > + port->remoteport->node->nodedesc, > > + speed_msg > > + ); > > + } else { > > + snprintf(remote_str, 256, > > + "%6s %4s[%2s] \"\" ( )\n", "", "", ""); > > + } > > + > > + snprintf(link_str, 256, > > + "(%3s %s %6s/%8s)", > > + ibnd_linkwidth_str(port->info.link_width_active), > > + ibnd_linkspeed_str(port->info.link_speed_active, 0), > > + ibnd_linkstate_str(port->info.link_state), > > + ibnd_physstate_str(port->info.phys_state) > > + ); > > + > > + if (port->ext_portnum) > > + snprintf(ext_port_str, 256, "%d", port->ext_portnum); > > + else > > + ext_port_str[0] = '\0'; > > + > > + printf(" %6d %4d[%2s] ==%s==> %s", > > + node->smalid, port->portnum, > > + ext_port_str, > > + link_str, > > + remote_str > > + ); > > +} > > + > > +void > > +print_switch(ibnd_node_t *node, void *user_data) > > +{ > > + int i = 0; > > + > > + for (i = 1; i <= node->info.numports; i++) { > > + ibnd_port_t *port = node->ports[i]; > > + if (!port) > > + continue; > > + if (port->info.link_state == IBND_LINK_DOWN) { > > + print_port(node, port); > > + } > > + } > > +} > > + > > +void > > +usage(void) > > +{ > > + fprintf(stderr, > > + "Usage: %s [-hclp -S -D -C -P ]\n" > > + " Report link speed and connection for each port of each switch which is active\n" > > + " -h This help message\n" > > + " -i Number of iterations to run (default -1 == infinate)\n" > > + > > + " -S output only the node specified by guid\n" > > + " -D print only node specified by \n" > > + " -f specify node to start \"from\"\n" > > + " -n Number of hops to include away from specified node\n" > > + > > + " -t timeout for any single fabric query\n" > > + " -s show errors\n" > > + > > + " -C use selected Channel Adaptor name for queries\n" > > + " -P use selected channel adaptor port for queries\n" > > + " --debug print debug messages\n" > > + , > > + argv0); > > + exit(-1); > > +} > > + > 
> +int > > +main(int argc, char **argv) > > +{ > > + char *ca = 0; > > + int ca_port = 0; > > + ibnd_fabric_t *fabric = NULL; > > + uint64_t guid = 0; > > + char *dr_path = NULL; > > + char *from = NULL; > > + int hops = 0; > > + ib_portid_t port_id; > > + int iters = -1; > > + > > + static char const str_opts[] = "S:D:n:C:P:t:shuf:i:"; > > + static const struct option long_opts[] = { > > + { "S", 1, 0, 'S'}, > > + { "D", 1, 0, 'D'}, > > + { "num-hops", 1, 0, 'n'}, > > + { "ca-name", 1, 0, 'C'}, > > + { "ca-port", 1, 0, 'P'}, > > + { "timeout", 1, 0, 't'}, > > + { "show", 0, 0, 's'}, > > + { "help", 0, 0, 'h'}, > > + { "usage", 0, 0, 'u'}, > > + { "debug", 0, 0, 2}, > > + { "from", 1, 0, 'f'}, > > + { "iters", 1, 0, 'i'}, > > + { } > > + }; > > + > > + f = stdout; > > + > > + argv0 = argv[0]; > > + > > + while (1) { > > + int ch = getopt_long(argc, argv, str_opts, long_opts, NULL); > > + if ( ch == -1 ) > > + break; > > + switch(ch) { > > + case 2: > > + ibnd_debug(1); > > + break; > > + case 'f': > > + from = strdup(optarg); > > + break; > > + case 'C': > > + ca = strdup(optarg); > > + break; > > + case 'P': > > + ca_port = strtoul(optarg, 0, 0); > > + break; > > + case 'D': > > + dr_path = strdup(optarg); > > + break; > > + case 'n': > > + hops = (int)strtol(optarg, NULL, 0); > > + break; > > + case 'i': > > + iters = (int)strtol(optarg, NULL, 0); > > + break; > > + case 't': > > + timeout_ms = strtoul(optarg, 0, 0); > > + break; > > + case 'S': > > + guid = (uint64_t)strtoull(optarg, 0, 0); > > + break; > > + default: > > + usage(); > > + break; > > + } > > + } > > + argc -= optind; > > + argv += optind; > > + > > + while (iters == -1 || iters-- > 0) { > > + if (from) { > > + /* only scan part of the fabric */ > > + str2drpath(&(port_id.drpath), from, 0, 0); > > + if ((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, &port_id, hops)) == NULL) { > > + fprintf(stderr, "discover failed\n"); > > + exit(1); > > + } > > + guid = 0; > > + } else { > > + if 
((fabric = ibnd_discover_fabric(ca, ca_port, timeout_ms, NULL, -1)) == NULL) { > > + fprintf(stderr, "discover failed\n"); > > + exit(1); > > + } > > + } > > + > > +#if 0 > > + if (guid) { > > + ibnd_node_t *sw = ibnd_find_node_guid(fabric, guid); > > + print_switch(sw, NULL); > > + } else if (dr_path) { > > + ibnd_node_t *sw = ibnd_find_node_dr(fabric, dr_path); > > + print_switch(sw, NULL); > > + } else { > > + ibnd_iter_nodes_type(fabric, print_switch, IBND_SWITCH_NODE, NULL); > > + } > > +#endif > > + > > + ibnd_destroy_fabric(fabric); > > + } > > + > > + exit(0); > > +} > > -- > > 1.5.4.5 > > From weiny2 at llnl.gov Tue Dec 23 14:24:51 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 23 Dec 2008 14:24:51 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081221152708.GO25208@sashak.voltaire.com> References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> Message-ID: <20081223142451.39bd25cb.weiny2@llnl.gov> On Sun, 21 Dec 2008 17:27:08 +0200 Sasha Khapyorsky wrote: > On 16:20 Thu 11 Dec , Ira Weiny wrote: > > > > [snip...] > > > diff --git a/infiniband-diags/libibnetdisc/test/iblinkinfotest.c b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > > new file mode 100644 > > index 0000000..6e63f4a > > --- /dev/null > > +++ b/infiniband-diags/libibnetdisc/test/iblinkinfotest.c > > @@ -0,0 +1,395 @@ > > +/* > > + * Copyright (c) 2004-2007 Voltaire Inc. All rights reserved. > > + * Copyright (c) 2007 Xsigo Systems Inc. All rights reserved. > > + * Copyright (c) 2008 Lawrence Livermore National Lab. All rights reserved. > > + * > > + * This software is available to you under a choice of one of two > > + * licenses. 
You may choose to be licensed under the terms of the GNU > > + * General Public License (GPL) Version 2, available from the file > > + * COPYING in the main directory of this source tree, or the > > + * OpenIB.org BSD license below: > > + * > > + * Redistribution and use in source and binary forms, with or > > + * without modification, are permitted provided that the following > > + * conditions are met: > > + * > > + * - Redistributions of source code must retain the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer. > > + * > > + * - Redistributions in binary form must reproduce the above > > + * copyright notice, this list of conditions and the following > > + * disclaimer in the documentation and/or other materials > > + * provided with the distribution. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > > + * SOFTWARE. > > + * > > + */ > > + > > +#if HAVE_CONFIG_H > > +# include > > +#endif /* HAVE_CONFIG_H */ > > + > > +#define _GNU_SOURCE > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > + > > +char *argv0 = "iblinkinfotest"; > > +static FILE *f; > > + > > +static char *node_name_map_file = NULL; > > +static nn_map_t *node_name_map = NULL; > > + > > +static int timeout_ms = 500; > > + > > +static int debug = 0; > > +#define DEBUG(str, args...) 
\ > > + if (debug) fprintf(stderr, str, ##args) > > + > > +static int down_links_only = 0; > > +static int line_mode = 0; > > +static int add_sw_settings = 0; > > +static int print_port_guids = 0; > > + > > +static unsigned int > > +get_max(unsigned int num) > > +{ > > + unsigned int v = num; // 32-bit word to find the log base 2 of > > + unsigned r = 0; // r will be lg(v) > > + > > + while (v >>= 1) // unroll for more speed... > > + { > > + r++; > > + } > > + > > + return (1 << r); > > +} > > + > > +void > > +get_msg(char *width_msg, char *speed_msg, int msg_size, ibnd_port_t *port) > > +{ > > + int max_speed = 0; > > + > > + int max_width = get_max(port->info.link_width_supported > > + & port->remoteport->info.link_width_supported); > > + if ((max_width & port->info.link_width_active) == 0) { > > + // we are not at the max supported width > > + // print what we could be at. > > + snprintf(width_msg, msg_size, "Could be %s", > > + ibnd_linkwidth_str(max_width)); > > + } > > + > > + max_speed = get_max(port->info.link_speed_supported > > + & port->remoteport->info.link_speed_supported); > > + if ((max_speed & port->info.link_speed_active) == 0) { > > + // we are not at the max supported speed > > + // print what we could be at. 
> > + snprintf(speed_msg, msg_size, "Could be %s", > > + ibnd_linkspeed_str(max_speed, 1)); > > + } > > +} > > + > > +void > > +print_port(ibnd_node_t *node, ibnd_port_t *port) > > +{ > > + char remote_guid_str[256]; > > + char remote_str[256]; > > + char link_str[256]; > > + char width_msg[256]; > > + char speed_msg[256]; > > + char ext_port_str[256]; > > + > > + if (!port) > > + return; > > + > > + remote_guid_str[0] = '\0'; > > + remote_str[0] = '\0'; > > + link_str[0] = '\0'; > > + width_msg[0] = '\0'; > > + speed_msg[0] = '\0'; > > + > > + if (port->remoteport) { > > + char remote_name_buf[256]; > > + strncpy(remote_name_buf, port->remoteport->node->nodedesc, 256); > > + > > + if (port->remoteport->ext_portnum) > > + snprintf(ext_port_str, 256, "%d", port->remoteport->ext_portnum); > > + else > > + ext_port_str[0] = '\0'; > > + > > + get_msg(width_msg, speed_msg, 256, port); > > + if (line_mode) { > > + if (print_port_guids) { > > + snprintf(remote_guid_str, 256, > > + "0x%016lx ", > > + port->remoteport->guid); > > Here and below, printing uint64_t as %lx generates warning on 32-bit > machine. I would suggest to use portable string macros - PRIx64. > Fixed, Ira From weiny2 at llnl.gov Tue Dec 23 14:56:46 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 23 Dec 2008 14:56:46 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> Message-ID: <20081223145646.7419180b.weiny2@llnl.gov> On Tue, 23 Dec 2008 07:32:36 -0800 Roland Dreier wrote: > > This function ends up using a lot of stack space. > > This is userspace... are a few KB on the stack really an issue? Also that is in the test code (libibnetdisc/test/iblinkinfotest.c). The real iblinkinfo has them all declared static. 
void print_port(ibnd_node_t *node, ibnd_port_t *port) { static char remote_guid_str[256]; static char remote_str[256]; static char link_str[256]; static char width_msg[256]; static char speed_msg[256]; static char ext_port_str[256]; static char loc_sma_lid[16]; ... Sorry about the confusion, Ira From rdreier at cisco.com Tue Dec 23 15:43:46 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 23 Dec 2008 15:43:46 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081223145646.7419180b.weiny2@llnl.gov> (Ira Weiny's message of "Tue, 23 Dec 2008 14:56:46 -0800") References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> <20081223145646.7419180b.weiny2@llnl.gov> Message-ID: > Also that is in the test code (libibnetdisc/test/iblinkinfotest.c). The real > iblinkinfo has them all declared static. > > void > print_port(ibnd_node_t *node, ibnd_port_t *port) > { > static char remote_guid_str[256]; > static char remote_str[256]; so this function is not thread-safe.... This seems to be setting an unnecessary booby-trap; is there a reason not to put these variables on the stack? From weiny2 at llnl.gov Tue Dec 23 15:54:11 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 23 Dec 2008 15:54:11 -0800 Subject: [ofa-general] Re: [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: References: <20081211162031.0c591f54.weiny2@llnl.gov> <20081221152708.GO25208@sashak.voltaire.com> <000001c964c6$7d8462a0$c4e0180a@amr.corp.intel.com> <20081223145646.7419180b.weiny2@llnl.gov> Message-ID: <20081223155411.0d82b4af.weiny2@llnl.gov> On Tue, 23 Dec 2008 15:43:46 -0800 Roland Dreier wrote: > > Also that is in the test code (libibnetdisc/test/iblinkinfotest.c). The real > > iblinkinfo has them all declared static. 
> > > > void > > print_port(ibnd_node_t *node, ibnd_port_t *port) > > { > > static char remote_guid_str[256]; > > static char remote_str[256]; > > so this function is not thread-safe.... This seems to be setting an > unnecessary booby-trap; is there a reason not to put these variables on > the stack? Just what Sean mentioned. There are actually 2 print_port functions in the 3 patches which were sent. One is in a prototype tool called iblinkinfotest which was left in the library directory and optionally built for testing. The other is in a single-threaded tool which uses the library maintained in the official src dir of infiniband_diags. _Neither_ function _is_ part of the library and both are in single-threaded tools. Ira From weiny2 at llnl.gov Tue Dec 23 16:41:41 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 23 Dec 2008 16:41:41 -0800 Subject: Re: [ofa-general] [PATCH V2 1/3] Create a new library libibnetdisc In-Reply-To: <20081223184331.GL31213@obsidianresearch.com> References: <20081211162031.0c591f54.weiny2@llnl.gov> <1230056943.23747.21.camel@auk31.llnl.gov> <20081223184331.GL31213@obsidianresearch.com> Message-ID: <20081223164141.241dd3f0.weiny2@llnl.gov> On Tue, 23 Dec 2008 11:43:31 -0700 Jason Gunthorpe wrote: > On Tue, Dec 23, 2008 at 10:29:02AM -0800, Al Chu wrote: > > > > +#define IBND_DEBUG(str, args...) \ > > > + if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) > > > +#define IBND_ERROR(str, args...) \ > > > + fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) > > > > I believe the "args ..." and "##args" are only for gcc. Not sure how > > much this portability issue matters for OFED. Personally, I always > > do > > Right that format is an obsolete gcc extension. Ira, it should be > > #define debug(format, ...) fprintf (stderr, format, __VA_ARGS__) > > Which is how C99 standardized variadic macros. > Ok, I think this is C99 compliant... I had to break the call up into 2 printf's.
16:30:20 > git diff diff --git a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h b/infiniband-diags/libibnetdisc/in index 737ffac..773c64b 100644 --- a/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h +++ b/infiniband-diags/libibnetdisc/include/infiniband/ibnetdisc.h @@ -43,10 +43,16 @@ #define HASHGUID(guid) ((uint32_t)(((uint32_t)(guid) * 101) ^ ((uint32_t)((guid) >> 32) * 103))) #define HTSZ 137 -#define IBND_DEBUG(str, args...) \ - if (ibdebug) printf("%s:%d; "str, __FILE__, __LINE__, ##args) -#define IBND_ERROR(str, args...) \ - fprintf(stderr, "%s:%d; "str, __FILE__, __LINE__, ##args) +#define IBND_DEBUG(...) \ + if (ibdebug) { \ + printf("%s:%d; ", __FILE__, __LINE__); \ + printf(__VA_ARGS__); \ + } +#define IBND_ERROR(...) \ + { \ + fprintf(stderr, "%s:%d; ", __FILE__, __LINE__); \ + fprintf(stderr, __VA_ARGS__); \ + } /** ========================================================================= * ENUM definitions Otherwise calls to the macro with only 1 parameter fail to compile. It seems that GCC has a couple of extensions [*] but the above should be C99 compliant without GCC extensions. Does that seem right? Ira [*] Ref: http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html You can have named arguments as well as variable arguments in a variadic macro. We could define eprintf like this, instead: #define eprintf(format, ...) fprintf (stderr, format, __VA_ARGS__) This formulation looks more descriptive, but unfortunately it is less flexible: you must now supply at least one argument after the format string. In standard C, you cannot omit the comma separating the named argument from the variable arguments. Furthermore, if you leave the variable argument empty, you will get a syntax error, because there will be an extra comma after the format string. eprintf("success!\n", ); ==> fprintf(stderr, "success!\n", ); GNU CPP has a pair of extensions which deal with this problem. 
First, you are allowed to leave the variable argument out entirely: eprintf ("success!\n") ==> fprintf(stderr, "success!\n", ); Second, the `##' token paste operator has a special meaning when placed between a comma and a variable argument. If you write #define eprintf(format, ...) fprintf (stderr, format, ##__VA_ARGS__) From sean.hefty at intel.com Tue Dec 23 22:20:20 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 23 Dec 2008 22:20:20 -0800 Subject: [ofa-general] PATCH[1/6] Windows port of libibmad - mad.h In-Reply-To: <20081223185214.GM31213@obsidianresearch.com> References: <20081221205124.GE28259@sashak.voltaire.com> <000101c964c9$ebcaa460$c4e0180a@amr.corp.intel.com> <20081223185214.GM31213@obsidianresearch.com> Message-ID: <000001c9658f$b39e8950$47248686@amr.corp.intel.com> >> >Windows don't like "inline"? >> >> The compiler doesn't allow it in the header file. > >Not even static inline? Inline functions in header should all be >static inline or extern inline to avoid comdefs.. I'm wrong. I thought this was the problem, but it's more likely that the issue was that the _set_field type calls weren't exported. So trying to make mad_set_field inline failed, since all it did was call _set_field. 
- Sean From vlad at lists.openfabrics.org Wed Dec 24 03:17:41 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 24 Dec 2008 03:17:41 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081224-0200 daily build status Message-ID: <20081224111742.15BDCE60082@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with 
linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From alekseys at voltaire.com Wed Dec 24 05:36:22 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 24 Dec 2008 15:36:22 +0200 Subject: [ofa-general] [RDMA CM IPv6 PATCHv7 1/2] IB addr patch Message-ID: <1230125782.32528.5.camel@alst60> >From e44639d524ebe8b15457e3e83abfb41d5a649651 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 24 Dec 2008 15:21:26 +0200 Subject: [PATCH] IB addr IPv6 support Using sockaddr_storage instead of sockaddr structure in addr_req structure in order to support AF_INET6 address family. Support for network discovery in addr_send_arp function. Local IPv6 address resolution. Added remote IPv6 address resolution for RDMA CM. Function addr_resolve_remote used as wrapper for two other functions: addr4_resolve_remote ( original addr_resolve_remote ) addr6_resolve_remote ( new function ) Signed-off-by: Aleksey Senin --- drivers/infiniband/core/addr.c | 192 ++++++++++++++++++++++++++++++--------- 1 files changed, 147 insertions(+), 45 deletions(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 09a2bec..8ac21ba 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -41,6 +41,8 @@ #include #include #include +#include +#include #include MODULE_AUTHOR("Sean Hefty"); @@ -49,8 +51,8 @@ MODULE_LICENSE("Dual BSD/GPL"); struct addr_req { struct list_head list; - struct sockaddr src_addr; - struct sockaddr dst_addr; + struct sockaddr_storage src_addr; + struct sockaddr_storage dst_addr; struct rdma_dev_addr *addr; struct rdma_addr_client *client; void *context; @@ -113,15 +115,30 @@ EXPORT_SYMBOL(rdma_copy_addr); int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr) { struct net_device *dev; -
__be32 ip = ((struct sockaddr_in *) addr)->sin_addr.s_addr; - int ret; + int ret = -EADDRNOTAVAIL; - dev = ip_dev_find(&init_net, ip); - if (!dev) - return -EADDRNOTAVAIL; + switch (addr->sa_family) { + case AF_INET: + dev = ip_dev_find(&init_net, + ((struct sockaddr_in *) addr)->sin_addr.s_addr); + + if (!dev) + return ret; - ret = rdma_copy_addr(dev_addr, dev, NULL); - dev_put(dev); + ret = rdma_copy_addr(dev_addr, dev, NULL); + dev_put(dev); + break; + case AF_INET6: + for_each_netdev(&init_net, dev) { + if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1)) { + ret = rdma_copy_addr(dev_addr, dev, NULL); + break; + } + } + break; + default: + break; + } return ret; } EXPORT_SYMBOL(rdma_translate_ip); @@ -156,22 +173,37 @@ static void queue_req(struct addr_req *req) mutex_unlock(&lock); } -static void addr_send_arp(struct sockaddr_in *dst_in) +static void addr_send_arp(struct sockaddr *dst_in) { struct rtable *rt; struct flowi fl; - __be32 dst_ip = dst_in->sin_addr.s_addr; + struct dst_entry *dst; memset(&fl, 0, sizeof fl); - fl.nl_u.ip4_u.daddr = dst_ip; - if (ip_route_output_key(&init_net, &rt, &fl)) - return; + if (dst_in->sa_family == AF_INET) { + fl.nl_u.ip4_u.daddr = + ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; - neigh_event_send(rt->u.dst.neighbour, NULL); - ip_rt_put(rt); + if (ip_route_output_key(&init_net, &rt, &fl)) + return; + + neigh_event_send(rt->u.dst.neighbour, NULL); + ip_rt_put(rt); + + } else { + fl.nl_u.ip6_u.daddr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return; + + neigh_event_send(dst->neighbour, NULL); + dst_release(dst); + } } -static int addr_resolve_remote(struct sockaddr_in *src_in, +static int addr4_resolve_remote(struct sockaddr_in *src_in, struct sockaddr_in *dst_in, struct rdma_dev_addr *addr) { @@ -220,10 +252,51 @@ out: return ret; } +static int addr6_resolve_remote(struct sockaddr_in6 *src_in, + struct sockaddr_in6 *dst_in, + 
struct rdma_dev_addr *addr) +{ + struct flowi fl; + struct neighbour *neigh; + struct dst_entry *dst; + int ret = -ENODATA; + + memset(&fl, 0, sizeof fl); + fl.nl_u.ip6_u.daddr = dst_in->sin6_addr; + fl.nl_u.ip6_u.saddr = src_in->sin6_addr; + + dst = ip6_route_output(&init_net, NULL, &fl); + if (!dst) + return ret; + + if (dst->dev->flags & IFF_NOARP) { + ret = rdma_copy_addr(addr, dst->dev, NULL); + } else { + neigh = dst->neighbour; + if (neigh && (neigh->nud_state & NUD_VALID)) + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); + } + + dst_release(dst); + return ret; +} + +static int addr_resolve_remote(struct sockaddr *src_in, + struct sockaddr *dst_in, + struct rdma_dev_addr *addr) +{ + if (src_in->sa_family == AF_INET) { + return addr4_resolve_remote((struct sockaddr_in *)src_in, + (struct sockaddr_in *)dst_in, addr); + } else + return addr6_resolve_remote((struct sockaddr_in6 *)src_in, + (struct sockaddr_in6 *)dst_in, addr); +} + static void process_req(struct work_struct *work) { struct addr_req *req, *temp_req; - struct sockaddr_in *src_in, *dst_in; + struct sockaddr *src_in, *dst_in; struct list_head done_list; INIT_LIST_HEAD(&done_list); @@ -231,8 +304,8 @@ static void process_req(struct work_struct *work) mutex_lock(&lock); list_for_each_entry_safe(req, temp_req, &req_list, list) { if (req->status == -ENODATA) { - src_in = (struct sockaddr_in *) &req->src_addr; - dst_in = (struct sockaddr_in *) &req->dst_addr; + src_in = (struct sockaddr *) &req->src_addr; + dst_in = (struct sockaddr *) &req->dst_addr; req->status = addr_resolve_remote(src_in, dst_in, req->addr); if (req->status && time_after_eq(jiffies, req->timeout)) @@ -251,41 +324,70 @@ static void process_req(struct work_struct *work) list_for_each_entry_safe(req, temp_req, &done_list, list) { list_del(&req->list); - req->callback(req->status, &req->src_addr, req->addr, - req->context); + req->callback(req->status, (struct sockaddr *) &req->src_addr, + req->addr, req->context); 
put_client(req->client); kfree(req); } } -static int addr_resolve_local(struct sockaddr_in *src_in, - struct sockaddr_in *dst_in, +static int addr_resolve_local(struct sockaddr *src_in, + struct sockaddr *dst_in, struct rdma_dev_addr *addr) { struct net_device *dev; - __be32 src_ip = src_in->sin_addr.s_addr; - __be32 dst_ip = dst_in->sin_addr.s_addr; int ret; - dev = ip_dev_find(&init_net, dst_ip); - if (!dev) - return -EADDRNOTAVAIL; - - if (ipv4_is_zeronet(src_ip)) { - src_in->sin_family = dst_in->sin_family; - src_in->sin_addr.s_addr = dst_ip; - ret = rdma_copy_addr(addr, dev, dev->dev_addr); - } else if (ipv4_is_loopback(src_ip)) { - ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + if (dst_in->sa_family == AF_INET) { + __be32 src_ip = ((struct sockaddr_in *)src_in)->sin_addr.s_addr; + __be32 dst_ip = ((struct sockaddr_in *)dst_in)->sin_addr.s_addr; + + dev = ip_dev_find(&init_net, dst_ip); + if (!dev) + return -EADDRNOTAVAIL; + + if (ipv4_is_zeronet(src_ip)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in *)src_in)->sin_addr.s_addr = dst_ip; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv4_is_loopback(src_ip)) { + ret = rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } + dev_put(dev); } else { - ret = rdma_translate_ip((struct sockaddr *)src_in, addr); - if (!ret) - memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + struct in6_addr *a; + + for_each_netdev(&init_net, dev) + if (ipv6_chk_addr(&init_net, &((struct sockaddr_in6 *) addr)->sin6_addr, dev, 1)) + break; + + if (!dev) + return -EADDRNOTAVAIL; + + a = &((struct sockaddr_in6 *)src_in)->sin6_addr; + + if (ipv6_addr_any(a)) { + src_in->sa_family = dst_in->sa_family; + ((struct sockaddr_in6 
*)src_in)->sin6_addr = + ((struct sockaddr_in6 *)dst_in)->sin6_addr; + ret = rdma_copy_addr(addr, dev, dev->dev_addr); + } else if (ipv6_addr_loopback(a)) { + ret = rdma_translate_ip(dst_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } else { + ret = rdma_translate_ip(src_in, addr); + if (!ret) + memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN); + } } - dev_put(dev); return ret; } @@ -296,7 +398,7 @@ int rdma_resolve_ip(struct rdma_addr_client *client, struct rdma_dev_addr *addr, void *context), void *context) { - struct sockaddr_in *src_in, *dst_in; + struct sockaddr *src_in, *dst_in; struct addr_req *req; int ret = 0; @@ -313,8 +415,8 @@ int rdma_resolve_ip(struct rdma_addr_client *client, req->client = client; atomic_inc(&client->refcount); - src_in = (struct sockaddr_in *) &req->src_addr; - dst_in = (struct sockaddr_in *) &req->dst_addr; + src_in = (struct sockaddr *) &req->src_addr; + dst_in = (struct sockaddr *) &req->dst_addr; req->status = addr_resolve_local(src_in, dst_in, addr); if (req->status == -EADDRNOTAVAIL) -- 1.6.0.2 From alekseys at voltaire.com Wed Dec 24 05:48:12 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Wed, 24 Dec 2008 15:48:12 +0200 Subject: [ofa-general] [RDMA CM IPv6 PATCHv7 2/2] RDMA CM In-Reply-To: <1230125782.32528.5.camel@alst60> References: <1230125782.32528.5.camel@alst60> Message-ID: <1230126492.32528.7.camel@alst60> >From 774a822be601135b438efe8a6c630b433db517c9 Mon Sep 17 00:00:00 2001 From: Aleksey Senin Date: Wed, 24 Dec 2008 15:27:36 +0200 Subject: [PATCH] RDMA CM IPv6 support AF_INET6 case in rdma_bind_addr added AF_INET6 support in cma_format_hdr function AF_INET6 support when checking loopback address Use sockaddr_storage structure in cma_bind_any function Signed-off-by: Aleksey Senin --- drivers/infiniband/core/cma.c | 86 ++++++++++++++++++++++++++++------------ 1 files changed, 60 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/core/cma.c 
b/drivers/infiniband/core/cma.c index d951896..2a2e508 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -42,6 +42,7 @@ #include #include +#include #include #include @@ -636,7 +637,12 @@ static inline int cma_zero_addr(struct sockaddr *addr) static inline int cma_loopback_addr(struct sockaddr *addr) { - return ipv4_is_loopback(((struct sockaddr_in *) addr)->sin_addr.s_addr); + if (addr->sa_family == AF_INET) + return ipv4_is_loopback( + ((struct sockaddr_in *) addr)->sin_addr.s_addr); + else + return ipv6_addr_loopback( + &((struct sockaddr_in6 *) addr)->sin6_addr); } static inline int cma_any_addr(struct sockaddr *addr) @@ -1467,10 +1473,10 @@ static void cma_listen_on_all(struct rdma_id_private *id_priv) static int cma_bind_any(struct rdma_cm_id *id, sa_family_t af) { - struct sockaddr_in addr_in; + struct sockaddr_storage addr_in; memset(&addr_in, 0, sizeof addr_in); - addr_in.sin_family = af; + addr_in.ss_family = af; return rdma_bind_addr(id, (struct sockaddr *) &addr_in); } @@ -2073,7 +2079,7 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr) struct rdma_id_private *id_priv; int ret; - if (addr->sa_family != AF_INET) + if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6) return -EAFNOSUPPORT; id_priv = container_of(id, struct rdma_id_private, id); @@ -2113,31 +2119,59 @@ EXPORT_SYMBOL(rdma_bind_addr); static int cma_format_hdr(void *hdr, enum rdma_port_space ps, struct rdma_route *route) { - struct sockaddr_in *src4, *dst4; struct cma_hdr *cma_hdr; struct sdp_hh *sdp_hdr; - src4 = (struct sockaddr_in *) &route->addr.src_addr; - dst4 = (struct sockaddr_in *) &route->addr.dst_addr; - - switch (ps) { - case RDMA_PS_SDP: - sdp_hdr = hdr; - if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) - return -EINVAL; - sdp_set_ip_ver(sdp_hdr, 4); - sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - sdp_hdr->port = src4->sin_port; - break; - default: - 
cma_hdr = hdr; - cma_hdr->cma_version = CMA_VERSION; - cma_set_ip_ver(cma_hdr, 4); - cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; - cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; - cma_hdr->port = src4->sin_port; - break; + if (route->addr.src_addr.ss_family == AF_INET) { + struct sockaddr_in *src4, *dst4; + + src4 = (struct sockaddr_in *) &route->addr.src_addr; + dst4 = (struct sockaddr_in *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 4); + sdp_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + sdp_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + sdp_hdr->port = src4->sin_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 4); + cma_hdr->src_addr.ip4.addr = src4->sin_addr.s_addr; + cma_hdr->dst_addr.ip4.addr = dst4->sin_addr.s_addr; + cma_hdr->port = src4->sin_port; + break; + } + } else { + struct sockaddr_in6 *src6, *dst6; + + src6 = (struct sockaddr_in6 *) &route->addr.src_addr; + dst6 = (struct sockaddr_in6 *) &route->addr.dst_addr; + + switch (ps) { + case RDMA_PS_SDP: + sdp_hdr = hdr; + if (sdp_get_majv(sdp_hdr->sdp_version) != SDP_MAJ_VERSION) + return -EINVAL; + sdp_set_ip_ver(sdp_hdr, 6); + sdp_hdr->src_addr.ip6 = src6->sin6_addr; + sdp_hdr->dst_addr.ip6 = dst6->sin6_addr; + sdp_hdr->port = src6->sin6_port; + break; + default: + cma_hdr = hdr; + cma_hdr->cma_version = CMA_VERSION; + cma_set_ip_ver(cma_hdr, 6); + cma_hdr->src_addr.ip6 = src6->sin6_addr; + cma_hdr->dst_addr.ip6 = dst6->sin6_addr; + cma_hdr->port = src6->sin6_port; + break; + } } return 0; } -- 1.6.0.2 From todd.rimmer at qlogic.com Wed Dec 24 06:24:16 2008 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Wed, 24 Dec 2008 08:24:16 -0600 Subject: [ofa-general] [ipoib]patch for ipoib failure during startup with non-default pkey set. 
In-Reply-To: References: <4C2744E8AD2982428C5BFE523DF8CDCB3E746246B9@MNEXMB1.qlogic.org> <4C2744E8AD2982428C5BFE523DF8CDCB3E746246C0@MNEXMB1.qlogic.org> Message-ID: <5AEC2602AE03EB46BFC16C6B9B200DA813477FC77F@MNEXMB2.qlogic.org> > From: Roland Dreier > Sent: Friday, December 19, 2008 6:51 PM > To: Alex Estrin > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] [ipoib]patch for ipoib failure during startup > with non-default pkey set. > > Can you provide some detail about what this patch is doing? What > exactly is the bug, and how does this fix it? (Please always provide > that with all patches -- it's too hard to reverse engineer every patch I > see before I apply it) Alex is on holiday until Jan 5th; however, I can describe the problem which he fixed. When IPoIB comes up it uses the 1st PKey in the PKey table. This works great if ipoib is modprobe'd after the IB port is Active. However, in the typical case, ipoib is started at boot and gets modprobe'd before the SM has programmed the port. In this case, IPoIB grabs the 1st pkey (0xffff) and continues to use it, even if the SM has programmed the port differently. A similar problem occurs if the SM is reconfigured and the 1st pkey changes. His patch will adjust the pkey used by IPoIB (which is also part of the IPoIB IPv4 broadcast address) when the PKey table changes. Todd Rimmer Chief Architect QLogic Network Systems Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From rdreier at cisco.com Wed Dec 24 10:11:56 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Dec 2008 10:11:56 -0800 Subject: [ofa-general] Re: [RDMA CM IPv6 PATCHv7 1/2] IB addr patch In-Reply-To: <1230125782.32528.5.camel@alst60> (Aleksey Senin's message of "Wed, 24 Dec 2008 15:36:22 +0200") References: <1230125782.32528.5.camel@alst60> Message-ID: thanks, applied.
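For anyone following the IPv6 patches in this thread, the core idea is that an address is kept in a struct sockaddr_storage and the code branches on the address family before casting to the IPv4 or IPv6 layout. The following is only a minimal userspace sketch of that dispatch, assuming POSIX socket headers; the helper names (sketch_is_loopback, sketch_set_v4, sketch_set_v6_loopback) are hypothetical and are not part of the patch, which uses the kernel's ipv4_is_loopback() and ipv6_addr_loopback() instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if the address held
 * in *ss is a loopback address, 0 if it is not, and -1 if the address
 * family is neither AF_INET nor AF_INET6.
 */
static int sketch_is_loopback(const struct sockaddr_storage *ss)
{
	assert(ss != NULL);

	switch (ss->ss_family) {
	case AF_INET: {
		const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;

		/* 127.0.0.0/8: compare the top octet in host byte order. */
		return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;

		/* ::1 is the only IPv6 loopback address. */
		return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr) ? 1 : 0;
	}
	default:
		return -1;
	}
}

/* Hypothetical helper: store an IPv4 address given in host byte order. */
static void sketch_set_v4(struct sockaddr_storage *ss, uint32_t host_addr)
{
	struct sockaddr_in *in4 = (struct sockaddr_in *)ss;

	memset(ss, 0, sizeof *ss);
	in4->sin_family = AF_INET;
	in4->sin_addr.s_addr = htonl(host_addr);
}

/* Hypothetical helper: store the IPv6 loopback address (::1). */
static void sketch_set_v6_loopback(struct sockaddr_storage *ss)
{
	struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)ss;

	memset(ss, 0, sizeof *ss);
	in6->sin6_family = AF_INET6;
	in6->sin6_addr = in6addr_loopback;
}
```

Zeroing the whole sockaddr_storage and then setting only the family is the same pattern the cma_bind_any() change uses, since sockaddr_storage is large enough for either address type.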
From rdreier at cisco.com Wed Dec 24 10:20:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Dec 2008 10:20:53 -0800 Subject: [ofa-general] [RDMA CM IPv6 PATCHv7 2/2] RDMA CM In-Reply-To: <1230126492.32528.7.camel@alst60> (Aleksey Senin's message of "Wed, 24 Dec 2008 15:48:12 +0200") References: <1230125782.32528.5.camel@alst60> <1230126492.32528.7.camel@alst60> Message-ID: Thanks, applied as well. By the way (this goes for both patches, and I'm not sure how many times I've written this to various people, but I'll write it one more time to you), please don't put extraneous junk like: > >From 774a822be601135b438efe8a6c630b433db517c9 Mon Sep 17 00:00:00 2001 > Date: Wed, 24 Dec 2008 15:27:36 +0200 > Subject: [PATCH] RDMA CM IPv6 support in the body of your email -- I just have to edit it out by hand to avoid git importing into the kernel log forever. The "From:" line is fine, since git will use that as the author of the patch, except in your case, you had > From: Aleksey Senin and I think it's better for the author of the patch to have a more reasonable email address, so I had to delete that by hand too. A better subject line for the email would be good too. Just having > Subject: Re: [ofa-general] [RDMA CM IPv6 PATCHv7 2/2] RDMA CM ends up putting "RDMA CM" in the kernel log after everything is stripped, which is not really useful to someone looking at the shortlog and trying to see what changed. So I had to edit that by hand too. Finally (and this may be too hard for some people who are not totally comfortable with writing English), a better changelog would be good. If you just write: > AF_INET6 case in rdma_bind_addr added > AF_INET6 support in cma_format_hdr function > AF_INET6 support when checking loopback address > Use sockaddr_storage structure in cma_bind_any function then all that shows is the details of what your patch changes, which I can see just as easily by reading the patch. 
A changelog should give additional commentary, like what the high-level goal of your change is, why we want to make that change, and any other information that would be useful for someone trying to understand the change (possibly years in the future). - R. From rdreier at cisco.com Wed Dec 24 20:30:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Dec 2008 20:30:32 -0800 Subject: [ofa-general] [PATCH 03/10 v2] RDMA/nes: Remove tx_free_list In-Reply-To: <20081212204613.GA6760@ctung-MOBL> (Chien Tung's message of "Fri, 12 Dec 2008 14:46:13 -0600") References: <20081212204613.GA6760@ctung-MOBL> Message-ID: thanks, applied. From rdreier at cisco.com Wed Dec 24 20:32:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Dec 2008 20:32:53 -0800 Subject: [ofa-general] Re: [PATCH] mlx4: Adjust ownership bit properly in resize_cq when copying over CQEs In-Reply-To: <200812141814.17091.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sun, 14 Dec 2008 18:14:16 +0200") References: <200812141814.17091.jackm@dev.mellanox.co.il> Message-ID: Thanks, applied. From rdreier at cisco.com Wed Dec 24 20:34:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 24 Dec 2008 20:34:14 -0800 Subject: [ofa-general] Re: [PATCH] libmlx4: Adjust ownership bit properly in resize_cq when copying over CQEs In-Reply-To: <200812141814.21076.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Sun, 14 Dec 2008 18:14:20 +0200") References: <200812141814.21076.jackm@dev.mellanox.co.il> Message-ID: thanks, applied. 
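The two resize_cq patches above are only identified by their subject lines, so the following is a toy model rather than the mlx4 code. It assumes a common ring convention in which the owner bit of a valid entry equals the wrap parity of its index, that is, whether (index & ring_size) is non-zero for a power-of-two ring size. Under that assumption, an entry copied into a ring of a different size needs its owner bit recomputed for the new size. All names here (sketch_cqe, sketch_owner_for, sketch_copy_cqes, SKETCH_OWNER_BIT) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OWNER_BIT 0x80u	/* assumed position of the owner bit */

/* Toy completion entry: owner bit plus opcode in one byte, and a payload. */
struct sketch_cqe {
	uint8_t owner_opcode;
	uint32_t payload;
};

/*
 * Owner bit expected for running index n in a ring of `size` entries
 * (size must be a power of two). The value flips each time n wraps.
 */
static uint8_t sketch_owner_for(uint32_t n, uint32_t size)
{
	return (n & size) ? SKETCH_OWNER_BIT : 0;
}

/*
 * Copy `count` entries starting at running index `first` from the old ring
 * to the new ring, rewriting the owner bit for the new ring size and
 * leaving the opcode bits and payload untouched.
 */
static void sketch_copy_cqes(struct sketch_cqe *dst, uint32_t dst_size,
			     const struct sketch_cqe *src, uint32_t src_size,
			     uint32_t first, uint32_t count)
{
	uint32_t n;

	for (n = first; n != first + count; n++) {
		struct sketch_cqe e = src[n & (src_size - 1)];

		e.owner_opcode = (uint8_t)((e.owner_opcode & ~SKETCH_OWNER_BIT) |
					   sketch_owner_for(n, dst_size));
		dst[n & (dst_size - 1)] = e;
	}
}
```

With a 4-entry ring, running index 5 has its owner bit set; in an 8-entry ring the same index has it clear, which is why a plain memcpy of the entry would leave a stale bit in this model.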
From vlad at lists.openfabrics.org Thu Dec 25 03:11:11 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Thu, 25 Dec 2008 03:11:11 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081225-0200 daily build status Message-ID: <20081225111111.EB67CE60D83@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with 
linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From rdreier at cisco.com Thu Dec 25 07:21:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 25 Dec 2008 07:21:04 -0800 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get the first tranche of InfiniBand/RDMA changes for 2.6.29: Aleksey Senin (2): RDMA/addr: Add support for translating IPv6 addresses RDMA/cma: Add IPv6 support Chien Tung (2): RDMA/nes: Add loopback check to make_cm_node() RDMA/nes: Cleanup warnings Dave Olson (4): IB/ipath: Don't count IB symbol and link errors unless link is UP IB/ipath: Only do 1X workaround on rev1 chips IB/ipath: Fix spi_pioindex value IB/ipath: Add locking for interrupt use of ipath_pd contexts vs free David Disseldorp (1): IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs Faisal Latif (7): RDMA/nes: Cleanup cqp_request list usage RDMA/nes: Lock down connected_nodes list while processing it RDMA/nes: Avoid race between MPA request and reset event to rdma_cm RDMA/nes: Forward packets for a new connection with stale APBVT entry RDMA/nes: Fix TCP compliance test failures RDMA/nes: Check cqp_avail_reqs is empty after locking the list RDMA/nes: Remove tx_free_list Jack Morgenstein (1): IB/mlx4: Set ownership bit correctly when copying CQEs during CQ resize Joachim Fenkes (1): IB/ehca: Fix locking for shca_list_lock Julia Lawall (1): IB/ehca: Remove redundant test of vpage Michael Ellerman (1): IB/ipath: Fix pointer-to-pointer thinko in ipath_fs.c Ralph Campbell (3): IB/ipath: Improve 
UD loopback performance by allocating temp array only once IB/ipath: Fix PSN of send WQEs after an RDMA read resend IB/ipath: Check return value of dma_map_single() Roland Dreier (2): mlx4_core: Delete incorrect comment Merge branches 'cma', 'ehca', 'ipath', 'iser', 'mlx4' and 'nes' into for-next Stefan Roscher (1): IB/ehca: Replace modulus operations in flush error completion path Yevgeny Petrilin (1): mlx4_core: Add support for multiple completion event vectors drivers/infiniband/core/addr.c | 196 +++++++++++++---- drivers/infiniband/core/cma.c | 86 ++++++--- drivers/infiniband/hw/ehca/ehca_classes.h | 7 + drivers/infiniband/hw/ehca/ehca_eq.c | 2 +- drivers/infiniband/hw/ehca/ehca_main.c | 17 +- drivers/infiniband/hw/ehca/ehca_qp.c | 12 +- drivers/infiniband/hw/ehca/ehca_reqs.c | 13 +- drivers/infiniband/hw/ipath/ipath_driver.c | 49 +++-- drivers/infiniband/hw/ipath/ipath_file_ops.c | 30 ++-- drivers/infiniband/hw/ipath/ipath_fs.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 61 ++++++ drivers/infiniband/hw/ipath/ipath_iba7220.c | 83 +++++++- drivers/infiniband/hw/ipath/ipath_init_chip.c | 1 + drivers/infiniband/hw/ipath/ipath_kernel.h | 15 ++ drivers/infiniband/hw/ipath/ipath_keys.c | 2 + drivers/infiniband/hw/ipath/ipath_mad.c | 2 + drivers/infiniband/hw/ipath/ipath_qp.c | 32 ++- drivers/infiniband/hw/ipath/ipath_rc.c | 5 +- drivers/infiniband/hw/ipath/ipath_sdma.c | 21 ++- drivers/infiniband/hw/ipath/ipath_stats.c | 8 + drivers/infiniband/hw/ipath/ipath_ud.c | 19 +-- drivers/infiniband/hw/ipath/ipath_verbs.c | 3 +- drivers/infiniband/hw/ipath/ipath_verbs.h | 1 + drivers/infiniband/hw/mlx4/cq.c | 12 +- drivers/infiniband/hw/mlx4/main.c | 2 +- drivers/infiniband/hw/nes/nes.h | 18 +- drivers/infiniband/hw/nes/nes_cm.c | 279 +++++++++++++------------ drivers/infiniband/hw/nes/nes_cm.h | 14 +- drivers/infiniband/hw/nes/nes_hw.c | 42 ++-- drivers/infiniband/hw/nes/nes_utils.c | 9 +- drivers/infiniband/hw/nes/nes_verbs.c | 45 +--- 
drivers/infiniband/ulp/iser/iscsi_iser.h | 3 + drivers/infiniband/ulp/iser/iser_initiator.c | 132 ++++++++---- drivers/infiniband/ulp/iser/iser_verbs.c | 1 + drivers/net/mlx4/cq.c | 11 +- drivers/net/mlx4/en_cq.c | 9 +- drivers/net/mlx4/en_main.c | 4 +- drivers/net/mlx4/eq.c | 121 ++++++++--- drivers/net/mlx4/main.c | 53 ++++-- drivers/net/mlx4/mlx4.h | 14 +- drivers/net/mlx4/profile.c | 4 +- include/linux/mlx4/device.h | 4 +- 42 files changed, 965 insertions(+), 479 deletions(-) From sashak at voltaire.com Thu Dec 25 07:31:29 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 25 Dec 2008 17:31:29 +0200 Subject: [ofa-general] [PATCH] opensm/opensm.spec: fix event plugin config options Message-ID: <20081225153129.GE9755@sashak.voltaire.com> Fix default event plugin configure options - should be --enable-default-event-plugin (not just --enable-event-plugin). Signed-off-by: Sasha Khapyorsky --- opensm/opensm.spec.in | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index 9c23f47..7b82faf 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -15,10 +15,10 @@ %endif %if %{?_with_event_plugin:1}%{!?_with_event_plugin:0} -%define _enable_event_plugin --enable-event-plugin +%define _enable_event_plugin --enable-default-event-plugin %endif %if %{?_without_event_plugin:1}%{!?_without_event_plugin:0} -%define _disable_event_plugin --disable-event-plugin +%define _disable_event_plugin --disable-default-event-plugin %endif Summary: InfiniBand subnet manager and administration -- 1.6.0.4.766.g6fc4a From davem at davemloft.net Thu Dec 25 18:16:10 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:16:10 -0800 (PST) Subject: [ofa-general] Re: [PATCH 1/9] mlx4_en: Memory leak on completion queue free In-Reply-To: <494F6524.1030301@mellanox.co.il> References: <494F6524.1030301@mellanox.co.il> Message-ID: <20081225.181610.141840360.davem@davemloft.net> From: Yevgeny 
Petrilin Date: Mon, 22 Dec 2008 12:00:04 +0200 > If port is being destroyed without being activated before, > CQ resources are not freed. > > Signed-off-by: Yevgeny Petrilin Applied. From davem at davemloft.net Thu Dec 25 18:16:16 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:16:16 -0800 (PST) Subject: [ofa-general] Re: [PATCH 2/9] mlx4_en: Removed TX locking when polling TX cq In-Reply-To: <494F6536.6040107@mellanox.co.il> References: <494F6536.6040107@mellanox.co.il> Message-ID: <20081225.181616.195227579.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:00:22 +0200 > There is no need to synchronize the polling with the transmit > function. The only place to synchronize is when we process > the cq from the transmit function. Also removed spin_lock_irq, > and using spin_trylock, if somebody else is already processing the cq, > no need to wait for it to finish. > > Signed-off-by: Yevgeny Petrilin Applied. From davem at davemloft.net Thu Dec 25 18:16:22 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:16:22 -0800 (PST) Subject: [ofa-general] Re: [PATCH 3/9] mlx4_en: Removed redundant cq->armed flag In-Reply-To: <494F653F.2010806@mellanox.co.il> References: <494F653F.2010806@mellanox.co.il> Message-ID: <20081225.181621.75505193.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:00:31 +0200 > Signed-off-by: Yevgeny Petrilin Applied. From davem at davemloft.net Thu Dec 25 18:18:48 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:18:48 -0800 (PST) Subject: [ofa-general] Re: [PATCH 4/9] mlx4_en: Verify number of RX rings doesn't exceed MAX_RX_RINGS In-Reply-To: <494F6551.3040500@mellanox.co.il> References: <494F6551.3040500@mellanox.co.il> Message-ID: <20081225.181848.211823343.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:00:49 +0200 > Required in cases were dev->caps.num_comp_vectors > MAX_RX_RINGS. 
> For current values this would happen on machines that have more > then 16 cores. > > Signed-off-by: Yevgeny Petrilin This patch does not apply to the tree. In fact, there is no reference at all to num_comp_vectors in the mlx4_en driver at all. The code there currently reads: if (!mdev->profile.prof[i].rx_ring_num) { mdev->profile.prof[i].rx_ring_num = 1; mlx4_info(mdev, "Defaulting to %d rx rings for port:%d\n", 1, i); So there is some other patch applied to your copy of the driver already that adds all of that num_comp_vectors stuff. It isn't in the upstream sources, and it certainly isn't in the networking development GIT tree(s), that is for sure. Please, please (did I say please?), please be more careful in the future and don't send patches that do not apply properly to the tree. This wastes a lot of my time and discourages my handling your patches efficiently in the future. From davem at davemloft.net Thu Dec 25 18:19:54 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:19:54 -0800 (PST) Subject: [ofa-general] Re: [PATCH 5/9] mlx4_en: Removed Interrupt moderation module parameters In-Reply-To: <494F655B.1070405@mellanox.co.il> References: <494F655B.1070405@mellanox.co.il> Message-ID: <20081225.181954.171510066.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:00:59 +0200 > They are controlled through Ethtool interface, no need to have two > ways to modify them. > > Signed-off-by: Yevgeny Petrilin Applied. From davem at davemloft.net Thu Dec 25 18:20:25 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:20:25 -0800 (PST) Subject: [ofa-general] Re: [PATCH 6/9] mlx4_en: Remove pauses module parameters In-Reply-To: <494F6569.3080105@mellanox.co.il> References: <494F6569.3080105@mellanox.co.il> Message-ID: <20081225.182025.28632105.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:01:13 +0200 > mlx4_en: Remove pauses module parameters. 
> > They are controlled through Ethtool interface. > > Signed-off-by: Yevgeny Petrilin Applied. From davem at davemloft.net Thu Dec 25 18:21:42 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:21:42 -0800 (PST) Subject: [ofa-general] Re: [PATCH 7/9] mlx4_en: Always allocate RX ring for each completion vector In-Reply-To: <494F6573.2070405@mellanox.co.il> References: <494F6573.2070405@mellanox.co.il> Message-ID: <20081225.182142.18371433.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:01:23 +0200 > Removed module parameter specifying number of rings. > > Signed-off-by: Yevgeny Petrilin Since patch 4 didn't apply this one won't either. See what a waste of time it is when you submit patches that don't apply? There is a domino effect on the rest of the patches in the series since several if not all of them will fail to apply as well. From davem at davemloft.net Thu Dec 25 18:22:19 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:22:19 -0800 (PST) Subject: [ofa-general] Re: [PATCH 8/9] mlx4_en: Added "set_ringparam" Ethtool interface implementation In-Reply-To: <494F657C.2080100@mellanox.co.il> References: <494F657C.2080100@mellanox.co.il> Message-ID: <20081225.182219.268310314.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:01:32 +0200 > Now using Ethtool to determine ring sizes, removed the module parameters > that controlled those values. > Modifying ring size requires restart of the interface. > > Signed-off-by: Yevgeny Petrilin Also doesn't apply because 4 and 7 didn't. 
From davem at davemloft.net Thu Dec 25 18:23:17 2008 From: davem at davemloft.net (David Miller) Date: Thu, 25 Dec 2008 18:23:17 -0800 (PST) Subject: [ofa-general] Re: [PATCH 9/9] mlx4_en: Multi queue support In-Reply-To: <494F6584.2030304@mellanox.co.il> References: <494F6584.2030304@mellanox.co.il> Message-ID: <20081225.182317.28395960.davem@davemloft.net> From: Yevgeny Petrilin Date: Mon, 22 Dec 2008 12:01:40 +0200 > Added a function that performs hashing on the TX traffic. > The hashing is only done for TCP or UDP packets, all other packets > are sent to a default queue. > We use an indirection table with an entry for each hash result. > For each entry in the table, we hold statistics regarding the stream > that corresponds to that entry. Packets are then directed to a TX queue > according to stream's pattern. > A ring is opened for each queue. > > Signed-off-by: Yevgeny Petrilin Also doesn't apply because patches 4, 7, and 8 did not. From vlad at lists.openfabrics.org Fri Dec 26 03:11:02 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Fri, 26 Dec 2008 03:11:02 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081226-0200 daily build status Message-ID: <20081226111102.D476FE60056@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with 
linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From bramesh at vt.edu Fri Dec 26 11:40:49 2008 From: bramesh at vt.edu (Bharath Ramesh) Date: Fri, 26 Dec 2008 14:40:49 -0500 Subject: [ofa-general] ibv_post_send returns -1 Message-ID: <49553341.3000702@vt.edu> I am using OFED-1.2 for my research project. I am facing an issue where ibv_post_send returns -1 when I try to post an IBV_WR_RDMA_WRITE work request to the QP. I presume this means a lack of resources to complete the operation. To give a brief summary of my communication mechanism, I have two kinds of messages: control messages and large messages. I post control messages with the normal IBV_WR_SEND with send flags set to IBV_SEND_INLINE. Every few hundred such control messages I set the send flags to IBV_SEND_INLINE | IBV_SEND_SIGNALED and poll my send CQ. 
For large messages I use IBV_WR_RDMA_WRITE with send flags set to IBV_SEND_SIGNALED. After posting any large message I poll the send CQ for completion. I have sufficient buffers posted on the receiver. The first thing I did when I encountered this issue is to write a small test which just does only IBV_WR_RDMA_WRITE similar to my application. This test works fine without having any issues. I am wondering what could be potential issues that might be causing such a scenario of unavailable resources. Any help in debugging this situation would be appreciated. I am not subscribed to the list would appreciate if I am copied on the replies. Regards, Bharath From dotanba at gmail.com Fri Dec 26 22:50:24 2008 From: dotanba at gmail.com (Dotan Barak) Date: Sat, 27 Dec 2008 08:50:24 +0200 Subject: [ofa-general] ibv_post_send returns -1 In-Reply-To: <49553341.3000702@vt.edu> References: <49553341.3000702@vt.edu> Message-ID: <4955D030.4060000@gmail.com> Bharath Ramesh wrote: > I using OFED-1.2 for my research project. I am facing an issue where > when I try to post IBV_WR_RDMA_WRITE work request to the QP > ibv_post_send returns with -1. This I presume means lack of resources > to complete the operation. To give a brief summary of my communication > mechanism, I have two kind of messages control messages and large > messages. I post control messages with the normal IBV_WR_SEND with > send flags set to IBV_SEND_INLINE. Every few hundred such control > messages I set the send flag to IBV_SEND_INLINE | IBV_SEND_SIGNALED > and poll my send CQ. For large messages I use IBV_WR_RDMA_WRITE with > send flags set to IBV_SEND_SIGNALED. After posting any large message I > poll the send CQ for completion. I have sufficient buffers posted on > the receiver. > > The first thing I did when I encountered this issue is to write a > small test which just does only IBV_WR_RDMA_WRITE similar to my > application. This test works fine without having any issues. 
I am > wondering what could be potential issues that might be causing such a > scenario of unavailable resources. Any help in debugging this > situation would be appreciated. I am not subscribed to the list would > appreciate if I am copied on the replies. Hi. ibv_post_send may fail if: 1) too many WRs are outstanding on the SQ (maybe you posted too many WRs without polling for their completion) 2) a bad WR was posted (for example: an RDMA Write on a UD QP). 3) the WR that you are trying to post as inline is too big (upon QP creation you can see this size in the sq_send_inline) Dotan From vlad at lists.openfabrics.org Sat Dec 27 03:12:09 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sat, 27 Dec 2008 03:12:09 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081227-0200 daily build status Message-ID: <20081227111209.95AABE60397@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 
with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From bramesh at vt.edu Sat Dec 27 12:17:38 2008 From: bramesh at vt.edu (Bharath Ramesh) Date: Sat, 27 Dec 2008 15:17:38 -0500 Subject: [ofa-general] ibv_post_send returns -1 In-Reply-To: <4955D030.4060000@gmail.com> References: <49553341.3000702@vt.edu> <4955D030.4060000@gmail.com> Message-ID: <005e01c96860$2d83eee0$888bcca0$@edu> Thanks for the reply Dotan, I am posting what I am doing currently, probably I am missing something which you could point out to me. 1) I don't have completion notifications or completion channel associated with my send CQ. 2) I only have RC QPs. 3) Every few hundred messages (256 messages to be precise) for my small messages (64 bytes long) for which I use IBV_WR_SEND with IBV_SEND_INLINE I set the IBV_SEND_SIGNALED flag. Once I set the flag I poll my send CQ for completion messages. 4) All my RDMA buffers for large messages are all registered, and I use IBV_SEND_SIGNALED flag for the WR and poll the send CQ for completion immediately after ibv_post_send. The only time when I don't poll my send CQ for completion for ibv_post_send is for my small messages. 
I presume that polling the send CQ once every 256 messages should be enough to free the SQ entries of the previously posted WRs. Am I wrong in assuming this? Should each and every ibv_post_send be paired with a poll of the send CQ? Regards, Bharath -----Original Message----- From: Dotan Barak [mailto:dotanba at gmail.com] Sent: Saturday, December 27, 2008 1:50 AM To: Bharath Ramesh Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] ibv_post_send returns -1 Bharath Ramesh wrote: > I using OFED-1.2 for my research project. I am facing an issue where > when I try to post IBV_WR_RDMA_WRITE work request to the QP > ibv_post_send returns with -1. This I presume means lack of resources > to complete the operation. To give a brief summary of my communication > mechanism, I have two kind of messages control messages and large > messages. I post control messages with the normal IBV_WR_SEND with > send flags set to IBV_SEND_INLINE. Every few hundred such control > messages I set the send flag to IBV_SEND_INLINE | IBV_SEND_SIGNALED > and poll my send CQ. For large messages I use IBV_WR_RDMA_WRITE with > send flags set to IBV_SEND_SIGNALED. After posting any large message I > poll the send CQ for completion. I have sufficient buffers posted on > the receiver. > > The first thing I did when I encountered this issue is to write a > small test which just does only IBV_WR_RDMA_WRITE similar to my > application. This test works fine without having any issues. I am > wondering what could be potential issues that might be causing such a > scenario of unavailable resources. Any help in debugging this > situation would be appreciated. I am not subscribed to the list would > appreciate if I am copied on the replies. Hi. ibv_post_send may fail if: 1) too many WR are outstanding on the SQ (maybe you posted too many WR without polling for there completion) 2) bad WR was posted (for example: RDMA Write for UD QP). 
3) maybe the size of the WR that you are trying to post as inline is too big (upon QP creation you can see this size in the sq_send_inline) Dotan From vlad at lists.openfabrics.org Sun Dec 28 03:11:03 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Sun, 28 Dec 2008 03:11:03 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081228-0200 daily build status Message-ID: <20081228111103.722BDE60960@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.18 Passed on ia64 
with linux-2.6.19 Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From ronli.voltaire at gmail.com Sun Dec 28 08:11:10 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Sun, 28 Dec 2008 18:11:10 +0200 Subject: [ofa-general] ***SPAM*** kernel panic while using ipoib with lro enabled Message-ID: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> Hi all, We've discovered that sometimes we get a kernel panic while using ipoib with lro enabled. You can find all the details here: https://bugs.openfabrics.org/show_bug.cgi?id=1473 This bug is reproduced both in kernel 2.6.27 and OFED 1.4. However, we were currently only able to reproduce it on certain types of machines (x86-64 with 8 cores). Ron From sfr at canb.auug.org.au Sun Dec 28 16:43:21 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Mon, 29 Dec 2008 11:43:21 +1100 Subject: [ofa-general] linux-next: origin tree build failure Message-ID: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Hi Roland, Today's linux-next build (powerpc ppc64_defconfig) failed like this: ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined! ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined! Caused by commit 38617c64bf9a10bf20e41d95b69bb81e8560fe9d ("RDMA/addr: Add support for translating IPv6 addresses"). This requires a dependency on IPV6. I have reverted that commit for today (which fixes the build, but may not be the correct solution). -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From rdreier at cisco.com Sun Dec 28 19:36:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 28 Dec 2008 19:36:12 -0800 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081229114321.4b6baea5.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Mon, 29 Dec 2008 11:43:21 +1100") References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Message-ID: > Today's linux-next build (powerpc ppc64_defconfig) failed like this: > > ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined! > ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined! > > Caused by commit 38617c64bf9a10bf20e41d95b69bb81e8560fe9d ("RDMA/addr: > Add support for translating IPv6 addresses"). This requires a dependency > on IPV6. Thanks, I'll get a fix queued up. Sorry about that. - R. From rdreier at cisco.com Sun Dec 28 19:44:52 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 28 Dec 2008 19:44:52 -0800 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081229114321.4b6baea5.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Mon, 29 Dec 2008 11:43:21 +1100") References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Message-ID: > ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined! > ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined! > > Caused by commit 38617c64bf9a10bf20e41d95b69bb81e8560fe9d ("RDMA/addr: > Add support for translating IPv6 addresses"). This requires a dependency > on IPV6. So how do we want to fix this? (This question is mostly directed to the IB guys) One possibility is to make all this depend on IPV6 in Kconfig, but I think we want the RDMA CM to be buildable/usable even if IPv6 isn't enabled. 
A better option is to just put all the IPv6 related stuff into #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) and add a Kconfig dependency on (IPV6 || IPV6=n) as we did for IPoIB. But then this leads to the behavior that loading the RDMA CM will cause the ipv6 module to be loaded if IPV6=m in the kernel config, even if the administrator doesn't want to enable IPv6, just as with IPoIB today. And people already complain about that. Anyone see a better solution (which we could use for IPoIB even)? - R. From rdreier at cisco.com Sun Dec 28 19:45:46 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 28 Dec 2008 19:45:46 -0800 Subject: [ofa-general] Re: kernel panic while using ipoib with lro enabled In-Reply-To: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> (Ron Livne's message of "Sun, 28 Dec 2008 18:11:10 +0200") References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> Message-ID: > We've discovered that sometimes we get a kernel panic while using > ipoib with lro enabled. > You can find all the details here: > > https://bugs.openfabrics.org/show_bug.cgi?id=1473 > > This bug is reproduced both in kernel 2.6.27 and OFED 1.4. I'll try to reproduce it myself, but in the meantime, can you reproduce this with 2.6.28 or even better latest upstream Linus's git tree (without OFED)? - R. From ronli.voltaire at gmail.com Mon Dec 29 00:39:50 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Mon, 29 Dec 2008 10:39:50 +0200 Subject: [ofa-general] ***SPAM*** Re: kernel panic while using ipoib with lro enabled In-Reply-To: References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> Message-ID: <3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com> Yes, it does reproduce on Linus's git tree. Ron On Mon, Dec 29, 2008 at 5:45 AM, Roland Dreier wrote: > > We've discovered that sometimes we get a kernel panic while using > > ipoib with lro enabled. 
> > You can find all the details here: > > > > https://bugs.openfabrics.org/show_bug.cgi?id=1473 > > > > This bug is reproduced both in kernel 2.6.27 and OFED 1.4. > > I'll try to reproduce it myself, but in the meantime, can you reproduce > this with 2.6.28 or even better latest upstream Linus's git tree > (without OFED)? > > - R. > From alekseys at voltaire.com Mon Dec 29 00:48:19 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Mon, 29 Dec 2008 10:48:19 +0200 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081229114321.4b6baea5.sfr@canb.auug.org.au> References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Message-ID: <1230540499.4261.24.camel@alst60> I'm going to change it by adding #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) exactly as it solved in ipoib module and as Roland says. On Mon, 2008-12-29 at 11:43 +1100, Stephen Rothwell wrote: > Hi Roland, > > Today's linux-next build (powerpc ppc64_defconfig) failed like this: > > ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined! > ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined! > > Caused by commit 38617c64bf9a10bf20e41d95b69bb81e8560fe9d ("RDMA/addr: > Add support for translating IPv6 addresses"). This requires a dependency > on IPV6. > > I have reverted that commit for today (which fixes the build, but may not > be the correct solution). From alekseys at voltaire.com Mon Dec 29 01:58:57 2008 From: alekseys at voltaire.com (Aleksey Senin) Date: Mon, 29 Dec 2008 11:58:57 +0200 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Message-ID: <1230544737.4261.33.camel@alst60> After another investigation of this problem, I think that proposed solution is #ifdef as good for a first stage. IPv6 support is mandatory when we are talking about running linux in some organization. 
But, of course, the way it is implemented in the IB stack should be
changed. So in the second stage, I'd like to drop these "defines" and,
at module initialization time, obtain the addresses of the IPv6
functions and, if they are present at runtime, call them. It should be
a nice solution for the RDMA_CM and IPoIB modules.

On Sun, 2008-12-28 at 19:44 -0800, Roland Dreier wrote:
> > ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined!
> > ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined!
> >
> > Caused by commit 38617c64bf9a10bf20e41d95b69bb81e8560fe9d ("RDMA/addr:
> > Add support for translating IPv6 addresses"). This requires a dependency
> > on IPV6.
>
> So how do we want to fix this? (This question is mostly directed to the
> IB guys) One possibility is to make all this depend on IPV6 in Kconfig,
> but I think we want the RDMA CM to be buildable/usable even if IPv6
> isn't enabled. A better option is to just put all the IPv6 related
> stuff into
>
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>
> and add a Kconfig dependency on (IPV6 || IPV6=n) as we did for IPoIB.
>
> But then this leads to the behavior that loading the RDMA CM will cause
> the ipv6 module to be loaded if IPV6=m in the kernel config, even if the
> administrator doesn't want to enable IPv6, just as with IPoIB today.
> And people already complain about that.
>
> Anyone see a better solution (which we could use for IPoIB even)?
>
>  - R.

From jackm at dev.mellanox.co.il  Mon Dec 29 02:23:11 2008
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 29 Dec 2008 12:23:11 +0200
Subject: [ofa-general] [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing)
Message-ID: <200812291223.11753.jackm@dev.mellanox.co.il>

mlx4_ib: fix for Bugzilla 1383 (LSO packet processing).

The LSO segment header in the WQE was written too early.

Signed-off-by: Jack Morgenstein

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 39167a7..e931d88 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1462,7 +1462,7 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg)
 }
 
 static int build_lso_seg(struct mlx4_wqe_lso_seg *wqe, struct ib_send_wr *wr,
-			 struct mlx4_ib_qp *qp, unsigned *lso_seg_len)
+			 struct mlx4_ib_qp *qp, unsigned *lso_seg_len, __be32 *lso_hdr_sz)
 {
 	unsigned halign = ALIGN(sizeof *wqe + wr->wr.ud.hlen, 16);
 
@@ -1479,10 +1479,7 @@ static int build_lso_seg(struct mlx4_wqe_lso_seg *wqe, struct ib_send_wr *wr,
 
 	memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen);
 
-	/* make sure LSO header is written before overwriting stamping */
-	wmb();
-
-	wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 |
+	*lso_hdr_sz = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 |
 					wr->wr.ud.hlen);
 
 	*lso_seg_len = halign;
@@ -1519,6 +1516,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int uninitialized_var(size);
 	unsigned uninitialized_var(seglen);
 	int i;
+	__be32 *lso_wqe;
+	__be32 uninitialized_var(lso_hdr_sz);
 
 	spin_lock_irqsave(&qp->sq.lock, flags);
 
@@ -1606,13 +1605,21 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
 
 			if (wr->opcode == IB_WR_LSO) {
-				err = build_lso_seg(wqe, wr, qp, &seglen);
+				err = build_lso_seg(wqe, wr, qp, &seglen, &lso_hdr_sz);
 				if (unlikely(err)) {
 					*bad_wr = wr;
 					goto out;
 				}
+				lso_wqe = (__be32 *) wqe;
 				wqe += seglen;
-				size += seglen / 16;
+				dseg = wqe;
+				dseg += wr->num_sge - 1;
+				size += (seglen / 16) + wr->num_sge *
+					(sizeof (struct mlx4_wqe_data_seg) / 16);
+				for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
+					set_data_seg(dseg, wr->sg_list + i);
+				*lso_wqe = lso_hdr_sz;
+				goto lso_continue;
 			}
 			break;
 
@@ -1652,6 +1659,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
 		set_data_seg(dseg, wr->sg_list + i);
 
+lso_continue:
 	ctrl->fence_size = (wr->send_flags & IB_SEND_FENCE ?
 			    MLX4_WQE_CTRL_FENCE : 0) | size;
 

From dotanba at gmail.com  Mon Dec 29 02:58:49 2008
From: dotanba at gmail.com (Dotan Barak)
Date: Mon, 29 Dec 2008 12:58:49 +0200
Subject: [ofa-general] ibv_post_send returns -1
In-Reply-To: <005e01c96860$2d83eee0$888bcca0$@edu>
References: <49553341.3000702@vt.edu> <4955D030.4060000@gmail.com>
	<005e01c96860$2d83eee0$888bcca0$@edu>
Message-ID: <2f3bf9a60812290258i379e3ef6i47539285c8d4826f@mail.gmail.com>

On Sat, Dec 27, 2008 at 10:17 PM, Bharath Ramesh wrote:
> Thanks for the reply Dotan, I am posting what I am doing currently, probably
> I am missing something which you could point out to me.
>
> 1) I don't have completion notifications or completion channel associated
> with my send CQ.
> 2) I only have RC QPs.
> 3) Every few hundred messages (256 messages to be precise) for my small
> messages (64 bytes long) for which I use IBV_WR_SEND with IBV_SEND_INLINE I
> set the IBV_SEND_SIGNALED flag. Once I set the flag I poll my send CQ for
> completion messages.
> 4) All my RDMA buffers for large messages are all registered, and I use
> IBV_SEND_SIGNALED flag for the WR and poll the send CQ for completion
> immediately after ibv_post_send.
>
> The only time when I don't poll my send CQ for completion for ibv_post_send
> is for my small messages. I presume that polling the send CQ once every 256
> messages for these should be enough to free up the WR from the SQ for the
> previously posted WR. Am I wrong in assuming this, should each and every
> ibv_post_send should be associated with poll to the send CQ?
>

No, you are right.
You can poll the CQ once in a while and not after every post, but you
have to make sure that the total number of outstanding WR in the Send
Queue is less than the value that the QP was created with.
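
The accounting described in this reply can be sketched without any verbs
calls. The model below is only an illustration, assuming an RC send queue
whose completions arrive in posting order, so that polling the completion
of one signaled WR also retires every unsignaled WR posted before it. The
names sq_tracker, sq_init, sq_post and sq_complete are hypothetical and are
not part of libibverbs.

```c
#include <stdbool.h>
#include <stddef.h>

#define SQ_MAX_BATCHES 64	/* hypothetical bound on signaled WRs in flight */

struct sq_tracker {
	unsigned depth;		/* max_send_wr the QP was created with */
	unsigned outstanding;	/* WRs posted and not yet retired */
	unsigned signal_every;	/* request a completion every N posts */
	unsigned run;		/* WRs posted since the last signaled one */
	unsigned batch[SQ_MAX_BATCHES];	/* WRs retired by each pending CQE */
	size_t head, count;	/* FIFO state for batch[] */
};

void sq_init(struct sq_tracker *sq, unsigned depth, unsigned signal_every)
{
	sq->depth = depth;
	sq->outstanding = 0;
	sq->signal_every = signal_every;
	sq->run = 0;
	sq->head = 0;
	sq->count = 0;
}

/*
 * Call before posting a send WR.  Returns -1 if the send queue (or the
 * bookkeeping FIFO) is full, 1 if the WR should be posted signaled,
 * and 0 if it can be posted unsignaled.
 */
int sq_post(struct sq_tracker *sq, bool force_signal)
{
	if (sq->outstanding >= sq->depth || sq->count == SQ_MAX_BATCHES)
		return -1;

	sq->outstanding++;
	sq->run++;
	if (!force_signal && sq->run < sq->signal_every)
		return 0;

	/* This WR is signaled: remember how many WRs its CQE will retire. */
	sq->batch[(sq->head + sq->count) % SQ_MAX_BATCHES] = sq->run;
	sq->count++;
	sq->run = 0;
	return 1;
}

/*
 * Call once for each send completion polled from the CQ.
 * Returns the number of WRs retired by that completion.
 */
unsigned sq_complete(struct sq_tracker *sq)
{
	unsigned n;

	if (sq->count == 0)
		return 0;

	n = sq->batch[sq->head];
	sq->head = (sq->head + 1) % SQ_MAX_BATCHES;
	sq->count--;
	sq->outstanding -= n;
	return n;
}
```

In this model a depth of 1024 with one signaled WR every 256 posts never
fills as long as each signaled completion is polled. If the depth is
smaller than the signaling interval, sq_post() starts returning -1,
because nothing posted so far can ever produce a completion.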
Dotan

From vlad at lists.openfabrics.org  Mon Dec 29 03:13:38 2008
From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox)
Date: Mon, 29 Dec 2008 03:13:38 -0800 (PST)
Subject: [ofa-general] ofa_1_4_kernel 20081229-0200 daily build status
Message-ID: <20081229111338.A4DB3E60C44@openfabrics.org>

This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:

Passed:
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ia64 with linux-2.6.26
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18-8.el5

Failed:

From devesh28 at gmail.com  Mon Dec 29 03:20:24 2008
From: devesh28 at gmail.com (Devesh Sharma)
Date: Mon, 29 Dec 2008 16:50:24 +0530
Subject: ***SPAM*** [ofa-general] compiling OFED-1.2 with RHEL5.1
Message-ID: <309a667c0812290320m54efd47fr27affb1d5cc6dcec@mail.gmail.com>

Hello all,
I am trying to compile OFED-1.2 with RHEL5.1. I know that this OS is not
supported by this distribution; is there any workaround other than
switching to OFED-1.2.5 or OFED-1.3?

Thanks
Devesh

From vlad at mellanox.co.il  Mon Dec 29 03:38:05 2008
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Mon, 29 Dec 2008 13:38:05 +0200
Subject: [ofa-general] [PATCH] libibumad: Add sysfs_*() functions to libibumad.map
Message-ID: <20081229113805.GA25616@mellanox.co.il>

Signed-off-by: Vladimir Sokolovsky
---
 libibumad/src/libibumad.map |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/libibumad/src/libibumad.map b/libibumad/src/libibumad.map
index 0154b7f..ea8999e 100644
--- a/libibumad/src/libibumad.map
+++ b/libibumad/src/libibumad.map
@@ -30,5 +30,10 @@ IBUMAD_1.0 {
 		umad_debug;
 		umad_addr_dump;
 		umad_dump;
+		sys_read_gid;
+		sys_read_guid;
+		sys_read_string;
+		sys_read_uint;
+		sys_read_uint64;
 	local: *;
 };
-- 
1.5.6.3

From tziporet at dev.mellanox.co.il  Mon Dec 29 05:06:50 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 29 Dec 2008 15:06:50 +0200
Subject: ***SPAM*** [ofa-general] compiling OFED-1.2 with RHEL5.1
In-Reply-To: <309a667c0812290320m54efd47fr27affb1d5cc6dcec@mail.gmail.com>
References: <309a667c0812290320m54efd47fr27affb1d5cc6dcec@mail.gmail.com>
Message-ID: <4958CB6A.3090306@mellanox.co.il>

Devesh Sharma wrote:
> Hello all,
> I am trying to compile OFED-1.2 with RHEL5.1 I know that this OS is
> not supported by this
> distribution, is there any work around other than switing to
> OFED-1.2.5 or OFED-1.3?
>
I don't think there is a workaround.
You can try to take the RHEL 5.1 backports from 1.2.5 and use them on
1.2, but I guess you will have to change them.

Tziporet

From ronli.voltaire at gmail.com  Mon Dec 29 06:37:55 2008
From: ronli.voltaire at gmail.com (Ron Livne)
Date: Mon, 29 Dec 2008 16:37:55 +0200
Subject: [ofa-general] ***SPAM*** Re: kernel panic while using ipoib with lro enabled
In-Reply-To: <3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com>
References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com>
	<3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com>
Message-ID: <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com>

Some new findings:

- I've added a print of the skb->data_len and skb->len sizes right
  before the BUG_ON line.
  skb->data_len was always 0, so I don't understand why there is a
  kernel panic.

- It's unnecessary to use netperf to reproduce the bug. It can be
  reproduced just by copying a file between 2 hosts with scp.

- I removed the BUG_ON line and copied a large file between two hosts.
  The file got to the other side without any corruptions.

Ron

On Mon, Dec 29, 2008 at 10:39 AM, Ron Livne wrote:
> Yes, it does reproduce on Linus's git tree.
>
> Ron
>
>
> On Mon, Dec 29, 2008 at 5:45 AM, Roland Dreier wrote:
>> > We've discovered that sometimes we get a kernel panic while using
>> > ipoib with lro enabled.
>> > You can find all the details here:
>> >
>> > https://bugs.openfabrics.org/show_bug.cgi?id=1473
>> >
>> > This bug is reproduced both in kernel 2.6.27 and OFED 1.4.
>>
>> I'll try to reproduce it myself, but in the meantime, can you reproduce
>> this with 2.6.28 or even better latest upstream Linus's git tree
>> (without OFED)?
>>
>> - R.
>>
>

From bramesh at vt.edu  Mon Dec 29 08:04:48 2008
From: bramesh at vt.edu (Bharath Ramesh)
Date: Mon, 29 Dec 2008 11:04:48 -0500
Subject: [ofa-general] ibv_post_send returns -1
In-Reply-To: <2f3bf9a60812290258i379e3ef6i47539285c8d4826f@mail.gmail.com>
References: <49553341.3000702@vt.edu> <4955D030.4060000@gmail.com>
	<005e01c96860$2d83eee0$888bcca0$@edu>
	<2f3bf9a60812290258i379e3ef6i47539285c8d4826f@mail.gmail.com>
Message-ID: <006a01c969cf$30268f70$9073ae50$@edu>

Thanks Dotan, probably I didn't word my query correctly. It should have
been: is there a 1-1 mapping between the number of polls of the send CQ
and the number of calls made to ibv_post_send? The only thing I think
can be going wrong is that I set the IBV_SEND_SIGNALED flag once in
every 256 control messages, which I assume implies I am getting a
completion event only for one of every 256 control messages. Hence the
outstanding WR queue is never cleared completely and I finally run out
of WRs that can be posted to the SQ, as large messages are also posted
to the same SQ. I have 1024 WRs associated with the SQ. I poll the send
CQ for every large message, but this is not true for every control
message that I post to the SQ, which might be leading to a starvation
of available WRs.

Regards,
Bharath

-----Original Message-----
From: Dotan Barak [mailto:dotanba at gmail.com]
Sent: Monday, December 29, 2008 5:59 AM
To: Bharath Ramesh
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] ibv_post_send returns -1

On Sat, Dec 27, 2008 at 10:17 PM, Bharath Ramesh wrote:
> Thanks for the reply Dotan, I am posting what I am doing currently, probably
> I am missing something which you could point out to me.
>
> 1) I don't have completion notifications or completion channel associated
> with my send CQ.
> 2) I only have RC QPs.
> 3) Every few hundred messages (256 messages to be precise) for my small
> messages (64 bytes long) for which I use IBV_WR_SEND with IBV_SEND_INLINE I
> set the IBV_SEND_SIGNALED flag. Once I set the flag I poll my send CQ for
> completion messages.
> 4) All my RDMA buffers for large messages are all registered, and I use
> IBV_SEND_SIGNALED flag for the WR and poll the send CQ for completion
> immediately after ibv_post_send.
>
> The only time when I don't poll my send CQ for completion for ibv_post_send
> is for my small messages. I presume that polling the send CQ once every 256
> messages for these should be enough to free up the WR from the SQ for the
> previously posted WR. Am I wrong in assuming this, should each and every
> ibv_post_send should be associated with poll to the send CQ?
>

No, you are right.
You can poll the CQ once in a while and not after every post, but you
have to make sure that the total number of outstanding WR in the Send
Queue is less than the value that the QP was created with.

Dotan

From rdreier at cisco.com  Mon Dec 29 08:13:36 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 08:13:36 -0800
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To: <1230544737.4261.33.camel@alst60> (Aleksey Senin's message of
	"Mon, 29 Dec 2008 11:58:57 +0200")
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID:

> After another investigation of this problem, I think that proposed
> solution is #ifdef as good for a first stage. IPv6 support is mandatory
> when we are talking about running linux in some organization. But, of
> course, the way how it implemented in IB stack should be changed. So on
> the second stage, I'd like drop out these "defines" and at the time of
> module initialization obtain addresses of IPv6 functions and in the case
> if they are present at the runtime, call them. It should be nice
> solution for RMDA_CM and IPoIB modules.

I don't think this second stage sounds like a good idea. Suppose
someone loads the RDMA CM first, so it doesn't find the ipv6 functions,
and then later loads and configures ipv6. You'll end up in a situation
where trying to make an IPv6 connection fails spuriously. (And just the
ugliness of looking up function pointers isn't very nice either)

 - R.

From alekseys at voltaire.com  Mon Dec 29 08:52:02 2008
From: alekseys at voltaire.com (Aleksey Senin)
Date: Mon, 29 Dec 2008 18:52:02 +0200
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To:
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID: <1230569522.4261.44.camel@alst60>

I thought about this. It can be solved by loading the ipv6 module before
RDMA_CM by specifying module dependencies in the modprobe.conf file. At
least this solution helps in the case when the administrator wants IB,
but not IPv6.

On Mon, 2008-12-29 at 08:13 -0800, Roland Dreier wrote:
> > After another investigation of this problem, I think that proposed
> > solution is #ifdef as good for a first stage. IPv6 support is mandatory
> > when we are talking about running linux in some organization. But, of
> > course, the way how it implemented in IB stack should be changed. So on
> > the second stage, I'd like drop out these "defines" and at the time of
> > module initialization obtain addresses of IPv6 functions and in the case
> > if they are present at the runtime, call them. It should be nice
> > solution for RMDA_CM and IPoIB modules.
>
> I don't think this second stage sounds like a good idea. Suppose
> someone loads the RDMA CM first, so it doesn't find the ipv6 functions,
> and then later loads and configures ipv6. You'll end up in a situation
> where trying to make an IPv6 connection fails spuriously. (And just the
> ugliness of looking up function pointers isn't very nice either)
>
>  - R.
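
The load-order objection quoted above can be illustrated with a small
user-space model. This is only a sketch under loud assumptions: the
"symbol table" is a single global pointer, nothing here is kernel code,
and every name in it (demo_load_ipv6, demo_cm_init, demo_cm_connect6) is
hypothetical.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*chk_addr_fn)(const char *addr);

/* Stand-in for the exported symbol: NULL until "ipv6" is loaded. */
static chk_addr_fn exported_ipv6_chk_addr = NULL;

/* Stand-in for the IPv6 address check; accepts every address. */
static int demo_ipv6_chk_addr(const char *addr)
{
	(void)addr;
	return 1;
}

/* Models loading the ipv6 module: the symbol becomes available. */
void demo_load_ipv6(void)
{
	exported_ipv6_chk_addr = demo_ipv6_chk_addr;
}

/* The consumer caches whatever it finds at its own init time. */
static chk_addr_fn cached_chk_addr = NULL;

void demo_cm_init(void)
{
	cached_chk_addr = exported_ipv6_chk_addr;
}

/* Uses only the cached pointer, so a later ipv6 load goes unnoticed. */
int demo_cm_connect6(const char *addr)
{
	if (!cached_chk_addr)
		return -EAFNOSUPPORT;
	return cached_chk_addr(addr) ? 0 : -EADDRNOTAVAIL;
}
```

Calling demo_cm_init() before demo_load_ipv6() leaves the cached pointer
NULL, so demo_cm_connect6() keeps failing even though the function now
exists. Forcing the load order (as with a modprobe.conf dependency) or
redoing the lookup avoids that, at the cost of the extra machinery.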

From sean.hefty at intel.com  Mon Dec 29 12:14:32 2008
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 29 Dec 2008 12:14:32 -0800
Subject: [ofa-general] [PATCH] libibumad: Add sysfs_*() functions to libibumad.map
In-Reply-To: <20081229113805.GA25616@mellanox.co.il>
References: <20081229113805.GA25616@mellanox.co.il>
Message-ID: <000001c969f2$10655bd0$0bfd070a@amr.corp.intel.com>

>diff --git a/libibumad/src/libibumad.map b/libibumad/src/libibumad.map
>index 0154b7f..ea8999e 100644
>--- a/libibumad/src/libibumad.map
>+++ b/libibumad/src/libibumad.map
>@@ -30,5 +30,10 @@ IBUMAD_1.0 {
> 		umad_debug;
> 		umad_addr_dump;
> 		umad_dump;
>+		sys_read_gid;
>+		sys_read_guid;
>+		sys_read_string;
>+		sys_read_uint;
>+		sys_read_uint64;
> 	local: *;

Why expose these?

- Sean

From rdreier at cisco.com  Mon Dec 29 12:18:06 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 12:18:06 -0800
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To: <1230544737.4261.33.camel@alst60> (Aleksey Senin's message of
	"Mon, 29 Dec 2008 11:58:57 +0200")
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID:

Something like the following maybe?
(This turns off the RDMA CM if INFINIBAND=y and IPV6=m -- another
possibility would be to just turn off RDMA CM IPv6 support in the case
that IB is built-in but IPv6 is modular, but that seems like a worse
idea overall)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index a5dc78a..538a0ba 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -36,7 +36,7 @@ config INFINIBAND_USER_MEM
 
 config INFINIBAND_ADDR_TRANS
 	bool
-	depends on INET
+	depends on INET && !(INFINIBAND = y && IPV6 = m)
 	default y
 
 source "drivers/infiniband/hw/mthca/Kconfig"
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index d98b05b..ec7abb5 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -128,6 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 		ret = rdma_copy_addr(dev_addr, dev, NULL);
 		dev_put(dev);
 		break;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	case AF_INET6:
 		for_each_netdev(&init_net, dev) {
 			if (ipv6_chk_addr(&init_net,
@@ -138,6 +139,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 			}
 		}
 		break;
+#endif
 	default:
 		break;
 	}
@@ -179,10 +181,11 @@ static void addr_send_arp(struct sockaddr *dst_in)
 {
 	struct rtable *rt;
 	struct flowi fl;
-	struct dst_entry *dst;
 
 	memset(&fl, 0, sizeof fl);
-	if (dst_in->sa_family == AF_INET) {
+
+	switch (dst_in->sa_family) {
+	case AF_INET:
 		fl.nl_u.ip4_u.daddr =
 			((struct sockaddr_in *) dst_in)->sin_addr.s_addr;
 
@@ -191,8 +194,13 @@ static void addr_send_arp(struct sockaddr *dst_in)
 
 		neigh_event_send(rt->u.dst.neighbour, NULL);
 		ip_rt_put(rt);
+		break;
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+	{
+		struct dst_entry *dst;
 
-	} else {
 		fl.nl_u.ip6_u.daddr =
 			((struct sockaddr_in6 *) dst_in)->sin6_addr;
 
@@ -202,6 +210,9 @@ static void addr_send_arp(struct sockaddr *dst_in)
 
 		neigh_event_send(dst->neighbour, NULL);
 		dst_release(dst);
+		break;
+	}
+#endif
 	}
 }
 
@@ -254,6 +265,7 @@ out:
 	return ret;
 }
 
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 				struct sockaddr_in6 *dst_in,
 				struct rdma_dev_addr *addr)
@@ -282,6 +294,14 @@ static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
 	dst_release(dst);
 	return ret;
 }
+#else
+static int addr6_resolve_remote(struct sockaddr_in6 *src_in,
+				struct sockaddr_in6 *dst_in,
+				struct rdma_dev_addr *addr)
+{
+	return -EADDRNOTAVAIL;
+}
+#endif
 
 static int addr_resolve_remote(struct sockaddr *src_in,
 				struct sockaddr *dst_in,
@@ -340,7 +360,9 @@ static int addr_resolve_local(struct sockaddr *src_in,
 	struct net_device *dev;
 	int ret;
 
-	if (dst_in->sa_family == AF_INET) {
+	switch (dst_in->sa_family) {
+	case AF_INET:
+	{
 		__be32 src_ip = ((struct sockaddr_in *) src_in)->sin_addr.s_addr;
 		__be32 dst_ip = ((struct sockaddr_in *) dst_in)->sin_addr.s_addr;
 
@@ -362,7 +384,12 @@ static int addr_resolve_local(struct sockaddr *src_in,
 			memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
 		}
 		dev_put(dev);
-	} else {
+		break;
+	}
+
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	case AF_INET6:
+	{
 		struct in6_addr *a;
 
 		for_each_netdev(&init_net, dev)
@@ -390,6 +417,13 @@ static int addr_resolve_local(struct sockaddr *src_in,
 			if (!ret)
 				memcpy(addr->dst_dev_addr, dev->dev_addr, MAX_ADDR_LEN);
 		}
+		break;
+	}
+#endif
+
+	default:
+		ret = -EADDRNOTAVAIL;
+		break;
 	}
 
 	return ret;

From torvalds at linux-foundation.org  Mon Dec 29 13:07:16 2008
From: torvalds at linux-foundation.org (Linus Torvalds)
Date: Mon, 29 Dec 2008 13:07:16 -0800 (PST)
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To:
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID:

On Mon, 29 Dec 2008, Roland Dreier wrote:
>
> Something like the following maybe? (This turns off the RDMA CM if
> INFINIBAND=y and IPV6=m -- another possibility would be to just turn off
> RDMA CM IPv6 support in the case that IB is build-in but IPv6 is
> modular, but that seems like a worse idea overall)
>
> diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
> index a5dc78a..538a0ba 100644
> --- a/drivers/infiniband/Kconfig
> +++ b/drivers/infiniband/Kconfig
> @@ -36,7 +36,7 @@ config INFINIBAND_USER_MEM
>
>  config INFINIBAND_ADDR_TRANS
>  	bool
> -	depends on INET
> +	depends on INET && !(INFINIBAND = y && IPV6 = m)
>  	default y

I'd suggest

	config IF_IPV6
		bool
		depends on INET
		depends on !(INFINIBAND = y && IPV6 = m)
		default y

and then you can use it not only for the above:

	config INFINIBAND_ADDR_TRANS
		bool
		depends on IF_IPV6
		default y

but also use it in the source code as a more readable version:

> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index d98b05b..ec7abb5 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -128,6 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
>  		ret = rdma_copy_addr(dev_addr, dev, NULL);
>  		dev_put(dev);
>  		break;
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

ie use

	#ifdef CONFIG_IF_IPV6

here instead.

		Linus

From rdreier at cisco.com  Mon Dec 29 13:35:02 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 13:35:02 -0800
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To: (Linus Torvalds's message of "Mon, 29 Dec 2008 13:07:16 -0800
	(PST)")
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID:

> I'd suggest
>
> 	config IF_IPV6
> 		bool
> 		depends on INET
> 		depends on !(INFINIBAND = y && IPV6 = m)
> 		default y

Makes sense, will do. How about calling it INFINIBAND_USE_IPV6 or
something like that, though?
(Since it's under the INFINIBAND config stuff and exists to forbid
INFINIBAND=y && IPV6=m trying to use IPv6).

But see below:

> but also use it in the source code as a more readable version:
>
> > diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> > index d98b05b..ec7abb5 100644
> > --- a/drivers/infiniband/core/addr.c
> > +++ b/drivers/infiniband/core/addr.c
> > @@ -128,6 +128,7 @@ int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
> >  		ret = rdma_copy_addr(dev_addr, dev, NULL);
> >  		dev_put(dev);
> >  		break;
> > +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>
> ie use
>
> 	#ifdef CONFIG_IF_IPV6

this doesn't make sense, does it? Your CONFIG_IF_IPV6 will be set in
the case IPV6=n too I think. (Which is the whole point... we want to
build this code, just without IPv6 support, if IPv6 is turned off
completely)

 - R.

From rdreier at cisco.com  Mon Dec 29 13:48:18 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 13:48:18 -0800
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To: (Roland Dreier's message of "Mon, 29 Dec 2008 13:35:02 -0800")
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
	<1230544737.4261.33.camel@alst60>
Message-ID:

> > I'd suggest
> >
> > 	config IF_IPV6
> > 		bool
> > 		depends on INET
> > 		depends on !(INFINIBAND = y && IPV6 = m)
> > 		default y
>
> Makes sense, will do. How about calling it INFINIBAND_USE_IPV6 or
> something like that, though? (Since it's under the INFINIBAND config
> stuff and exists to forbid INFINIBAND=y && IPV6=m trying to use IPv6).

Actually, thinking about this for 30 more seconds, I'm not sure how
another config symbol helps at all. I do like splitting dependencies
onto multiple lines as a replacement for &&, so I have:

	config INFINIBAND_ADDR_TRANS
		bool
		depends on INET
		depends on !(INFINIBAND = y && IPV6 = m)
		default y

right now.
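
As a worked example of that dependency, the sketch below evaluates it
with Kconfig's tristate rules (n < m < y; "!" is 2 minus the value, "&&"
is the minimum, "=" yields y or n), treating the two "depends on" lines
as joined by &&. Only the two lines shown are modeled, and the C names
(tristate, addr_trans_dep and the tri_* helpers) are hypothetical and
exist purely for illustration.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* "!" in Kconfig: 2 minus the value. */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(2 - a);
}

/* "&&" in Kconfig: the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* "=" in Kconfig: y if the operands are equal, n otherwise. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;
}

/*
 * depends on INET
 * depends on !(INFINIBAND = y && IPV6 = m)
 */
enum tristate addr_trans_dep(enum tristate inet,
			     enum tristate infiniband,
			     enum tristate ipv6)
{
	enum tristate clash = tri_and(tri_eq(infiniband, TRI_Y),
				      tri_eq(ipv6, TRI_M));

	return tri_and(inet, tri_not(clash));
}
```

With INET=y, the only combination that evaluates to n is INFINIBAND=y
together with IPV6=m; INFINIBAND=m with IPV6=m, and either setting with
IPV6=n or IPV6=y, all evaluate to y.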
Not sure if it's worth introducing another Kconfig symbol that depends
on IPV6 != n to avoid the

#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

tests. I note that there are tons of that construction all over the
tree, and the places without it look somewhat dubious (eg
net/ipv4/ip_gre.c looks as if it will do the wrong thing if IPV6=m).
Maybe adding CONFIG_IPV6_ENABLED or something and cleaning up the whole
tree would be a good janitorial project?

 - R.

From rdreier at cisco.com  Mon Dec 29 14:02:30 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 14:02:30 -0800
Subject: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing)
In-Reply-To: <200812291223.11753.jackm@dev.mellanox.co.il> (Jack
	Morgenstein's message of "Mon, 29 Dec 2008 12:23:11 +0200")
References: <200812291223.11753.jackm@dev.mellanox.co.il>
Message-ID:

> mlx4_ib: fix for Bugzilla 1383 (LSO packet processing).

having this duplicate of the subject line just means if I use git tools
to import it, I have to delete it by hand. I would love it if everyone
tried feeding the emails they send into "git am" and then examined the
result, and fixed things until the git tree that you get looks right.

And in this case the subject line is particularly useless to someone
reading the shortlog. It would be much better to say what the patch
actually does in the subject line, and then say something like

    This fixes

in the body of the email. (Otherwise how could anyone not totally
familiar with IB development know which bugzilla to look in?)

Anyway, I guess the bug is that build_lso_seg() overwrites the stamping
and allows the WQE to start executing before the gather list is written?

The fix looks kind of ugly:
The fix looks kind of ugly: > @@ -1606,13 +1605,21 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > size += sizeof (struct mlx4_wqe_datagram_seg) / 16; > > if (wr->opcode == IB_WR_LSO) { > - err = build_lso_seg(wqe, wr, qp, &seglen); > + err = build_lso_seg(wqe, wr, qp, &seglen, &lso_hdr_sz); > if (unlikely(err)) { > *bad_wr = wr; > goto out; > } > + lso_wqe = (__be32 *) wqe; > wqe += seglen; > - size += seglen / 16; > + dseg = wqe; > + dseg += wr->num_sge - 1; > + size += (seglen / 16) + wr->num_sge * > + (sizeof (struct mlx4_wqe_data_seg) / 16); > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > + set_data_seg(dseg, wr->sg_list + i); > + *lso_wqe = lso_hdr_sz; > + goto lso_continue; rather than duplicating all this code and adding the ugly goto, can we just add another "if (wr->opcode == IB_WR_LSO)" after writing the data segments and do the *lso_wqe = lso_hdr_sz there? Also is your code missing a memory barrier between the set_data_seg() loop and the lso_wqe assignment? It seems that an out-of-order CPU could make the lso_wqe visible before all the data segments are visible, so the bug could show up there anyway. - R. From rdreier at cisco.com Mon Dec 29 14:03:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 29 Dec 2008 14:03:30 -0800 Subject: [ofa-general] Re: kernel panic while using ipoib with lro enabled In-Reply-To: <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com> (Ron Livne's message of "Mon, 29 Dec 2008 16:37:55 +0200") References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> <3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com> <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com> Message-ID: > - I've added a print of the skb->data_len and skb->len sizes right > before the BUG_ON line. > skb->data_len was always 0, so I don't understand why there is a kernel panic. A race of two CPUs doing an skb_pull or something like that? - R. 

From devesh28 at gmail.com  Mon Dec 29 21:08:54 2008
From: devesh28 at gmail.com (Devesh Sharma)
Date: Tue, 30 Dec 2008 10:38:54 +0530
Subject: ***SPAM*** Re: ***SPAM*** [ofa-general] compiling OFED-1.2 with RHEL5.1
In-Reply-To: <4958CB6A.3090306@mellanox.co.il>
References: <309a667c0812290320m54efd47fr27affb1d5cc6dcec@mail.gmail.com>
	<4958CB6A.3090306@mellanox.co.il>
Message-ID: <309a667c0812292108w162e747ayfa132a60df729e01@mail.gmail.com>

Hello Tziporet,
thanks for replying. I will try to do this. How many changes do you
think I will have to make; are they many?
If there are problems I will contact you for further help.

-Devesh

On Mon, Dec 29, 2008 at 6:36 PM, Tziporet Koren wrote:
> Devesh Sharma wrote:
>
>> Hello all,
>> I am trying to compile OFED-1.2 with RHEL5.1 I know that this OS is not
>> supported by this
>> distribution, is there any work around other than switing to OFED-1.2.5 or
>> OFED-1.3?
>>
>>
> I don't think there is a workaround
> You can try to take RHEL 5.1 backports from 1.2.5 and use them on 1.2 but I
> guess you will have to change them
>
> Tziporet
>
>

From jgunthorpe at obsidianresearch.com  Mon Dec 29 23:34:12 2008
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 30 Dec 2008 00:34:12 -0700
Subject: [ofa-general] [PATCH V2 1/3] Create a new library libibnetdisc
In-Reply-To: <20081223164141.241dd3f0.weiny2@llnl.gov>
References: <20081211162031.0c591f54.weiny2@llnl.gov>
	<1230056943.23747.21.camel@auk31.llnl.gov>
	<20081223184331.GL31213@obsidianresearch.com>
	<20081223164141.241dd3f0.weiny2@llnl.gov>
Message-ID: <20081230073411.GD24047@obsidianresearch.com>

On Tue, Dec 23, 2008 at 04:41:41PM -0800, Ira Weiny wrote:

> Otherwise calls to the macro with only 1 parameter fail to compile. It seems
> that GCC has a couple of extensions [*] but the above should be C99 compliant
> without GCC extensions. Does that seem right?

Right, that is the unfortunate oversight of the C99 committee. I think
the gcc extension in the C99 case is pretty common. MSDN says that
VC++ will suppress the comma if the argument is omitted, which I
suspect is compatible with the GCC extension behavior.

http://msdn.microsoft.com/en-us/library/ms177415(VS.80).aspx

Jason

From rdreier at cisco.com Mon Dec 29 23:38:18 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 23:38:18 -0800
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To: <20081229114321.4b6baea5.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Mon, 29 Dec 2008 11:43:21 +1100")
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au>
Message-ID:

> ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined!
> ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined!

OK, I pushed out a change that should fix this, so you can drop the
revert. Let me know if you see further issues.

 - R.

From rdreier at cisco.com Mon Dec 29 23:38:55 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 29 Dec 2008 23:38:55 -0800
Subject: [ofa-general] Re: kernel panic while using ipoib with lro enabled
In-Reply-To: <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com> (Ron Livne's message of "Mon, 29 Dec 2008 16:37:55 +0200")
References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> <3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com> <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com>
Message-ID:

By the way, I don't seem to be able to reproduce this easily in my
setup. What kind of HCA/system are you using?

 - R.
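
A minimal sketch of the C99 variadic-macro limitation raised in Jason's
reply on the libibnetdisc patch. The macro under review is not shown in
this thread, so LOG_TO and format_demo below are hypothetical names. In
strict C99 the "..." must receive at least one argument; folding the
format string into the variadic part keeps the list non-empty and avoids
depending on the GCC ", ##__VA_ARGS__" comma-swallowing extension.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical macro, not taken from libibnetdisc.  The format string is
 * part of "..." so the variadic list is never empty, which keeps this
 * valid C99 without the GCC ", ##__VA_ARGS__" extension.
 */
#define LOG_TO(buf, len, ...) snprintf((buf), (len), __VA_ARGS__)

/* Formats one message with no extra arguments and one with an argument. */
static void format_demo(char *plain, size_t plain_len,
                        char *with_arg, size_t with_arg_len)
{
    int n;

    /* Only a format string: still one argument in the variadic part. */
    n = LOG_TO(plain, plain_len, "discovery started");
    assert(n >= 0);

    /* Format string plus one value. */
    n = LOG_TO(with_arg, with_arg_len, "found %d nodes", 3);
    assert(n >= 0);
}
```

The trade-off is that the macro can no longer prepend text to the format
string by literal concatenation unless every caller passes a string
literal as the first variadic argument.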
From vlad at dev.mellanox.co.il Tue Dec 30 00:01:40 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 30 Dec 2008 10:01:40 +0200 Subject: [ofa-general] [PATCH] libibumad: Add sysfs_*() functions to libibumad.map In-Reply-To: <000001c969f2$10655bd0$0bfd070a@amr.corp.intel.com> References: <20081229113805.GA25616@mellanox.co.il> <000001c969f2$10655bd0$0bfd070a@amr.corp.intel.com> Message-ID: <4959D564.6090305@dev.mellanox.co.il> Sean Hefty wrote: >> diff --git a/libibumad/src/libibumad.map b/libibumad/src/libibumad.map >> index 0154b7f..ea8999e 100644 >> --- a/libibumad/src/libibumad.map >> +++ b/libibumad/src/libibumad.map >> @@ -30,5 +30,10 @@ IBUMAD_1.0 { >> umad_debug; >> umad_addr_dump; >> umad_dump; >> + sys_read_gid; >> + sys_read_guid; >> + sys_read_string; >> + sys_read_uint; >> + sys_read_uint64; >> local: *; >> > > Why expose these? > > - Sean > These functions are used by srptools. Regards, Vladimir From sfr at canb.auug.org.au Tue Dec 30 00:30:48 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 30 Dec 2008 19:30:48 +1100 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> Message-ID: <20081230193048.db7ee462.sfr@canb.auug.org.au> Hi Roland, On Mon, 29 Dec 2008 23:38:18 -0800 Roland Dreier wrote: > > > ERROR: ".ipv6_chk_addr" [drivers/infiniband/core/ib_addr.ko] undefined! > > ERROR: ".ip6_route_output" [drivers/infiniband/core/ib_addr.ko] undefined! > > OK, I pushed out a change that should fix this, so you can drop the > revert. Let me know if you see further issues. I can drop the revert after your fix goes into Linus' tree (since that is where the breakage is) ... looking forward to it, thanks. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From ronli.voltaire at gmail.com Tue Dec 30 00:32:00 2008 From: ronli.voltaire at gmail.com (Ron Livne) Date: Tue, 30 Dec 2008 10:32:00 +0200 Subject: [ofa-general] ***SPAM*** Re: kernel panic while using ipoib with lro enabled In-Reply-To: References: <3b5e77ad0812280811h163e6060p465d7d8b66719c11@mail.gmail.com> <3b5e77ad0812290039q264cb387s7d1e99066cc5fee6@mail.gmail.com> <3b5e77ad0812290637u64013505j1891019e5c6902f0@mail.gmail.com> Message-ID: <3b5e77ad0812300032x1de8c88dq621f0e93d473bc34@mail.gmail.com> HCA: mellanox connectX Redhat 5.2 x86_64 (intel Xeox E5335, 8 cores) Ron On Tue, Dec 30, 2008 at 9:38 AM, Roland Dreier wrote: > By the way, I don't seem to be able to reproduce this easily in my > setup. What kind of HCA/system are you using? > > - R. > From vlad at lists.openfabrics.org Tue Dec 30 03:13:54 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Tue, 30 Dec 2008 03:13:54 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081230-0200 daily build status Message-ID: <20081230111354.A2F8EE60C93@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with 
linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.23 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From nicolas.morey-chaisemartin at ext.bull.net Tue Dec 30 05:23:37 2008 From: nicolas.morey-chaisemartin at ext.bull.net (Nicolas Morey Chaisemartin) Date: Tue, 30 Dec 2008 14:23:37 +0100 Subject: [ofa-general] [PATCH] OpenSM: update osmeventplugin example for the new TRAP event. In-Reply-To: <20081218164813.55696c45.weiny2@llnl.gov> References: <20081218164813.55696c45.weiny2@llnl.gov> Message-ID: <495A20D9.1020509@ext.bull.net> Hello, I was wondering if there is a doc somewhere with a list of the trap codes (for generic traps) and what is stored into the associated ib_mad_notice_attr_t structure? I'm a writing a perf manager plugin for OpenSM (originally based on opensmskumme) and I'd like to handle TRAP events. The problem is without the list of trap IDs and their meaning, I'm not really sure how to handle them, and what to store in the database. 
Thanks Nicolas (and by the way Happy New Year to everyone) Ira Weiny wrote: > It turns out that I already was using the "OSM_EVENT_ID_TRAP" in the example > plugin. > > This makes the use work, > Ira > > > >From 7b744c38fc2aad67586ade81d65326a139a85681 Mon Sep 17 00:00:00 2001 > From: Ira Weiny > Date: Thu, 18 Dec 2008 16:16:37 -0800 > Subject: [PATCH] OpenSM: update osmeventplugin example for the new TRAP event. > > > Signed-off-by: Ira Weiny > --- > opensm/include/opensm/osm_event_plugin.h | 12 ------------ > opensm/osmeventplugin/src/osmeventplugin.c | 28 ++++++++++++++++++++-------- > 2 files changed, 20 insertions(+), 20 deletions(-) > > diff --git a/opensm/include/opensm/osm_event_plugin.h b/opensm/include/opensm/osm_event_plugin.h > index 0922c65..41a5810 100644 > --- a/opensm/include/opensm/osm_event_plugin.h > +++ b/opensm/include/opensm/osm_event_plugin.h > @@ -131,18 +131,6 @@ typedef struct osm_api_ps_event { > } osm_epi_ps_event_t; > > /** ========================================================================= > - * Trap events > - */ > -typedef struct osm_epi_trap_event { > - osm_epi_port_id_t port_id; > - uint8_t type; > - uint32_t prod_type; > - uint16_t trap_num; > - uint16_t issuer_lid; > - time_t time; > -} osm_epi_trap_event_t; > - > -/** ========================================================================= > * Plugin creators should allocate an object of this type > * (named OSM_EVENT_PLUGIN_IMPL_NAME) > * The version should be set to OSM_EVENT_PLUGIN_INTERFACE_VER > diff --git a/opensm/osmeventplugin/src/osmeventplugin.c b/opensm/osmeventplugin/src/osmeventplugin.c > index f0781eb..b4d9ce9 100644 > --- a/opensm/osmeventplugin/src/osmeventplugin.c > +++ b/opensm/osmeventplugin/src/osmeventplugin.c > @@ -137,13 +137,21 @@ static void handle_port_select(_log_events_t * log, osm_epi_ps_event_t * ps) > > /** ========================================================================= > */ > -static void handle_trap_event(_log_events_t * log, 
osm_epi_trap_event_t * trap) > +static void handle_trap_event(_log_events_t *log, ib_mad_notice_attr_t *p_ntc) > { > - fprintf(log->log_file, > - "Trap event %d from 0x%" PRIx64 " (%s) port %d\n", > - trap->trap_num, > - trap->port_id.node_guid, > - trap->port_id.node_name, trap->port_id.port_num); > + if (ib_notice_is_generic(p_ntc)) { > + fprintf(log->log_file, > + "Generic trap type %d; event %d; from LID 0x%x\n", > + ib_notice_get_type(p_ntc), > + cl_ntoh16(p_ntc->g_or_v.generic.trap_num), > + cl_ntoh16(p_ntc->issuer_lid)); > + } else { > + fprintf(log->log_file, > + "Vendor trap type %d; from LID 0x%x\n", > + ib_notice_get_type(p_ntc), > + cl_ntoh16(p_ntc->issuer_lid)); > + } > + > } > > /** ========================================================================= > @@ -163,13 +171,17 @@ static void report(void *_log, osm_epi_event_id_t event_id, void *event_data) > handle_port_select(log, (osm_epi_ps_event_t *) event_data); > break; > case OSM_EVENT_ID_TRAP: > - handle_trap_event(log, (osm_epi_trap_event_t *) event_data); > + handle_trap_event(log, (ib_mad_notice_attr_t *) event_data); > + break; > + case OSM_EVENT_ID_SUBNET_UP: > + fprintf(log->log_file, "Subnet up reported\n"); > break; > case OSM_EVENT_ID_MAX: > default: > osm_log(log->osmlog, OSM_LOG_ERROR, > - "Unknown event reported to plugin\n"); > + "Unknown event (%d) reported to plugin\n", event_id); > } > + fflush(log->log_file); > } > > /** ========================================================================= > From rdreier at cisco.com Tue Dec 30 07:41:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 07:41:36 -0800 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081230193048.db7ee462.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Tue, 30 Dec 2008 19:30:48 +1100") References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> <20081230193048.db7ee462.sfr@canb.auug.org.au> Message-ID: > I can drop the revert after your fix goes into 
Linus' tree (since that is > where the breakage is) ... looking forward to it, thanks. Shouldn't it work if the fix is in my for-next branch, since you pull that as part of the -next tree? Or am I unclear on how -next works? - R. From sfr at canb.auug.org.au Tue Dec 30 07:46:54 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 31 Dec 2008 02:46:54 +1100 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> <20081230193048.db7ee462.sfr@canb.auug.org.au> Message-ID: <20081231024654.a73f20df.sfr@canb.auug.org.au> Hi Roland, On Tue, 30 Dec 2008 07:41:36 -0800 Roland Dreier wrote: > > > I can drop the revert after your fix goes into Linus' tree (since that is > > where the breakage is) ... looking forward to it, thanks. > > Shouldn't it work if the fix is in my for-next branch, since you pull > that as part of the -next tree? Or am I unclear on how -next works? I also build the tree between merging most of the trees (including after fetching Linus' latest tree). -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From venkatvenkatsubra at yahoo.com Tue Dec 30 08:24:35 2008 From: venkatvenkatsubra at yahoo.com (Venkat Venkatsubra) Date: Tue, 30 Dec 2008 08:24:35 -0800 (PST) Subject: ***SPAM*** Re: [ofa-general] [RDMA CM IPv6 PATCHv7 2/2] RDMA CM Message-ID: <994594.55667.qm@web58303.mail.re3.yahoo.com> I had couple of questions regarding RDMA CM supporting IPv6. When an iWARP NIC doesn't support IPv6, what is the earliest an error could be returned saying feature unsupported ? Any time sooner than rdma_connect() for the active connect side ? And what about the passive side ? Venkat -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jackm at dev.mellanox.co.il Tue Dec 30 09:20:49 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 30 Dec 2008 19:20:49 +0200 Subject: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing) In-Reply-To: References: <200812291223.11753.jackm@dev.mellanox.co.il> Message-ID: <200812301920.50336.jackm@dev.mellanox.co.il> On Tuesday 30 December 2008 00:02, Roland Dreier wrote: > > I would love it if everyone tried feeding the emails they send into "git > am" and then examined the result, and fixed things until the git tree > that you get looks right. I apologize, I've been swamped and just pushed this one out the door. I guess, though, everyone here is always swamped -- I'll put in the extra little bit of effort and do things properly next time. > > Anyway, I guess the bug is that build_lso_seg() overwrites the stamping > and allows the WQE to start executing before the gather list is written? Correct. > > > @@ -1606,13 +1605,21 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > > size += sizeof (struct mlx4_wqe_datagram_seg) / 16; > > > > if (wr->opcode == IB_WR_LSO) { > > - err = build_lso_seg(wqe, wr, qp, &seglen); + LSO data filled in HERE: err = build_lso_seg(wqe, wr, qp, &seglen, &lso_hdr_sz); > > if (unlikely(err)) { > > *bad_wr = wr; > > goto out; > > } > > + lso_wqe = (__be32 *) wqe; > > wqe += seglen; > > - size += seglen / 16; > > + dseg = wqe; > > + dseg += wr->num_sge - 1; > > + size += (seglen / 16) + wr->num_sge * > > + (sizeof (struct mlx4_wqe_data_seg) / 16); > > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > > + set_data_seg(dseg, wr->sg_list + i); *** wmb is called in set_data_seg. + stamping dword written here: *lso_wqe = lso_hdr_sz; > > + goto lso_continue; > > rather than duplicating all this code and adding the ugly goto, can we > just add another "if (wr->opcode == IB_WR_LSO)" after writing the data > segments and do the *lso_wqe = lso_hdr_sz there? 
this is in the data path, so I put in a lot of effort to guarantee that there would not be an extra "if" for ALL post_sends. This way, although a bit ugly, only LSO packets are affected, and there is no extra "if". > Also is your code missing a memory barrier between the set_data_seg() > loop and the lso_wqe assignment? It seems that an out-of-order CPU > could make the lso_wqe visible before all the data segments are visible, > so the bug could show up there anyway. The memory barrier is unnecessary -- since the lso segment (without stamping) is written first, then all the data segments are written (wmb() called for each segment), then finally the lso stamping dword. Thus, this will always follow a wmb(). (no one would send an LSO without any data). - Jack From sean.hefty at intel.com Tue Dec 30 12:27:07 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 30 Dec 2008 12:27:07 -0800 Subject: [ofa-general] [PATCH] libibumad: Add sysfs_*() functions to libibumad.map In-Reply-To: <4959D564.6090305@dev.mellanox.co.il> References: <20081229113805.GA25616@mellanox.co.il> <000001c969f2$10655bd0$0bfd070a@amr.corp.intel.com> <4959D564.6090305@dev.mellanox.co.il> Message-ID: <000001c96abc$fca09cb0$5e248686@amr.corp.intel.com> >>> + sys_read_gid; >>> + sys_read_guid; >>> + sys_read_string; >>> + sys_read_uint; >>> + sys_read_uint64; >>> local: *; >>> >> >> Why expose these? >> >> - Sean >> > >These functions are used by srptools. Please don't expose them. We're trying to port the IB diags and other MAD code to Windows. This is adding OS specific routines to the lowest layer. What values does srptools actually need? 
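
Jack's ordering argument for the LSO fix (write the segment body, let
each data segment issue its barrier, and only then write the dword that
replaces the stamping) can be sketched in userspace C11. This is an
analogy under stated assumptions, not the mlx4 code: fake_wqe, STAMP,
fake_wqe_init, fake_wqe_post and fake_wqe_poll are made-up names, the
polling side stands in for the HCA, and a release store stands in for
wmb() followed by the final store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NSEG  4
#define STAMP 0xffffffffu   /* hypothetical "not valid yet" marker */

/* Hypothetical work-queue entry: first dword decides validity. */
struct fake_wqe {
    _Atomic uint32_t first_dword;
    uint32_t data[NSEG];
};

static void fake_wqe_init(struct fake_wqe *w)
{
    memset(w->data, 0, sizeof w->data);
    atomic_store_explicit(&w->first_dword, STAMP, memory_order_relaxed);
}

/* Producer: payload first (last segment to first), header dword last. */
static void fake_wqe_post(struct fake_wqe *w, const uint32_t *segs,
                          uint32_t hdr)
{
    int i;

    assert(hdr != STAMP);
    for (i = NSEG - 1; i >= 0; --i)
        w->data[i] = segs[i];
    /* Release: all data[] stores are visible before first_dword changes. */
    atomic_store_explicit(&w->first_dword, hdr, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once the stamp is gone. */
static int fake_wqe_poll(struct fake_wqe *w, uint32_t *out)
{
    uint32_t hdr = atomic_load_explicit(&w->first_dword,
                                        memory_order_acquire);
    if (hdr == STAMP)
        return 0;
    memcpy(out, w->data, sizeof w->data);
    return 1;
}
```

If the header dword were stored before the loop, a consumer could see a
valid header and read segments that have not been written yet, which is
the failure mode discussed for bugzilla 1383.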
From sfr at canb.auug.org.au Tue Dec 30 14:52:28 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 31 Dec 2008 09:52:28 +1100 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081231024654.a73f20df.sfr@canb.auug.org.au> References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> <20081230193048.db7ee462.sfr@canb.auug.org.au> <20081231024654.a73f20df.sfr@canb.auug.org.au> Message-ID: <20081231095228.b21cffe0.sfr@canb.auug.org.au> Hi Roland, On Wed, 31 Dec 2008 02:46:54 +1100 Stephen Rothwell wrote: > > On Tue, 30 Dec 2008 07:41:36 -0800 Roland Dreier wrote: > > > > > I can drop the revert after your fix goes into Linus' tree (since that is > > > where the breakage is) ... looking forward to it, thanks. > > > > Shouldn't it work if the fix is in my for-next branch, since you pull > > that as part of the -next tree? Or am I unclear on how -next works? > > I also build the tree between merging most of the trees (including after > fetching Linus' latest tree). So instead of the revert, I will cherry-pick your fix commit just after Linus' tree today on the assumption that you will send it to him ASAP. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From rdreier at cisco.com Tue Dec 30 14:56:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 14:56:05 -0800 Subject: [ofa-general] Re: linux-next: origin tree build failure In-Reply-To: <20081231095228.b21cffe0.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Wed, 31 Dec 2008 09:52:28 +1100") References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> <20081230193048.db7ee462.sfr@canb.auug.org.au> <20081231024654.a73f20df.sfr@canb.auug.org.au> <20081231095228.b21cffe0.sfr@canb.auug.org.au> Message-ID: > So instead of the revert, I will cherry-pick your fix commit just after > Linus' tree today on the assumption that you will send it to him ASAP. Yes, I will batch up a few other things and send a pull request today. - R. From rdreier at cisco.com Tue Dec 30 15:00:02 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 15:00:02 -0800 Subject: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing) In-Reply-To: <200812301920.50336.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 30 Dec 2008 19:20:49 +0200") References: <200812291223.11753.jackm@dev.mellanox.co.il> <200812301920.50336.jackm@dev.mellanox.co.il> Message-ID: > I apologize, I've been swamped and just pushed this one out the door. > I guess, though, everyone here is always swamped -- I'll put in the > extra little bit of effort and do things properly next time. It's not just you, but I wish everyone would just take the time to set things up once and for all (and possibly learn how to use automation such as git-send-email) so that I don't have to keep editing trivial things in nearly every patch I get. It doesn't take any longer to send a patch that automated tools can handle -- you just have to care enough to learn how to do it. 
> this is in the data path, so I put in a lot of effort to guarantee that
> there would not be an extra "if" for ALL post_sends. This way, although
> a bit ugly, only LSO packets are affected, and there is no extra "if".

Yeah, I figured that was what you wanted to do. But I'm not convinced
that the bigger I-cache footprint of duplicating code doesn't hurt more
than a single conditional branch.

Anyway, what about doing something like the below (the cost now becomes
two more assignments):

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 39167a7..2d8ae4c 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1462,7 +1462,8 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg)
 }
 
 static int build_lso_seg(struct mlx4_wqe_lso_seg *wqe, struct ib_send_wr *wr,
-			 struct mlx4_ib_qp *qp, unsigned *lso_seg_len)
+			 struct mlx4_ib_qp *qp, unsigned *lso_seg_len,
+			 __be32 *lso_hdr_sz)
 {
 	unsigned halign = ALIGN(sizeof *wqe + wr->wr.ud.hlen, 16);
 
@@ -1479,12 +1480,8 @@ static int build_lso_seg(struct mlx4_wqe_lso_seg *wqe, struct ib_send_wr *wr,
 
 	memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen);
 
-	/* make sure LSO header is written before overwriting stamping */
-	wmb();
-
-	wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 |
-					wr->wr.ud.hlen);
-
+	*lso_hdr_sz = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 |
+				  wr->wr.ud.hlen);
 	*lso_seg_len = halign;
 	return 0;
 }
@@ -1518,6 +1515,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	int uninitialized_var(stamp);
 	int uninitialized_var(size);
 	unsigned uninitialized_var(seglen);
+	__be32 *lso_wqe;
+	__be32 uninitialized_var(lso_hdr_sz);
+	__be32 dummy;
 	int i;
 
 	spin_lock_irqsave(&qp->sq.lock, flags);
@@ -1525,6 +1525,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	ind = qp->sq_next_wqe;
 
 	for (nreq = 0; wr; ++nreq, wr = wr->next) {
+		lso_wqe = &dummy;
+
 		if (mlx4_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq)) {
 			err = -ENOMEM;
 			*bad_wr = wr;
@@ -1606,11 +1608,12 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			size += sizeof (struct mlx4_wqe_datagram_seg) / 16;
 
 			if (wr->opcode == IB_WR_LSO) {
-				err = build_lso_seg(wqe, wr, qp, &seglen);
+				err = build_lso_seg(wqe, wr, qp, &seglen, &lso_hdr_sz);
 				if (unlikely(err)) {
 					*bad_wr = wr;
 					goto out;
 				}
+				lso_wqe = (__be32 *) wqe;
 				wqe += seglen;
 				size += seglen / 16;
 			}
@@ -1652,6 +1655,13 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 
 		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
 			set_data_seg(dseg, wr->sg_list + i);
 
+		/*
+		 * Possibly overwrite stamping in cacheline with LSO
+		 * segment only after making sure all data segments
+		 * are written.
+		 */
+		*lso_wqe = lso_hdr_sz;
+
 		ctrl->fence_size = (wr->send_flags & IB_SEND_FENCE ?
 				    MLX4_WQE_CTRL_FENCE : 0) | size;

From rdreier at cisco.com Tue Dec 30 15:06:19 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Dec 2008 15:06:19 -0800
Subject: [ofa-general] [PATCH] IB/mlx4: Fix taking SL field of cqe->sl_vid
Message-ID:

Commit f780a9f1 ("mlx4_core: Add ethernet fields to CQE struct")
introduced a bug in how wc->sl is set in mlx4_ib_poll_one() -- since
cqe->sl_vid is a big-endian value, the shift must be done after
converting to host endianness. This bug was found using sparse
endianness checking.

Signed-off-by: Roland Dreier
---
Looks like an obvious fix to me... I'll send it upstream. One more
reason to use sparse before sending a patch.

 drivers/infiniband/hw/mlx4/cq.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 8415ecc..a3c5af1 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -699,7 +699,7 @@ repoll:
 	}
 
 	wc->slid = be16_to_cpu(cqe->rlid);
-	wc->sl = be16_to_cpu(cqe->sl_vid >> 12);
+	wc->sl = be16_to_cpu(cqe->sl_vid) >> 12;
 	g_mlpath_rqpn = be32_to_cpu(cqe->g_mlpath_rqpn);
 	wc->src_qp = g_mlpath_rqpn & 0xffffff;
 	wc->dlid_path_bits = (g_mlpath_rqpn >> 24) & 0x7f;
-- 
1.6.0.4

From sfr at canb.auug.org.au Tue Dec 30 15:17:17 2008
From: sfr at canb.auug.org.au (Stephen Rothwell)
Date: Wed, 31 Dec 2008 10:17:17 +1100
Subject: [ofa-general] Re: linux-next: origin tree build failure
In-Reply-To:
References: <20081229114321.4b6baea5.sfr@canb.auug.org.au> <20081230193048.db7ee462.sfr@canb.auug.org.au> <20081231024654.a73f20df.sfr@canb.auug.org.au> <20081231095228.b21cffe0.sfr@canb.auug.org.au>
Message-ID: <20081231101717.0ef30e15.sfr@canb.auug.org.au>

Hi Roland,

On Tue, 30 Dec 2008 14:56:05 -0800 Roland Dreier wrote:
>
> > So instead of the revert, I will cherry-pick your fix commit just after
> > Linus' tree today on the assumption that you will send it to him ASAP.
>
> Yes, I will batch up a few other things and send a pull request today.

Excellent!

-- 
Cheers,
Stephen Rothwell sfr at canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From rdreier at cisco.com Tue Dec 30 15:17:54 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 15:17:54 -0800 Subject: [ofa-general] [PATCH v2] ipoib: do not join broadcast group if interface is brought down In-Reply-To: <492D78EF.4010703@Voltaire.COM> (Yossi Etigin's message of "Wed, 26 Nov 2008 18:27:27 +0200") References: <49246EB7.3070607@Voltaire.COM> <492D0E88.6080009@dev.mellanox.co.il> <492D78EF.4010703@Voltaire.COM> Message-ID: > @@ -587,8 +589,10 @@ void ipoib_mcast_join_task(struct work_s > __ipoib_mcast_add(dev, priv->broadcast); > spin_unlock_irq(&priv->lock); > } > + rtnl_unlock(); > > - if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > + if (priv->broadcast && > + !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { I'm trying to understand this patch. What protects access to priv->broadcast here and prevents it from becoming NULL right after the test? - R. 

From rdreier at cisco.com Tue Dec 30 15:38:43 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Dec 2008 15:38:43 -0800
Subject: [ofa-general] [GIT PULL] please pull infiniband.git
Message-ID:

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

A build fix and an endianness bug fix before the new year:

Roland Dreier (3):
      RDMA/addr: Fix build breakage when IPv6 is disabled
      IB/mlx4: Fix reading SL field out of cqe->sl_vid
      Merge branches 'cma' and 'mlx4' into for-linus

 drivers/infiniband/Kconfig      | 1 +
 drivers/infiniband/core/addr.c  | 47 +++++++++++++++++++++++++++++++++-----
 drivers/infiniband/hw/mlx4/cq.c | 2 +-
 3 files changed, 42 insertions(+), 8 deletions(-)

From rdreier at cisco.com Tue Dec 30 16:15:20 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 30 Dec 2008 16:15:20 -0800
Subject: [ofa-general] Re: [PATCH 3/4] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete
In-Reply-To: <4946A28C.8030409@Voltaire.COM> (Yossi Etigin's message of "Mon, 15 Dec 2008 20:31:40 +0200")
References: <49469C1E.8010307@Voltaire.COM> <4946A28C.8030409@Voltaire.COM>
Message-ID:

> + atomic_t vlan_task_flag;

why is this atomic_t? I only see:

> + atomic_set(&priv->vlan_task_flag, 1);

> + atomic_set(&priv->vlan_task_flag, 0);

> + iffup_value = atomic_read(&priv->vlan_task_flag) ? IFF_UP : 0;

so as far as I can tell you are not using anything atomic. So if
there's a race you're worried about, it's still there...

 - R.
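
Roland's objection to the atomic_t flag can be illustrated with C11
atomics in userspace. The sketch assumes the flag was meant to keep two
contexts from entering the same section at once, which the thread does
not state; claim_racy, claim_atomic and release_flag are hypothetical
names. A separate load and store, which is roughly what atomic_read()
and atomic_set() amount to, cannot close a check-then-set window, while
a single read-modify-write can (the kernel counterparts would be
atomic_cmpxchg() or test_and_set_bit()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int task_flag;   /* 0 = idle, 1 = task in progress */

/*
 * Check-then-set built from a separate load and store.  Each access is
 * individually atomic, but two threads can both observe 0 before either
 * stores 1, so both may return true.
 */
static bool claim_racy(void)
{
    if (atomic_load_explicit(&task_flag, memory_order_relaxed))
        return false;
    atomic_store_explicit(&task_flag, 1, memory_order_relaxed);
    return true;
}

/* One read-modify-write: at most one caller sees 0 and swaps in 1. */
static bool claim_atomic(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&task_flag, &expected, 1);
}

/* Clears the flag; only the current owner is expected to call this. */
static void release_flag(void)
{
    assert(atomic_load(&task_flag) == 1);
    atomic_store(&task_flag, 0);
}
```

Single-threaded, both claim functions behave the same; the difference
only shows up under concurrency, which is why a flag that is only ever
set and read gains nothing from being an atomic type.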
From sfr at canb.auug.org.au Tue Dec 30 19:12:57 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 31 Dec 2008 14:12:57 +1100 Subject: [ofa-general] [PATCH] infiniband/ehca: spin_lock_irqsave takes an unsigned long Message-ID: <20081231141257.9bafac41.sfr@canb.auug.org.au> This will also help prevent some warnings when we change u64 to unsigned long long. Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ehca/ehca_main.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 3b77b67..c7b8a50 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -955,7 +955,7 @@ void ehca_poll_eqs(unsigned long data) struct ehca_eq *eq = &shca->eq; int max = 3; volatile u64 q_ofs, q_ofs2; - u64 flags; + unsigned long flags; spin_lock_irqsave(&eq->spinlock, flags); q_ofs = eq->ipz_queue.current_q_offset; spin_unlock_irqrestore(&eq->spinlock, flags); -- 1.6.0.5 -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From sfr at canb.auug.org.au Tue Dec 30 19:14:53 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 31 Dec 2008 14:14:53 +1100 Subject: [ofa-general] [PATCH] infiniband/ehca: use consistent type Message-ID: <20081231141453.45d7f2c1.sfr@canb.auug.org.au> ehca_plpar_hcall9() takes an unsigned long array, so pass that. This change will avoid some warnings when we change u64 to unsigned long long. 
Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ehca/hcp_if.c | 26 +++++++++++++------------- 1 files changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c index 415d3a4..79d95d9 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.c +++ b/drivers/infiniband/hw/ehca/hcp_if.c @@ -226,7 +226,7 @@ u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle, u32 *eq_ist) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; u64 allocate_controls; /* resource type */ @@ -270,7 +270,7 @@ u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle, struct ehca_alloc_cq_parms *param) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, adapter_handle.handle, /* r4 */ @@ -297,7 +297,7 @@ u64 hipz_h_alloc_resource_qp(const struct ipz_adapter_handle adapter_handle, { u64 ret; u64 allocate_controls, max_r10_reg, r11, r12; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; allocate_controls = EHCA_BMASK_SET(H_ALL_RES_QP_ENHANCED_OPS, parms->ext_type) @@ -525,7 +525,7 @@ u64 hipz_h_disable_and_get_wqe(const struct ipz_adapter_handle adapter_handle, int dis_and_get_function_code) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs, adapter_handle.handle, /* r4 */ @@ -548,7 +548,7 @@ u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle, struct h_galpa gal) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_MODIFY_QP, outs, adapter_handle.handle, /* r4 */ qp_handle.handle, /* r5 */ @@ -579,7 +579,7 @@ u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle, struct ehca_qp *qp) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long 
outs[PLPAR_HCALL9_BUFSIZE]; ret = hcp_galpas_dtor(&qp->galpas); if (ret) { @@ -625,7 +625,7 @@ u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle, u32 * bma_qp_nr) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs, adapter_handle.handle, /* r4 */ @@ -733,7 +733,7 @@ u64 hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, adapter_handle.handle, /* r4 */ @@ -794,7 +794,7 @@ u64 hipz_h_query_mr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_QUERY_MR, outs, adapter_handle.handle, /* r4 */ @@ -828,7 +828,7 @@ u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs, adapter_handle.handle, /* r4 */ @@ -855,7 +855,7 @@ u64 hipz_h_register_smr(const struct ipz_adapter_handle adapter_handle, struct ehca_mr_hipzout_parms *outparms) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs, adapter_handle.handle, /* r4 */ @@ -877,7 +877,7 @@ u64 hipz_h_alloc_resource_mw(const struct ipz_adapter_handle adapter_handle, struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, adapter_handle.handle, /* r4 */ @@ -895,7 +895,7 @@ u64 hipz_h_query_mw(const struct ipz_adapter_handle adapter_handle, struct ehca_mw_hipzout_parms *outparms) { u64 ret; - u64 
outs[PLPAR_HCALL9_BUFSIZE]; + unsigned long outs[PLPAR_HCALL9_BUFSIZE]; ret = ehca_plpar_hcall9(H_QUERY_MW, outs, adapter_handle.handle, /* r4 */ -- 1.6.0.5 -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From rdreier at cisco.com Tue Dec 30 20:20:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 20:20:29 -0800 Subject: [ofa-general] Re: [PATCH] infiniband/ehca: spin_lock_irqsave takes an unsigned long In-Reply-To: <20081231141257.9bafac41.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Wed, 31 Dec 2008 14:12:57 +1100") References: <20081231141257.9bafac41.sfr@canb.auug.org.au> Message-ID: are you trying to land the 'typedef unsigned long u64' change for 2.6.28, or can these patches wait for 2.6.29? - R. From rdreier at cisco.com Tue Dec 30 20:21:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 30 Dec 2008 20:21:30 -0800 Subject: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing) In-Reply-To: (Roland Dreier's message of "Tue, 30 Dec 2008 15:00:02 -0800") References: <200812291223.11753.jackm@dev.mellanox.co.il> <200812301920.50336.jackm@dev.mellanox.co.il> Message-ID: by the way, if you're looking to micro-optimize the send path, moving the MLX (GSI/SMI) stuff out of line might be worth it... that's one conditional branch that could be saved I suppose. - R. From sfr at canb.auug.org.au Tue Dec 30 20:44:24 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Wed, 31 Dec 2008 15:44:24 +1100 Subject: [ofa-general] Re: [PATCH] infiniband/ehca: spin_lock_irqsave takes an unsigned long In-Reply-To: References: <20081231141257.9bafac41.sfr@canb.auug.org.au> Message-ID: <20081231154424.77961256.sfr@canb.auug.org.au> Hi Roland, On Tue, 30 Dec 2008 20:20:29 -0800 Roland Dreier wrote: > > are you trying to land the 'typedef unsigned long u64' change for > 2.6.28, or can these patches wait for 2.6.29? 2.6.29 (or even 30), I would think at this point. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: From sean.hefty at intel.com Tue Dec 30 20:46:27 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 30 Dec 2008 20:46:27 -0800 Subject: [ofa-general] [PATCH] libibumad: Add sysfs_*() functions to libibumad.map In-Reply-To: <000001c96abc$fca09cb0$5e248686@amr.corp.intel.com> References: <20081229113805.GA25616@mellanox.co.il> <000001c969f2$10655bd0$0bfd070a@amr.corp.intel.com> <4959D564.6090305@dev.mellanox.co.il> <000001c96abc$fca09cb0$5e248686@amr.corp.intel.com> Message-ID: <000001c96b02$bdfc6070$48e0180a@amr.corp.intel.com> >>>> + sys_read_gid; >>>> + sys_read_guid; >>>> + sys_read_string; >>>> + sys_read_uint; >>>> + sys_read_uint64; >>>> local: *; >>>> >>> >>> Why expose these? >>> >>> - Sean >>> >> >>These functions are used by srptools. > >Please don't expose them. We're trying to port the IB diags and other MAD code >to Windows. This is adding OS specific routines to the lowest layer. What >values does srptools actually need? There's also this helper function in libibverbs that may be usable instead: int ibv_read_sysfs_file(const char *dir, const char *file, char *buf, size_t size); From jackm at dev.mellanox.co.il Wed Dec 31 00:02:17 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 31 Dec 2008 10:02:17 +0200 Subject: [ofa-general] Re: [PATCH] mlx4_ib: fix for bugzilla 1383 (LSO packet processing) In-Reply-To: References: <200812291223.11753.jackm@dev.mellanox.co.il> Message-ID: <200812311002.17809.jackm@dev.mellanox.co.il> On Wednesday 31 December 2008 06:21, Roland Dreier wrote: > by the way, if you're looking to micro-optimize the send path, moving > the MLX (GSI/SMI) stuff out of line might be worth it... that's one > conditional branch that could be saved I suppose. 
> I noticed that, but I did not want to mix in too many changes at once. (I'm a coward). - Jack From yosefe at Voltaire.COM Wed Dec 31 00:18:10 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 31 Dec 2008 10:18:10 +0200 Subject: [ofa-general] [PATCH v2] ipoib: do not join broadcast group if interface is brought down In-Reply-To: References: <49246EB7.3070607@Voltaire.COM> <492D0E88.6080009@dev.mellanox.co.il> <492D78EF.4010703@Voltaire.COM> Message-ID: <495B2AC2.80005@Voltaire.COM> You are right. This is the one place where priv->lock is not held. I'm sending V3. Roland Dreier wrote: > > @@ -587,8 +589,10 @@ void ipoib_mcast_join_task(struct work_s > > __ipoib_mcast_add(dev, priv->broadcast); > > spin_unlock_irq(&priv->lock); > > } > > + rtnl_unlock(); > > > > - if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > > + if (priv->broadcast && > > + !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { > > I'm trying to understand this patch. What protects access to > priv->broadcast here and prevents it from becoming NULL right after the > test? > > - R. -- --Yossi From yosefe at Voltaire.COM Wed Dec 31 00:25:04 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 31 Dec 2008 10:25:04 +0200 Subject: [ofa-general] [PATCH v3] ipoib: do not join broadcast group if interface is brought down Message-ID: <495B2C60.6020008@Voltaire.COM> Because ipoib_workqueue is not flushed when ipoib interface is brought down, ipoib_mcast_join() may trigger a join to the broadcast group after priv->broadcast was set to NULL (during cleanup). This will cause ipoib to be joined to the broadcast group when interface is down. As a side effect, this breaks the optimization of setting qkey only when joining the broadcast group. Signed-off-by: Yossi Etigin -- Changes from v1: - Put checks in places where was assumed priv->broadcast != NULL. Changes from v2: - Put a lock around the NULL check in ipoib_mcast_join_task. Fix bugzilla 1370. 
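For readers following the v2 -> v3 change, here is a minimal user-space sketch of "check the pointer and use it under the same lock". It is only an analogy, not the ipoib code: a pthread mutex stands in for priv->lock, and struct mcast, struct priv, priv_init(), broadcast_needs_join() and teardown() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the multicast group and the device private data. */
struct mcast {
	bool attached;
};

struct priv {
	pthread_mutex_t lock;     /* plays the role of priv->lock */
	struct mcast *broadcast;  /* may be cleared by teardown at any time */
};

static void priv_init(struct priv *p, struct mcast *m)
{
	int rc = pthread_mutex_init(&p->lock, NULL);

	assert(rc == 0);
	(void)rc;
	p->broadcast = m;
}

/*
 * Returns true if a join should be issued. The NULL test and the
 * dereference both happen with the lock held, so teardown() cannot
 * clear the pointer between the two.
 */
static bool broadcast_needs_join(struct priv *p)
{
	bool need;

	pthread_mutex_lock(&p->lock);
	need = p->broadcast != NULL && !p->broadcast->attached;
	pthread_mutex_unlock(&p->lock);
	return need;
}

/* Cleanup path: clears the pointer under the same lock. */
static void teardown(struct priv *p)
{
	pthread_mutex_lock(&p->lock);
	p->broadcast = NULL;
	pthread_mutex_unlock(&p->lock);
}
```

Without the lock, a test such as "if (p->broadcast && !p->broadcast->attached)" could see a non-NULL pointer and then dereference it after teardown() has run in another thread.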
--- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-12-31 09:56:22.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-12-31 09:57:13.000000000 +0200 @@ -502,7 +502,7 @@ static void ipoib_mcast_join(struct net_ IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE; - if (create) { + if (create && priv->broadcast) { comp_mask |= IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_MTU_SELECTOR | @@ -570,7 +570,8 @@ void ipoib_mcast_join_task(struct work_s ipoib_warn(priv, "ib_query_port failed\n"); } - if (!priv->broadcast) { + rtnl_lock(); + if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) && !priv->broadcast) { struct ipoib_mcast *broadcast; broadcast = ipoib_mcast_alloc(dev, 1); @@ -581,6 +582,7 @@ void ipoib_mcast_join_task(struct work_s queue_delayed_work(ipoib_workqueue, &priv->mcast_join_task, HZ); mutex_unlock(&mcast_mutex); + rtnl_unlock(); return; } @@ -592,12 +594,17 @@ void ipoib_mcast_join_task(struct work_s __ipoib_mcast_add(dev, priv->broadcast); spin_unlock_irq(&priv->lock); } + rtnl_unlock(); - if (!test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { + spin_lock_irq(&priv->lock); + if (priv->broadcast && + !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &priv->broadcast->flags)) { if (!test_bit(IPOIB_MCAST_FLAG_BUSY, &priv->broadcast->flags)) ipoib_mcast_join(dev, priv->broadcast, 0); + spin_unlock_irq(&priv->lock); return; } + spin_unlock_irq(&priv->lock); while (1) { struct ipoib_mcast *mcast = NULL; @@ -622,7 +629,8 @@ void ipoib_mcast_join_task(struct work_s return; } - priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); + if (priv->broadcast) + priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); if 
(!ipoib_cm_admin_enabled(dev)) { rtnl_lock(); -- --Yossi From vlad at lists.openfabrics.org Wed Dec 31 03:14:10 2008 From: vlad at lists.openfabrics.org (Vladimir Sokolovsky Mellanox) Date: Wed, 31 Dec 2008 03:14:10 -0800 (PST) Subject: [ofa-general] ofa_1_4_kernel 20081231-0200 daily build status Message-ID: <20081231111410.333F3E60D81@openfabrics.org> This email was generated automatically, please do not reply git_url: git://git.openfabrics.org/ofed_1_4/linux-2.6.git git_branch: ofed_kernel Common build parameters: Passed: Passed on i686 with linux-2.6.16 Passed on i686 with linux-2.6.18 Passed on i686 with linux-2.6.17 Passed on i686 with linux-2.6.19 Passed on i686 with linux-2.6.21.1 Passed on i686 with linux-2.6.24 Passed on i686 with linux-2.6.22 Passed on i686 with linux-2.6.26 Passed on i686 with linux-2.6.27 Passed on x86_64 with linux-2.6.16 Passed on x86_64 with linux-2.6.16.43-0.3-smp Passed on x86_64 with linux-2.6.16.21-0.8-smp Passed on x86_64 with linux-2.6.18 Passed on x86_64 with linux-2.6.17 Passed on x86_64 with linux-2.6.16.60-0.21-smp Passed on x86_64 with linux-2.6.18-1.2798.fc6 Passed on x86_64 with linux-2.6.18-53.el5 Passed on x86_64 with linux-2.6.18-8.el5 Passed on x86_64 with linux-2.6.20 Passed on x86_64 with linux-2.6.19 Passed on x86_64 with linux-2.6.18-93.el5 Passed on x86_64 with linux-2.6.21.1 Passed on x86_64 with linux-2.6.22 Passed on x86_64 with linux-2.6.22.5-31-default Passed on x86_64 with linux-2.6.25 Passed on x86_64 with linux-2.6.24 Passed on x86_64 with linux-2.6.26 Passed on x86_64 with linux-2.6.9-55.ELsmp Passed on x86_64 with linux-2.6.9-42.ELsmp Passed on x86_64 with linux-2.6.27 Passed on x86_64 with linux-2.6.9-67.ELsmp Passed on x86_64 with linux-2.6.9-78.ELsmp Passed on ia64 with linux-2.6.17 Passed on ia64 with linux-2.6.16 Passed on ia64 with linux-2.6.16.21-0.8-default Passed on ia64 with linux-2.6.21.1 Passed on ia64 with linux-2.6.19 Passed on ia64 with linux-2.6.18 Passed on ia64 with linux-2.6.23 
Passed on ia64 with linux-2.6.24 Passed on ia64 with linux-2.6.22 Passed on ia64 with linux-2.6.25 Passed on ia64 with linux-2.6.26 Passed on ppc64 with linux-2.6.16 Passed on ppc64 with linux-2.6.17 Passed on ppc64 with linux-2.6.19 Passed on ppc64 with linux-2.6.18 Passed on ppc64 with linux-2.6.18-8.el5 Failed: From ogerlitz at voltaire.com Wed Dec 31 04:33:50 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 31 Dec 2008 14:33:50 +0200 Subject: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29 In-Reply-To: References: Message-ID: <495B66AE.1050706@voltaire.com> Roland Dreier wrote: > HW specific: > - A bunch of ipath fixes. > - A bunch of nes cleanups and fixes. > - A few ehca fixes and cleanups. What about Jack's raw QP support/fixes patch set which he reposted on December 15th? http://lists.openfabrics.org/pipermail/general/2008-December/055954.html http://lists.openfabrics.org/pipermail/general/2008-December/055953.html http://lists.openfabrics.org/pipermail/general/2008-December/055955.html Or. From yosefe at Voltaire.COM Wed Dec 31 04:45:19 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 31 Dec 2008 14:45:19 +0200 Subject: [ofa-general] Re: [PATCH 3/4] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete In-Reply-To: References: <49469C1E.8010307@Voltaire.COM> <4946A28C.8030409@Voltaire.COM> Message-ID: <495B695F.1000605@Voltaire.COM> You're right, it should not be atomic. I wanted to protect the flag itself but as I was told by Moni Shoua it's useless. I'm sending v2. Roland Dreier wrote: > > + atomic_t vlan_task_flag; > > why is this atomic_t? I only see: > > > + atomic_set(&priv->vlan_task_flag, 1); > > + atomic_set(&priv->vlan_task_flag, 0); > > + iffup_value = atomic_read(&priv->vlan_task_flag) ? IFF_UP : 0; > > so as far as I can tell you are not using anything atomic. So if > there's a race you're worried about, it's still there... > > - R. 
-- --Yossi From yosefe at Voltaire.COM Wed Dec 31 04:54:35 2008 From: yosefe at Voltaire.COM (Yossi Etigin) Date: Wed, 31 Dec 2008 14:54:35 +0200 Subject: [ofa-general] [PATCH v2] ipoib: fix a deadlock between ipoib start/stop and child interface create/delete Message-ID: <495B6B8B.50803@Voltaire.COM> Fix a deadlock between child interface creation/deletion and ipoib start/stop. The former takes first vlan_mutex, and might take rtnl_lock via register_netdev or unregister_netdev. The latter is executed with rtnl_lock held, and tries to take vlan_mutex. We take the vlan_mutex and bring child interface up/down on a scheduled task instead of during stop/start, since ipoib_workqueue will not be flushed with rtnl_lock held. Signed-off-by: Yossi Etigin --- Changes from v1: - Use u8 as the vlan task up/down flag type instead of atomic_t. Fix bug #1198. An alternative approach might be to fine-grain the locking (for example use one mutex to sync child creation/deletion, and another one to sync accesses to child_intfs list). 
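The lock-ordering idea in the description can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch itself: two pthread mutexes stand in for rtnl_lock and vlan_mutex, a plain flag stands in for queue_work(), and parent_open(), parent_stop() and vlan_task() are hypothetical names. A real workqueue also provides ordering between queuing and execution that this single-threaded sketch does not model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NCHILD 3

static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;       /* stands in for rtnl_lock */
static pthread_mutex_t vlan_mutex = PTHREAD_MUTEX_INITIALIZER; /* stands in for vlan_mutex */

static bool child_up[NCHILD]; /* stands in for IFF_UP on each child */
static bool want_up;          /* desired child state, like the up/down flag */
static bool work_pending;     /* stands in for a queued work item */

/*
 * Open/stop run with rtnl held. They must not take vlan_mutex here,
 * because child creation takes vlan_mutex first and rtnl second.
 * They only record the wanted state and defer the real work.
 */
static void parent_set_state(bool up)
{
	int rc = pthread_mutex_lock(&rtnl);

	assert(rc == 0);
	want_up = up;
	work_pending = true;
	pthread_mutex_unlock(&rtnl);
}

static void parent_open(void) { parent_set_state(true); }
static void parent_stop(void) { parent_set_state(false); }

/*
 * Deferred worker: entered with no locks held, so it can use the same
 * order as child creation (vlan_mutex, then rtnl).
 */
static void vlan_task(void)
{
	if (!work_pending)
		return;
	work_pending = false;

	pthread_mutex_lock(&vlan_mutex);
	for (int i = 0; i < NCHILD; i++) {
		if (child_up[i] != want_up) {
			pthread_mutex_lock(&rtnl);
			child_up[i] = want_up;
			pthread_mutex_unlock(&rtnl);
		}
	}
	pthread_mutex_unlock(&vlan_mutex);
}
```

Every path now takes vlan_mutex before rtnl, or takes only one of the two, so the ABBA ordering described above cannot occur in this sketch.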
drivers/infiniband/ulp/ipoib/ipoib.h | 3 ++ drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 ++++-------------------------- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 22 ++++++++++++++++++++ 3 files changed, 30 insertions(+), 28 deletions(-) Index: b/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib.h 2008-12-15 20:28:21.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib.h 2008-12-31 14:46:29.000000000 +0200 @@ -298,6 +298,8 @@ struct ipoib_dev_priv { struct work_struct flush_heavy; struct work_struct restart_task; struct delayed_work ah_reap_task; + struct work_struct vlan_task; + u8 vlan_task_flag; struct ib_device *ca; u8 port; @@ -501,6 +503,7 @@ void ipoib_event(struct ib_event_handler int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey); int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); +void ipoib_vlan_task(struct work_struct *work); void ipoib_pkey_poll(struct work_struct *work); int ipoib_pkey_dev_delay_open(struct net_device *dev); Index: b/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-12-15 20:28:21.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-12-31 14:51:00.000000000 +0200 @@ -125,20 +125,8 @@ int ipoib_open(struct net_device *dev) } if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring up any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (flags & IFF_UP) - continue; - - dev_change_flags(cpriv->dev, flags | IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + priv->vlan_task_flag = 1; + queue_work(ipoib_workqueue, &priv->vlan_task); } netif_start_queue(dev); @@ -161,20 +149,8 @@ static int ipoib_stop(struct net_device 
ipoib_ib_dev_stop(dev, 0); if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) { - struct ipoib_dev_priv *cpriv; - - /* Bring down any child interfaces too */ - mutex_lock(&priv->vlan_mutex); - list_for_each_entry(cpriv, &priv->child_intfs, list) { - int flags; - - flags = cpriv->dev->flags; - if (!(flags & IFF_UP)) - continue; - - dev_change_flags(cpriv->dev, flags & ~IFF_UP); - } - mutex_unlock(&priv->vlan_mutex); + priv->vlan_task_flag = 0; + queue_work(ipoib_workqueue, &priv->vlan_task); } return 0; @@ -1071,6 +1047,7 @@ static void ipoib_setup(struct net_devic INIT_WORK(&priv->flush_heavy, ipoib_ib_dev_flush_heavy); INIT_WORK(&priv->restart_task, ipoib_mcast_restart_task); INIT_DELAYED_WORK(&priv->ah_reap_task, ipoib_reap_ah); + INIT_WORK(&priv->vlan_task, ipoib_vlan_task); } struct ipoib_dev_priv *ipoib_intf_alloc(const char *name) Index: b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c =================================================================== --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-12-15 20:28:21.000000000 +0200 +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 2008-12-31 14:46:50.000000000 +0200 @@ -178,3 +178,25 @@ int ipoib_vlan_delete(struct net_device return ret; } + +void ipoib_vlan_task(struct work_struct *work) +{ + struct ipoib_dev_priv *priv = + container_of(work, struct ipoib_dev_priv, vlan_task); + struct ipoib_dev_priv *cpriv; + int flags, new_flags, iffup_value; + + iffup_value = priv->vlan_task_flag ? 
IFF_UP : 0;
+
+	mutex_lock(&priv->vlan_mutex);
+	list_for_each_entry(cpriv, &priv->child_intfs, list) {
+		flags = cpriv->dev->flags;
+		new_flags = (flags & ~IFF_UP) | iffup_value;
+		if (flags != new_flags) {
+			rtnl_lock();
+			dev_change_flags(cpriv->dev, new_flags);
+			rtnl_unlock();
+		}
+	}
+	mutex_unlock(&priv->vlan_mutex);
+}
--
--Yossi

From amirv at mellanox.co.il Wed Dec 31 04:59:14 2008
From: amirv at mellanox.co.il (Amir Vadai)
Date: Wed, 31 Dec 2008 14:59:14 +0200
Subject: [ofa-general] Infiniband performance
In-Reply-To: <4940E39A.5090802@motama.com>
References: <4940E39A.5090802@motama.com>
Message-ID: <495B6CA2.4060401@mellanox.co.il>

Sorry for the late answer.

If you run iperf as root - sdp could use zero copy which should boost
the performance.

- Amir

Jan Ruffing wrote:
> Hello,
>
> I'm new to Infiniband and still trying to get a grasp on what performance it can realistically deliver.
>
> The two directly connected test machines have Mellanox Infinihost III Lx DDR HCA cards installed and run OpenSuse 11 with a 2.6.25.16 Kernel.
>
> 1) Maximum Bandwidth?
>
> Infiniband (Double Data Rate, 4x lane) is advertised with a bandwidth of 20 Gbit/s. If my understanding is correct, this is only the signal rate, which would translate to a 16 Gbit/s data rate due to 8b/10b encoding? The maximum speed I measured so far was 12 Gbit/s on the low-level protocols:
>
> tamara /home/ruffing> ibv_rc_pingpong -m 2048 -s 1048576 -n 10000
>   local address:  LID 0x0001, QPN 0x3b0405, PSN 0x13a302
>   remote address: LID 0x0002, QPN 0x380405, PSN 0x9a46ba
> 20971520000 bytes in 13.63 seconds = 12313.27 Mbit/sec
> 10000 iters in 13.63 seconds = 1362.53 usec/iter
>
> melissa Dokumente/Infiniband> ibv_rc_pingpong 192.168.2.1 -m 2048 -s 1048576 -n 10000
>   local address:  LID 0x0002, QPN 0x380405, PSN 0x9a46ba
>   remote address: LID 0x0001, QPN 0x3b0405, PSN 0x13a302
> 20971520000 bytes in 13.63 seconds = 12313.38 Mbit/sec
> 10000 iters in 13.63 seconds = 1362.52 usec/iter
>
> Maximal user-level bandwidth was 11.5 GBit/s using RDMA:
>
> ruffing at melissa:~/Dokumente/Infiniband/NetPIPE-3.7.1> ./NPibv -m 2048 -t rdma_write -c local_poll -h 192.168.2.1 -n 100
> Using RDMA Write communications
> Using local polling completion
> Preposting asynchronous receives (required for Infiniband)
> Now starting the main loop
> [...]
> 121: 8388605 bytes    100 times -->  11851.72 Mbps in    5400.06 usec
> 122: 8388608 bytes    100 times -->  11851.66 Mbps in    5400.09 usec
> 123: 8388611 bytes    100 times -->  11850.62 Mbps in    5400.57 usec
>
> That's actually 4 Gbit/s short of what I was hoping for. Yet I couldn't find any test results on the net that yielded more than 12 GBit/s on 4x DDR HCAs. Where does this performance loss stem from? At first glance, 4 GBit/s (25% of the data rate) looks like quite a lot to be only protocol overhead...
> Is 12 GBit/s the current maximum bandwidth, or is it possible for Infiniband users to improve performance beyond that?
>
>
>
> 2) TCP (over IPoIB) vs. RDMA/SDP/uverbs?
>
> On the first Infiniband installation using the packages of the OpenSuse 11 distribution, I got a TCP bandwidth of 10 GBit/s. (Which actually isn't that bad when compared to a measured maximal bandwidth of 12 GBit/s.) This installation supported neither RDMA nor SDP, though.
>
> tamara iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 3M
> ------------------------------------------------------------
> Client connecting to 192.168.2.2, TCP port 5001
> TCP window size:   515 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.2.1 port 47730 connected with 192.168.2.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  11.6 GBytes  10.0 Gbits/sec
>
>
>
> After I installed the OFED 1.4 beta to be able to use SDP, RDMA and uverbs, I could use them to get 12 GBit/s. Yet the TCP rate dropped by 2-3 GBit/s to 7-8 GBit/s.
>
> ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> ./iperf -c 192.168.2.2 -l 10M
> ------------------------------------------------------------
> Client connecting to 192.168.2.2, TCP port 5001
> TCP window size:   193 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.2.1 port 51988 connected with 192.168.2.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  8.16 GBytes  7.00 Gbits/sec
>
> What could have caused this loss of bandwidth? Is there a way to avoid it? Obviously, this could be a show stopper (for me) as far as native Infiniband protocols are concerned: gaining 2 GBit/s under special circumstances probably won't outweigh losing 3 GBit/s during normal use.
>
>
>
> 3) SDP performance
>
> The SDP performance (using preloading of libsdp.so) only measured 6.2 GBit/s, even underperforming TCP:
>
> ruffing at tamara:~/Dokumente/Infiniband/iperf-2.0.4/src> LD_PRELOAD=/usr/lib/libsdp.so LIBSDP_CONFIG_FILE=/etc/libsdp.conf ./iperf -c 192.168.2.2 -l 10M
> ------------------------------------------------------------
> Client connecting to 192.168.2.2, TCP port 5001
> TCP window size: 16.0 MByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.2.1 port 36832 connected with 192.168.2.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  7.22 GBytes  6.20 Gbits/sec
>
> /etc/libsdp.conf consists of the following two lines:
> use both server * *:*
> use both client * *:*
>
> I have a hard time believing that's the max rate of SDP. (Even if Cisco measured a similar 6.6 GBit/s: https://www.cisco.com/en/US/docs/server_nw_virtual/commercial_host_driver/host_driver_linux/user/guide/sdp.html#wp948100)
>
> Did I mess up my Infiniband installation, or is SDP really slower than TCP over IPoIB?
>
>
>
> Sorry if my mail might sound somewhat negative, but I'm still trying to get past the marketing buzz and figure out what to realistically expect of Infiniband. Currently, I'm still hoping that I messed up my installation somewhere, and that a few pointers in the right direction might resolve most of the issues... :)
>
> Thanks in advance,
> Jan Ruffing
>
>
>
> Devices:
>
> tamara /dev/infiniband> ls -la
> total 0
> drwxr-xr-x  2 root root      140 2008-12-02 16:20 .
> drwxr-xr-x 13 root root     4580 2008-12-09 14:59 ..
> crw-rw----  1 root root  231,  64 2008-12-02 16:20 issm0
> crw-rw-rw-  1 root users  10,  59 2008-11-27 10:24 rdma_cm
> crw-rw----  1 root root  231,   0 2008-12-02 16:20 umad0
> crw-rw-rw-  1 root users 231, 192 2008-11-27 10:15 uverbs0
> crw-rw----  1 root users 231, 193 2008-11-27 10:15 uverbs1
>
>
>
> Installed Packages:
>
> Build ofa_kernel RPM
> Install kernel-ib RPM:
> Build ofed-scripts RPM
> Install ofed-scripts RPM:
> Install libibverbs RPM:
> Install libibverbs-devel RPM:
> Install libibverbs-devel-static RPM:
> Install libibverbs-utils RPM:
> Install libmthca RPM:
> Install libmthca-devel-static RPM:
> Install libmlx4 RPM:
> Install libmlx4-devel RPM:
> Install libcxgb3 RPM:
> Install libcxgb3-devel RPM:
> Install libnes RPM:
> Install libnes-devel-static RPM:
> Install libibcm RPM:
> Install libibcm-devel RPM:
> Install libibcommon RPM:
> Install libibcommon-devel RPM:
> Install libibcommon-static RPM:
> Install libibumad RPM:
> Install libibumad-devel RPM:
> Install libibumad-static RPM:
> Build libibmad RPM
> Install libibmad RPM:
> Install libibmad-devel RPM:
> Install libibmad-static RPM:
> Install ibsim RPM:
> Install librdmacm RPM:
> Install librdmacm-utils RPM:
> Install librdmacm-devel RPM:
> Install libsdp RPM:
> Install libsdp-devel RPM:
> Install opensm-libs RPM:
> Install opensm RPM:
> Install opensm-devel RPM:
> Install opensm-static RPM:
> Install compat-dapl RPM:
> Install compat-dapl-devel RPM:
> Install dapl RPM:
> Install dapl-devel RPM:
> Install dapl-devel-static RPM:
> Install dapl-utils RPM:
> Install perftest RPM:
> Install mstflint RPM:
> Install sdpnetstat RPM:
> Install srptools RPM:
> Install rds-tools RPM:
> (installed ibutils manually)
>
>
>
> Loaded Modules:
> (libsdp currently unloaded)
>
> Directory: /home/ruffing
> tamara /home/ruffing> lsmod | grep ib
> ib_addr                24580  1 rdma_cm
> ib_ipoib               97576  0
> ib_cm                  53584  2 rdma_cm,ib_ipoib
> ib_sa                  55944  3 rdma_cm,ib_ipoib,ib_cm
> ib_uverbs              56884  1 rdma_ucm
> ib_umad                32016  4
> mlx4_ib                79884
0
> mlx4_core             114924  1 mlx4_ib
> ib_mthca              148924  0
> ib_mad                 53400  5 ib_cm,ib_sa,ib_umad,mlx4_ib,ib_mthca
> ib_core                81152  12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
> ipv6                  281064  23 ib_ipoib
> rtc_lib                19328  1 rtc_core
> libata                176604  2 ata_piix,pata_it8213
> scsi_mod              168436  4 sr_mod,sg,sd_mod,libata
> dock                   27536  1 libata
>
> tamara /home/ruffing> lsmod | grep rdma
> rdma_ucm               30248  0
> rdma_cm                49544  1 rdma_ucm
> iw_cm                  25988  1 rdma_cm
> ib_addr                24580  1 rdma_cm
> ib_cm                  53584  2 rdma_cm,ib_ipoib
> ib_sa                  55944  3 rdma_cm,ib_ipoib,ib_cm
> ib_uverbs              56884  1 rdma_ucm
> ib_core                81152  12 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,ib_mad
>
>
>
>
>

From amirv at mellanox.co.il Wed Dec 31 05:09:40 2008
From: amirv at mellanox.co.il (Amir Vadai)
Date: Wed, 31 Dec 2008 15:09:40 +0200
Subject: [ofa-general] Infiniband performance
In-Reply-To: <495B6CA2.4060401@mellanox.co.il>
References: <4940E39A.5090802@motama.com> <495B6CA2.4060401@mellanox.co.il>
Message-ID: <495B6F14.3050206@mellanox.co.il>

To be more specific - you need to have the CAP_IPC_LOCK capability (it will
enable sdp to pin pages while they are zero copied). The superuser has it,
and a regular user can be configured to have it.

- Amir

Amir Vadai wrote:
> Sorry for the late answer.
>
> If you run iperf as root - sdp could use zero copy which should boost
> the performance.
>
>
> - Amir

From rdreier at cisco.com Wed Dec 31 08:36:33 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 31 Dec 2008 08:36:33 -0800
Subject: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29
In-Reply-To: <495B66AE.1050706@voltaire.com> (Or Gerlitz's message of "Wed, 31 Dec 2008 14:33:50 +0200")
References: <495B66AE.1050706@voltaire.com>
Message-ID:

 > What about Jack's raw QP support/fixes patch set which he reposted on
 > December 15th?

didn't make this window. How are you using raw QPs?

From sashak at voltaire.com Wed Dec 31 09:02:44 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 31 Dec 2008 19:02:44 +0200
Subject: [ofa-general] [PATCH] libibmad: remove not needed header files inclusion
Message-ID: <20081231170244.GC21950@sashak.voltaire.com>

Remove not needed header files inclusions - mostly sys/time.h and pthread.h.
Signed-off-by: Sasha Khapyorsky
---
 libibmad/src/gs.c       |    2 --
 libibmad/src/mad.c      |    3 +--
 libibmad/src/portid.c   |    2 --
 libibmad/src/register.c |    2 --
 libibmad/src/resolve.c  |    2 --
 libibmad/src/rpc.c      |    1 -
 libibmad/src/sa.c       |    2 --
 libibmad/src/serv.c     |    2 --
 libibmad/src/smp.c      |    2 --
 libibmad/src/vendor.c   |    2 --
 10 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/libibmad/src/gs.c b/libibmad/src/gs.c
index 89c927e..d350c0d 100644
--- a/libibmad/src/gs.c
+++ b/libibmad/src/gs.c
@@ -39,8 +39,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 #include "mad.h"
diff --git a/libibmad/src/mad.c b/libibmad/src/mad.c
index f0fffcd..be27c09 100644
--- a/libibmad/src/mad.c
+++ b/libibmad/src/mad.c
@@ -39,8 +39,7 @@
 #include
 #include
 #include
-#include
-#include
+#include
 #include
 #include
diff --git a/libibmad/src/portid.c b/libibmad/src/portid.c
index a84baee..61e6be0 100644
--- a/libibmad/src/portid.c
+++ b/libibmad/src/portid.c
@@ -38,8 +38,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 #include
 #include
diff --git a/libibmad/src/register.c b/libibmad/src/register.c
index a33acd8..045f840 100644
--- a/libibmad/src/register.c
+++ b/libibmad/src/register.c
@@ -38,8 +38,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 #include
diff --git a/libibmad/src/resolve.c b/libibmad/src/resolve.c
index f012543..906b28d 100644
--- a/libibmad/src/resolve.c
+++ b/libibmad/src/resolve.c
@@ -38,9 +38,7 @@
 #include
 #include
 #include
-#include
 #include
-#include
 #include
 #include
diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c
index df28f65..5226540 100644
--- a/libibmad/src/rpc.c
+++ b/libibmad/src/rpc.c
@@ -39,7 +39,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
diff --git a/libibmad/src/sa.c b/libibmad/src/sa.c
index 192f56e..27b9d52 100644
--- a/libibmad/src/sa.c
+++ b/libibmad/src/sa.c
@@ -38,9 +38,7 @@
 #include
 #include
 #include
-#include
 #include
-#include
 #include
diff --git a/libibmad/src/serv.c b/libibmad/src/serv.c
index a90e961..b329352 100644
--- a/libibmad/src/serv.c
+++ b/libibmad/src/serv.c
@@ -38,8 +38,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 #include
diff --git a/libibmad/src/smp.c b/libibmad/src/smp.c
index d190af0..ad6b066 100644
--- a/libibmad/src/smp.c
+++ b/libibmad/src/smp.c
@@ -39,8 +39,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
diff --git a/libibmad/src/vendor.c b/libibmad/src/vendor.c
index 04e7641..eb703f6 100644
--- a/libibmad/src/vendor.c
+++ b/libibmad/src/vendor.c
@@ -39,8 +39,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
-- 
1.6.0.4.766.g6fc4a

From sashak at voltaire.com Wed Dec 31 09:04:13 2008
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Wed, 31 Dec 2008 19:04:13 +0200
Subject: [ofa-general] [PATCH] libibmad: remove functions which use pthread
In-Reply-To: <20081231170244.GC21950@sashak.voltaire.com>
References: <20081231170244.GC21950@sashak.voltaire.com>
Message-ID: <20081231170413.GD21950@sashak.voltaire.com>

I looked at the implementation of the safe_*() functions
(safe_smp_query, safe_smp_set and safe_sa_call) and found that they are
not actually "safe" as their names claim. The only thread-unsafe thing
used there is the static 'mad_portid' structure (from rpc.c), but
modification of this structure is not protected by the same mutex
(actually, it is not protected at all).

As far as I know, nothing outside libibmad uses those safe_*()
primitives right now, so I think it is better to remove these confusing
functions from the API (changing the library version, etc.).

The primitives madrpc_lock() and madrpc_unlock() are just wrappers
around a hidden static pthread mutex which is not controlled by the
caller application. I think it will be more robust for a multithreaded
application to use its own synchronization methods (pthread mutex or
any other) for better control. So let's remove madrpc_lock/unlock() too.

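For illustration only (this is not part of the patch): a minimal sketch of the application-side locking described above, assuming the application keeps its own pthread mutex and takes it around every call into a routine that shares hidden static state. unsafe_query() is a hypothetical stand-in for such a routine (for example, a call that relies on the static state in rpc.c); app_mad_lock, locked_query() and run_workers() are likewise made-up names and not libibmad API.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16
#define QUERIES_PER_THREAD 10000

/* Hypothetical stand-in for a library call that updates hidden static
 * state without any locking; it is NOT a libibmad function. */
static unsigned long calls_made;

static unsigned long unsafe_query(void)
{
	return ++calls_made;	/* unsynchronized read-modify-write */
}

/* The mutex belongs to the application, so the application decides
 * what it protects and for how long it is held. */
static pthread_mutex_t app_mad_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long locked_query(void)
{
	unsigned long r;
	int rc;

	rc = pthread_mutex_lock(&app_mad_lock);
	assert(rc == 0);
	(void)rc;
	r = unsafe_query();
	pthread_mutex_unlock(&app_mad_lock);

	return r;
}

static void *worker(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < QUERIES_PER_THREAD; i++)
		locked_query();

	return NULL;
}

/* Start nthreads workers, wait for them, and return the number of
 * calls recorded; returns 0 on a bad argument or if a thread could
 * not be created. */
unsigned long run_workers(int nthreads)
{
	pthread_t tid[MAX_THREADS];
	int started = 0;
	int i;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return 0;

	calls_made = 0;
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	return started == nthreads ? calls_made : 0;
}
```

Because every call goes through locked_query(), no increment should be lost regardless of the thread count. The caller also chooses the lock's scope, so it could hold the same mutex across several related calls, which a mutex hidden inside the library does not allow.
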
Signed-off-by: Sasha Khapyorsky
---
 libibmad/include/infiniband/mad.h |   41 -------------------------------------
 libibmad/libibmad.ver             |    2 +-
 libibmad/src/libibmad.map         |    2 -
 libibmad/src/rpc.c                |   15 -------------
 libibmad/src/sa.c                 |    5 ++-
 5 files changed, 4 insertions(+), 61 deletions(-)

diff --git a/libibmad/include/infiniband/mad.h b/libibmad/include/infiniband/mad.h
index eff6738..89b4be5 100644
--- a/libibmad/include/infiniband/mad.h
+++ b/libibmad/include/infiniband/mad.h
@@ -703,8 +703,6 @@ void * madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void
 void madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, int num_classes);
 void madrpc_save_mad(void *madbuf, int len);
-void madrpc_lock(void);
-void madrpc_unlock(void);
 void madrpc_show_errors(int set);
 void * mad_rpc_open_port(char *dev_name, int dev_port, int *mgmt_classes,
@@ -725,32 +723,6 @@ uint8_t * smp_query_via(void *buf, ib_portid_t *id, unsigned attrid,
 uint8_t * smp_set_via(void *buf, ib_portid_t *id, unsigned attrid, unsigned mod, unsigned timeout, const void *srcport);
-inline static uint8_t *
-safe_smp_query(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
-	       unsigned timeout)
-{
-	uint8_t *p;
-
-	madrpc_lock();
-	p = smp_query(rcvbuf, portid, attrid, mod, timeout);
-	madrpc_unlock();
-
-	return p;
-}
-
-inline static uint8_t *
-safe_smp_set(void *rcvbuf, ib_portid_t *portid, unsigned attrid, unsigned mod,
-	     unsigned timeout)
-{
-	uint8_t *p;
-
-	madrpc_lock();
-	p = smp_set(rcvbuf, portid, attrid, mod, timeout);
-	madrpc_unlock();
-
-	return p;
-}
-
 /* sa.c */
 uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, unsigned timeout);
@@ -761,19 +733,6 @@ int ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id,
 int ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf);
-inline static uint8_t *
-safe_sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
-	     unsigned timeout)
-{
-	uint8_t *p;
-
-	madrpc_lock();
-	p = sa_call(rcvbuf, portid, sa, timeout);
-	madrpc_unlock();
-
-	return p;
-}
-
 /* resolve.c */
 int ib_resolve_smlid(ib_portid_t *sm_id, int timeout);
 int ib_resolve_guid(ib_portid_t *portid, uint64_t *guid,
diff --git a/libibmad/libibmad.ver b/libibmad/libibmad.ver
index 7e93c16..23d2dc2 100644
--- a/libibmad/libibmad.ver
+++ b/libibmad/libibmad.ver
@@ -6,4 +6,4 @@
 # API_REV - advance on any added API
 # RUNNING_REV - advance any change to the vendor files
 # AGE - number of backward versions the API still supports
-LIBVERSION=5:0:4
+LIBVERSION=2:0:0
diff --git a/libibmad/src/libibmad.map b/libibmad/src/libibmad.map
index 927e51c..f944d86 100644
--- a/libibmad/src/libibmad.map
+++ b/libibmad/src/libibmad.map
@@ -72,14 +72,12 @@ IBMAD_1.3 {
 		madrpc;
 		madrpc_def_timeout;
 		madrpc_init;
-		madrpc_lock;
 		madrpc_portid;
 		madrpc_rmpp;
 		madrpc_save_mad;
 		madrpc_set_retries;
 		madrpc_set_timeout;
 		madrpc_show_errors;
-		madrpc_unlock;
 		ib_path_query;
 		sa_call;
 		sa_rpc_call;
diff --git a/libibmad/src/rpc.c b/libibmad/src/rpc.c
index 5226540..670a936 100644
--- a/libibmad/src/rpc.c
+++ b/libibmad/src/rpc.c
@@ -38,7 +38,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
@@ -286,20 +285,6 @@ madrpc_rmpp(ib_rpc_t *rpc, ib_portid_t *dport, ib_rmpp_hdr_t *rmpp, void *data)
 	return mad_rpc_rmpp(&port, rpc, dport, rmpp, data);
 }
-static pthread_mutex_t rpclock = PTHREAD_MUTEX_INITIALIZER;
-
-void
-madrpc_lock(void)
-{
-	pthread_mutex_lock(&rpclock);
-}
-
-void
-madrpc_unlock(void)
-{
-	pthread_mutex_unlock(&rpclock);
-}
-
 void
 madrpc_init(char *dev_name, int dev_port, int *mgmt_classes, int num_classes)
 {
diff --git a/libibmad/src/sa.c b/libibmad/src/sa.c
index 27b9d52..c601254 100644
--- a/libibmad/src/sa.c
+++ b/libibmad/src/sa.c
@@ -132,7 +132,7 @@ ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, ibmad_gid_t destgid,
 	if (srcport) {
 		p = sa_rpc_call (srcport, buf, sm_id, &sa, 0);
 	} else {
-		p = safe_sa_call(buf, sm_id, &sa, 0);
+		p = sa_call(buf, sm_id, &sa, 0);
 	}
 	if (!p) {
 		IBWARN("sa call path_query failed");
@@ -142,8 +142,9 @@ ib_path_query_via(const void *srcport, ibmad_gid_t srcgid, ibmad_gid_t destgid,
 	mad_decode_field(p, IB_SA_PR_DLID_F, &dlid);
 	return dlid;
 }
+
 int
 ib_path_query(ibmad_gid_t srcgid, ibmad_gid_t destgid, ib_portid_t *sm_id, void *buf)
 {
-	return ib_path_query_via (NULL, srcgid, destgid, sm_id, buf);
+	return ib_path_query_via(NULL, srcgid, destgid, sm_id, buf);
 }
-- 
1.6.0.4.766.g6fc4a

From or.gerlitz at gmail.com Wed Dec 31 10:36:12 2008
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Wed, 31 Dec 2008 20:36:12 +0200
Subject: Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.29
In-Reply-To:
References: <495B66AE.1050706@voltaire.com>
Message-ID: <15ddcffd0812311036m2f006c4fhaa0064eadc7b3924@mail.gmail.com>

On Wed, Dec 31, 2008 at 6:36 PM, Roland Dreier wrote:
>> What about Jack's raw QP support/fixes patch set which he reposted on December 15th?
> didn't make this window. How are you using raw QPs?

I want to experiment with a sniffer prototype soon. Is there any chance
they could still make it into this window?

Or.