[ofa-general] Re: valgrind warnings in libibverbs & PVFS
Pete Wyckoff
pw at osc.edu
Thu Mar 6 09:20:20 PST 2008
troy at scl.ameslab.gov wrote on Tue, 04 Mar 2008 20:48 -0600:
> I am trying to track down some issues with PVFS using IB with valgrind, and
> I'm trying to make a run of 'valgrind pvfs2-ls' come out with no errors.
>
> The first thing I managed to figure out is this change to libibverbs:
>
> p4l4:/usr/src/ib/ofed-1_3_git/libibverbs/Bppc32# git diff
> diff --git a/src/cmd.c b/src/cmd.c
> diff --git a/src/verbs.c b/src/verbs.c
> index 11d3c4c..bdfe723 100644
> --- a/src/verbs.c
> +++ b/src/verbs.c
> @@ -226,6 +226,8 @@ struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context)
> return NULL;
> }
>
> + VALGRIND_MAKE_MEM_DEFINED(&resp.fd, sizeof(resp.fd));
> +
> channel->context = context;
> channel->fd = resp.fd;
> channel->refcnt = 0;
>
This change makes sense, although I probably would have used &resp,
sizeof(resp). Quite a few places use IBV_INIT_CMD_RESP; some of them
already have VALGRIND annotations, but many do not. Annotating all
of them would make a nice cleanup.
The basic issue is that write(fd, ..) here has the side effect of
the kernel filling in some other chunk of memory (the response
buffer), and that is something valgrind cannot track.
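A minimal sketch of that annotation pattern, for illustration only.
The demo_* structs and fake_kernel_write() are hypothetical stand-ins,
not the real libibverbs definitions; in the real code the kernel fills
the response during write(), which is exactly the store memcheck never
sees. The fallback macro is only there so the sketch builds without the
valgrind headers installed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Use the real client request if the Valgrind headers are available;
 * otherwise fall back to a no-op so the sketch still compiles. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) 0
#endif

/* Hypothetical response and command layouts (not the libibverbs ones). */
struct demo_resp {
    uint32_t fd;
    uint32_t reserved;
};

struct demo_cmd {
    uint64_t response;   /* user-space address of the response buffer */
    uint32_t flags;
    uint32_t reserved;
};

/* Stand-in for write(cmd_fd, cmd, size): the "kernel" fills in the
 * buffer that cmd->response points at, behind the caller's back. */
static int fake_kernel_write(const struct demo_cmd *cmd)
{
    struct demo_resp *out = (struct demo_resp *)(uintptr_t)cmd->response;

    assert(out != NULL);
    out->fd = 7;
    out->reserved = 0;
    return (int)sizeof(*cmd);
}

/* Returns the fd from the response, or -1 if the command failed. */
int demo_create_channel(void)
{
    struct demo_cmd cmd;
    struct demo_resp resp;

    memset(&cmd, 0, sizeof cmd);
    cmd.response = (uintptr_t)&resp;

    if (fake_kernel_write(&cmd) != (int)sizeof cmd)
        return -1;

    /* Mark the whole response as defined, not just one field. */
    (void) VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp);

    return (int)resp.fd;
}
```

Annotating all of resp rather than only resp.fd means fields read
later do not each need their own annotation.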
> However, I am still getting two errors, and I can't seem to figure out if
> it's a PVFS issue, an ibverbs issue, or a libmthca issue, and I'm wondering
> how to track this down.
>
> troy at p4l4:/usr/src/pvfs2-hg/Bppc32$ valgrind src/apps/admin/pvfs2-ls
> ==2541== Memcheck, a memory error detector.
> ==2541== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
> ==2541== Using LibVEX rev 1658, a library for dynamic binary translation.
> ==2541== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
> ==2541== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
> ==2541== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
> ==2541== For more details, rerun with: -v
> ==2541==
> ==2541== Syscall param write(buf) points to uninitialised byte(s)
> ==2541== at 0xFEB2B14: write (in /usr/lib/debug/libpthread-0.10.so)
> ==2541== by 0xFE7DD3C: ibv_cmd_create_cq (cmd.c:320)
> ==2541== by 0xFE84080: ibv_create_cq@@IBVERBS_1.1 (verbs.c:281)
> ==2541== by 0x10055C0C: openib_ib_initialize (openib.c:956)
> ==2541== by 0x10051458: BMI_ib_initialize (ib.c:2001)
> ==2541== by 0x1004CA00: activate_method (bmi.c:2008)
> ==2541== by 0x1004CEE4: BMI_addr_lookup (bmi.c:1555)
> ==2541== by 0x10032D20: PVFS_isys_fs_add (fs-add.sm:127)
> ==2541== by 0x10032F50: PVFS_sys_fs_add (fs-add.sm:194)
> ==2541== by 0x1000B698: main (pvfs2-ls.c:766)
> ==2541== Address 0xFEB9D550 is on thread 1's stack
This is either some unused fields in cmd, or padding between fields.
It is hard to silence these nicely without also hiding real bugs. I
have some valgrind suppression rules to avoid some of these. It could
actually be a bug, but that is unlikely.
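To illustrate the padding case, here is a rough sketch with a
hypothetical struct (demo_create_cq is made up for this example, not
the real ibv_create_cq command layout). The compiler inserts padding
after 'flags', and those bytes stay undefined as far as memcheck is
concerned unless the whole struct is cleared before it is filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout: 'flags' is followed by padding so that
 * 'handle' is suitably aligned, and there may be tail padding too. */
struct demo_create_cq {
    uint8_t  flags;
    uint64_t handle;
    uint32_t cqe;
};

/* Number of bytes in the struct that belong to no member. */
size_t demo_padding_bytes(void)
{
    return sizeof(struct demo_create_cq)
         - (sizeof(uint8_t) + sizeof(uint64_t) + sizeof(uint32_t));
}

/* Clear the whole struct first so memcheck sees every byte, padding
 * included, as defined; then set the fields that matter. */
void demo_fill_cmd(struct demo_create_cq *cmd, uint64_t handle, uint32_t cqe)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof *cmd);
    cmd->handle = handle;
    cmd->cqe = cqe;
}
```

The trade-off is the one mentioned above: clearing the struct silences
the warning, but it also hides a field that was genuinely forgotten,
since that field now reads as a defined zero.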
> ==2541== Conditional jump or move depends on uninitialised value(s)
> ==2541== at 0xFB19A80: mthca_cq_clean (cq.c:576)
> ==2541== by 0xFB1D688: mthca_destroy_qp (verbs.c:674)
> ==2541== by 0xFE83758: ibv_destroy_qp@@IBVERBS_1.1 (verbs.c:490)
> ==2541== by 0x1005693C: openib_close_connection (openib.c:368)
> ==2541== by 0x1004F6E4: ib_close_connection (ib.c:1695)
> ==2541== by 0x10051800: BMI_ib_finalize (ib.c:2082)
> ==2541== by 0x1004DFA8: BMI_finalize (bmi.c:474)
> ==2541== by 0x100112B0: PVFS_sys_finalize (finalize.c:57)
> ==2541== by 0x1000BDF0: main (pvfs2-ls.c:850)
That may actually be a bug, but you'll have to look at the code
there in your version of libmthca. I'm not sure how the OFED
release maps to a git commit id.
-- Pete