[ofa-general] Re: valgrind warnings in libibverbs & PVFS

Pete Wyckoff pw at osc.edu
Thu Mar 6 09:20:20 PST 2008


troy at scl.ameslab.gov wrote on Tue, 04 Mar 2008 20:48 -0600:
> I am trying to track down some issues with PVFS using IB with valgrind, and 
> I'm trying to make a run of 'valgrind pvfs2-ls' come out with no errors.
>
> The first thing I managed to figure out is this change to libibverbs:
>
> p4l4:/usr/src/ib/ofed-1_3_git/libibverbs/Bppc32# git diff
> diff --git a/src/cmd.c b/src/cmd.c
> diff --git a/src/verbs.c b/src/verbs.c
> index 11d3c4c..bdfe723 100644
> --- a/src/verbs.c
> +++ b/src/verbs.c
> @@ -226,6 +226,8 @@ struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context)
>                return NULL;
>        }
>
> +       VALGRIND_MAKE_MEM_DEFINED(&resp.fd, sizeof(resp.fd));
> +
>        channel->context = context;
>        channel->fd      = resp.fd;
>        channel->refcnt  = 0;
>

This change makes sense, although I probably would have used &resp,
sizeof(resp).  There are a bunch of places that use
IBV_INIT_CMD_RESP, some of which have VALGRIND annotations already,
but many do not.  Annotating them all would make a nice cleanup.

The basic issue is that write(fd, ..) has the side effect of the
kernel filling in some other chunk of memory (the response buffer),
and that is something valgrind cannot track.
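
A minimal sketch of that side effect and of the annotation that covers
it; this is not libibverbs code.  The names demo_cmd, demo_resp,
demo_fake_kernel_fill and demo_create_channel are hypothetical, a plain
function stands in for the kernel, and the sketch assumes the command
carries the address of the response buffer.  If valgrind's header is
not installed, the annotation compiles away.

```c
#include <assert.h>
#include <stdint.h>

/* Use the real client request if valgrind's header is available,
 * otherwise fall back to a no-op so the code still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) 0
#endif

/* Hypothetical response and command layouts, for illustration only. */
struct demo_resp { int fd; };
struct demo_cmd  { uintptr_t response; /* address of the response buffer */ };

/* Stand-in for the kernel: it writes the response through the address
 * stored in the command.  With a real write() on the device fd,
 * memcheck sees only a read of the command buffer, never this store. */
static void demo_fake_kernel_fill(const struct demo_cmd *cmd)
{
    struct demo_resp *resp = (struct demo_resp *)cmd->response;
    resp->fd = 7;
}

int demo_create_channel(void)
{
    struct demo_cmd cmd;
    struct demo_resp resp;

    cmd.response = (uintptr_t)&resp;
    demo_fake_kernel_fill(&cmd);  /* real code: write(fd, &cmd, sizeof cmd) */

    /* Tell memcheck the whole response is now initialised. */
    (void) VALGRIND_MAKE_MEM_DEFINED(&resp, sizeof resp);

    assert(resp.fd >= 0);
    return resp.fd;
}
```

Because the stand-in is ordinary user code, memcheck can follow the
store here and the annotation is redundant; it only matters when the
kernel does the writing.  Marking the whole of resp rather than one
field keeps the annotation correct if more response fields are read
later.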

> However, I am still getting two errors, and I can't seem to figure out if 
> it's a PVFS issue, an ibverbs issue, or a libmthca issue, and I'm wondering 
> how to track this down.
>
> troy at p4l4:/usr/src/pvfs2-hg/Bppc32$ valgrind src/apps/admin/pvfs2-ls
> ==2541== Memcheck, a memory error detector.
> ==2541== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
> ==2541== Using LibVEX rev 1658, a library for dynamic binary translation.
> ==2541== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
> ==2541== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
> ==2541== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
> ==2541== For more details, rerun with: -v
> ==2541==
> ==2541== Syscall param write(buf) points to uninitialised byte(s)
> ==2541==    at 0xFEB2B14: write (in /usr/lib/debug/libpthread-0.10.so)
> ==2541==    by 0xFE7DD3C: ibv_cmd_create_cq (cmd.c:320)
> ==2541==    by 0xFE84080: ibv_create_cq@@IBVERBS_1.1 (verbs.c:281)
> ==2541==    by 0x10055C0C: openib_ib_initialize (openib.c:956)
> ==2541==    by 0x10051458: BMI_ib_initialize (ib.c:2001)
> ==2541==    by 0x1004CA00: activate_method (bmi.c:2008)
> ==2541==    by 0x1004CEE4: BMI_addr_lookup (bmi.c:1555)
> ==2541==    by 0x10032D20: PVFS_isys_fs_add (fs-add.sm:127)
> ==2541==    by 0x10032F50: PVFS_sys_fs_add (fs-add.sm:194)
> ==2541==    by 0x1000B698: main (pvfs2-ls.c:766)
> ==2541==  Address 0xFEB9D550 is on thread 1's stack

Probably either some unused fields in cmd, or padding between fields.
It is hard to silence these nicely without also hiding real bugs.  I
have some valgrind rules to avoid some of these.  It could actually
be a bug, but that is unlikely.
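
A small sketch of how padding produces this kind of warning, and of the
usual blunt fix.  The struct demo_cq_cmd and both functions are
hypothetical and do not reflect the real command layout; the padding
amount assumes a common ABI where uint64_t is aligned to at least 4
bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout: the compiler inserts padding after
 * opcode so that user_handle is suitably aligned. */
struct demo_cq_cmd {
    uint8_t  opcode;
    uint64_t user_handle;
    uint32_t cqe;
};

/* Bytes in the struct that belong to no field.  Assigning every field
 * still leaves these uninitialised, which is what memcheck reports
 * when the whole struct is handed to write(). */
size_t demo_padding_bytes(void)
{
    size_t fields = sizeof(uint8_t) + sizeof(uint64_t) + sizeof(uint32_t);
    return sizeof(struct demo_cq_cmd) - fields;
}

/* Zero the whole struct first, then fill in the fields. */
void demo_init_cmd(struct demo_cq_cmd *cmd, uint64_t handle, uint32_t cqe)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof *cmd);
    cmd->opcode = 1;
    cmd->user_handle = handle;
    cmd->cqe = cqe;
}
```

The trade-off is the one noted above: after the memset, a field that
the code genuinely forgot to set also looks initialised, so memcheck
can no longer flag it.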

> ==2541== Conditional jump or move depends on uninitialised value(s)
> ==2541==    at 0xFB19A80: mthca_cq_clean (cq.c:576)
> ==2541==    by 0xFB1D688: mthca_destroy_qp (verbs.c:674)
> ==2541==    by 0xFE83758: ibv_destroy_qp@@IBVERBS_1.1 (verbs.c:490)
> ==2541==    by 0x1005693C: openib_close_connection (openib.c:368)
> ==2541==    by 0x1004F6E4: ib_close_connection (ib.c:1695)
> ==2541==    by 0x10051800: BMI_ib_finalize (ib.c:2082)
> ==2541==    by 0x1004DFA8: BMI_finalize (bmi.c:474)
> ==2541==    by 0x100112B0: PVFS_sys_finalize (finalize.c:57)
> ==2541==    by 0x1000BDF0: main (pvfs2-ls.c:850)

That may actually be a bug, but you'll have to look at the code
there in your version of libmthca.  I'm not sure how OFED releases
map to git commit ids.

		-- Pete


