[ewg] [PATCH] Patch for libibmad

Sasha Khapyorsky sashak at voltaire.com
Wed Apr 21 03:09:26 PDT 2010


Hi Mike,

On 12:16 Mon 19 Apr     , Mike Heinz wrote:
> We had a customer report that perfquery was crashing on their nodes when trying to query ports on a switch. When I examined the core dump, it was clear that libibmad was dereferencing a null pointer from one of the mad_set_ functions:
> 
> #0  0x0000000000000000 in ?? ()
> #1  0x00002ae4e13e7536 in mad_set_field () from /usr/lib64/libibmad.so.5
> #2  0x00002ae4e13e7656 in mad_field_name () from /usr/lib64/libibmad.so.5
> #3  0x0000000000401662 in mad_dump_perfcounters_rcv_sl ()
> #4  0x00000000004024c9 in mad_dump_perfcounters_rcv_sl ()
> #5  0x00002ae4e18168b4 in __libc_start_main () from /lib64/libc.so.6
> #6  0x0000000000401189 in mad_dump_perfcounters_rcv_sl ()
> #7  0x00007fffe5570ce8 in ?? ()
> #8  0x0000000000000000 in ?? ()

I cannot find a code path in which mad_dump_perfcounters_rcv_sl() would
end up calling mad_set_field() (or even mad_field_name()). Can you?

> It appears that mad_set_field() was hitting a NULL pointer in the table of MAD attributes (ib_mad_f). Such entries are being used to separate different groups of mad attributes in the table.
>
> Reviewing the code, I noted that the mad_set_* and mad_get_* functions already have some error checking to avoid going completely off the end of the table, but they do not detect the case where the selected field is unset.

But such entries should never be used, at least not by perfquery, so it
is unclear to me how you are hitting this error.
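
For reference, the kind of check described above (reject an index that is
in range but lands on an unset separator entry) can be sketched roughly as
follows. This is only an illustration under assumptions: demo_field_t,
demo_fields, demo_lookup() and demo_set_field() are hypothetical names, the
fields are simplified to single bytes, and none of this is the actual
libibmad code or the submitted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field descriptor; NOT libibmad's real field structure. */
typedef struct {
    size_t byteoffs;    /* byte offset of the field in the buffer */
    const char *name;   /* NULL marks an unset (separator) entry */
} demo_field_t;

/* Hypothetical field indices; separators divide groups of fields. */
enum {
    DEMO_F_FIRST = 0,       /* separator, left unset in the table */
    DEMO_F_PORT_SELECT,
    DEMO_F_COUNTER_SELECT,
    DEMO_F_GROUP_END,       /* separator, left unset in the table */
    DEMO_F_XMT_DATA,
    DEMO_F_LAST             /* one past the final valid index */
};

/* Entries without an initializer are zero-filled, so name == NULL. */
static const demo_field_t demo_fields[DEMO_F_LAST] = {
    [DEMO_F_PORT_SELECT]    = { 0, "PortSelect" },
    [DEMO_F_COUNTER_SELECT] = { 1, "CounterSelect" },
    [DEMO_F_XMT_DATA]       = { 2, "XmtData" },
};

/* Return the descriptor for a field, or NULL if it must not be used. */
static const demo_field_t *demo_lookup(int field)
{
    if (field < 0 || field >= DEMO_F_LAST)
        return NULL;    /* off either end of the table */
    if (demo_fields[field].name == NULL)
        return NULL;    /* in range, but an unset separator entry */
    return &demo_fields[field];
}

/* Store val into the field; 0 on success, -1 if the field is invalid. */
int demo_set_field(uint8_t *buf, size_t buflen, int field, uint8_t val)
{
    const demo_field_t *f = demo_lookup(field);

    assert(buf != NULL);
    if (f == NULL || f->byteoffs >= buflen)
        return -1;
    buf[f->byteoffs] = val;
    return 0;
}
```

With a check of this shape, a caller that passes a separator index gets an
error back instead of a crash, though that would only hide whatever caused
the bad index in the first place.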

> This patch corrects the problem.

I would like to understand the problem better before fixing anything.

Sasha


