[openib-general] IPoIB oops on path record completion

Hal Rosenstock halr at voltaire.com
Wed Dec 15 17:18:21 PST 2004


On Wed, 2004-12-15 at 20:14, Roland Dreier wrote:
>     Hal> This is due to the following: ib_sa_path_rec_callback:
>     Hal> sa_query 0xc0db0788 status 0xffffff92 mad 0x00000000 which
>     Hal> invokes query->callback(status, NULL, query->context);
> 
>     Hal> ipoib_main.c: static void path_rec_completion(int status,
>     Hal> struct ib_sa_path_rec *pathrec, void *path_ptr)
> 
>     Hal> path_rec_completion is using the pathrec parameter as a
>     Hal> pointer without checking it for NULL first.
> 
> Hmm... are you sure this is what causes the oops?

No, but it definitely oopses in that callback. I didn't trace it in
path_rec_completion; I only glanced at the code. Don't the debug
statements dereference through NULL regardless of status?

> path_rec_completion() will only dereference the pathrec parameter if
> its local variable ah is non-NULL:
> 
> 	if (ah) {
> 		path->pathrec = *pathrec;
> 
> and ah can only be set to non-NULL if status is successful (ah is
> initialized to NULL and the only place it can be changed is
> 
> 		ah = ipoib_create_ah(path->dev, priv->pd, &av);
> 
> which is inside a test of status.
> 
> Can you give the exact sequence you use to duplicate this?  I haven't
> been able to make it happen in my network.

Have you gotten a negative status on the callback (and a NULL pathrec)?
I've yet to see this response on the analyzer, as there are too many to
go through right now. I do see it with the extra debug I put in to
narrow this down.
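
For reference, the status 0xffffff92 above is -110 when read as a signed
32-bit value, which is -ETIMEDOUT on Linux. Here is a rough user-space
sketch of the guard logic Roland describes, so the failure path can be
exercised without the hardware. It is not the actual ipoib_main.c code:
the struct layouts, create_ah_stub() and path_rec_completion_model() are
hypothetical stand-ins, and the extra NULL test on pathrec is a defensive
assumption that is not claimed to be in the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct path_rec { int dlid; };
struct ah { int dlid; };
struct path {
	struct path_rec pathrec;
	struct ah *ah;
};

/* Hypothetical stub standing in for ipoib_create_ah(). */
static struct ah *create_ah_stub(const struct path_rec *rec)
{
	static struct ah the_ah;

	the_ah.dlid = rec->dlid;
	return &the_ah;
}

/*
 * Model of the completion callback: ah starts out NULL and is only set
 * inside the test of status, so pathrec is never dereferenced on failure.
 */
static void path_rec_completion_model(int status, struct path_rec *pathrec,
				      void *path_ptr)
{
	struct path *path = path_ptr;
	struct ah *ah = NULL;

	assert(path != NULL);

	/* The "&& pathrec" part is an added defensive check (assumption). */
	if (!status && pathrec)
		ah = create_ah_stub(pathrec);

	if (ah) {
		path->pathrec = *pathrec;	/* only reached on success */
		path->ah = ah;
	}
}
```

Calling the model with status -110 and a NULL pathrec leaves the path
untouched, so if the real callback has the same shape, the oops would
have to come from somewhere other than that copy (a debug statement, for
instance).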

>     Hal> Also, what I do see when I do a broadcast ping is that the
>     Hal> path record is obtained over and over rather than being
>     Hal> requested once and cached.  Is that what is supposed to be
>     Hal> happening now?
> 
> No, that shouldn't happen.  I'll try to figure out what's happening.

Thanks.

-- Hal



