[openib-general] IPoIB oops on path record completion
Hal Rosenstock
halr at voltaire.com
Wed Dec 15 18:21:56 PST 2004
On Wed, 2004-12-15 at 20:38, Roland Dreier wrote:
> Hal> No, but it definitely oopses in that callback. I didn't trace it
> Hal> in path_rec_completion; I only glanced at the code. Don't the
> Hal> debug statements dereference through NULL regardless of status?
>
> Good point, I missed those uses. I've just pushed a patch that should
> fix this problem.
Still oops-es :-(
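For illustration, a minimal user-space sketch of the guard under discussion: check the status and the record pointer before anything, debug output included, touches the record. The struct, its field, and the function name are hypothetical stand-ins, not the actual IPoIB code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a path record; the real structure has many more fields. */
struct path_rec {
    unsigned dlid;
};

/*
 * Hypothetical completion handler. Returns 0 and stores the DLID on
 * success; on failure it returns a negative status and never touches rec.
 */
static int on_path_rec(int status, const struct path_rec *rec, unsigned *dlid_out)
{
    if (status != 0 || rec == NULL) {
        /* Log only values that are valid on the failure path. */
        fprintf(stderr, "path record lookup failed, status %d\n", status);
        return status != 0 ? status : -1;
    }

    /* rec is known to be non-NULL here, so dereferencing it is safe. */
    *dlid_out = rec->dlid;
    return 0;
}
```

The point is only the ordering: any debug print that reads fields of the record has to come after the check, not before it.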
> Hal> Have you gotten a negative status on the callback (and NULL
> Hal> pathrec) ? I've yet to see this response on the analyzer as
> Hal> there are too many to go through right now. I do see it with
> Hal> extra debug I put in to narrow this down.
>
> No, I guess that's why I never saw this problem -- I've never had a
> path record callback fail.
I get the callback error but the packets on the wire look OK (see
below).
> Hal> Also, what I do see when I do a broadcast ping is that the
> Hal> path record is obtained over and over rather than being
> Hal> requested once and cached. Is that what is supposed to be
> Hal> happening now ?
>
> I can't duplicate this one. If I do something like
>
> ifconfig ib0 10.0.0.1
> ping -b 10.255.255.255
>
> I only see one path record lookup for each remote system.
I'm using /24 rather than /8 (class C rather than class A).
So my config is
ifconfig ib0 192.168.0.1
ping -b 192.168.0.255
I have 3 nodes (this is an x86 and the other 2 are x86_64).
I doubt this makes a difference in terms of these issues.
> You seem to be seeing a path record lookup fail, which of course will
> cause the lookup to be retried the next time we want to send something
> to that destination. Do you know why the path record lookup is failing?
I just looked at the trace, and every SA GetResp(PathRecord) had status 0, but
the callback reports:
ib_sa_path_rec_callback: sa_query 0xc373fe48 status 0xffffff92 mad 0x00000000
That status is -110 (0xffffff92 read as a signed 32-bit value), i.e. -ETIMEDOUT:
#define ETIMEDOUT 110 /* Connection timed out */
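As a quick check of that reading, here is a small sketch that converts the hex pattern from the log line back into a signed status and compares it with the errno value quoted above. The helper names and the LINUX_ETIMEDOUT constant are made up for this example; 110 is the Linux value from the #define.

```c
#include <stdint.h>

/* Value from the #define quoted above (Linux errno numbering). */
#define LINUX_ETIMEDOUT 110

/*
 * Hypothetical helper: reinterpret a 32-bit pattern, as printed with %x,
 * as a signed two's-complement status without relying on an
 * implementation-defined cast.
 */
static int32_t status_from_hex(uint32_t raw)
{
    if (raw <= (uint32_t)INT32_MAX)
        return (int32_t)raw;
    /* For raw above INT32_MAX the signed value is raw - 2^32. */
    return -(int32_t)(UINT32_MAX - raw) - 1;
}

/* Nonzero if the logged pattern corresponds to -ETIMEDOUT. */
static int is_timeout(uint32_t raw)
{
    return status_from_hex(raw) == -LINUX_ETIMEDOUT;
}
```

With the value from the log, status_from_hex(0xffffff92u) gives -110, so is_timeout() is true for it.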
Can you shorten your timeout down to 1 msec and see what happens ?
I do see one SA Get which was not responded to. There are a variety of
reasons this could occur.
-- Hal