[openib-general] thanks and a question
Ronald G Minnich
rminnich at lanl.gov
Wed Apr 12 20:46:24 PDT 2006
Hal Rosenstock wrote:
> hoq is HOQLife. Is slv the switch LifeTimeValue ?
I believe so.
> Does that have anything to do with those settings ?
It would not work until hoq and slv were both set to 17.
> Truly hanging ?
yes, and it was the only real connection at that point, from the bproc
daemon on the slave node to the bproc daemon on the master. There was
only 1 host powered up at that point. It was very repeatable -- we tried
to get it to boot many times. And, weirdly, it always hung at that same
point.
> Switches might drop 64 bytes at a time based on those parameters.
But why does the sender think the segment has been acked, when the
receiver has never seen that last 64 bytes? Where did the sender get
that TCP-level ack?
> That effectively doubles the time before the drops would occur which
> probably eliminated the drops so you didn't see this.
>
> 16 = 268.435 msec
> 17 = 536.871 msec
which leads to another question. This is 1/2 second. Does it really mean
that you could end up buffering 1/2 second's worth of flow on each port
for all 256 ports?
>
> What doesn't make sense to me is the one flow. Are you sure there's no
> other data traffic ? If so, that doesn't make sense to me and doesn't
> hang together with the rest of this scenario.
no other traffic that we could see, but there had been traffic prior to
this.
Thanks Hal!
ron