[Openib-windows] WSD: Behavior when the other side has not yet called accept, is different from TCP/IP (Ethernet)

Fabian Tillier ftillier at silverstorm.com
Wed Aug 9 14:46:05 PDT 2006


Hi Tzachi,

On 8/9/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> Hi Fab,
>
> I have tested your fix with the following scenarios:
>
> 1) Client listening - connecting succeeds.
> 2) No client listening on the remote side. Works fine - there is a fall
> back to ipoib and the right error is returned.
>
> 3) Many clients connect to a server. This was somewhat problematic: I
> was using iperf 2.01 to test that and got some strange results: By
> default Iperf calls listen with a backlog of 5. When more than 5 threads
> try to connect from the client side the server might return the new
> error, and this causes iperf to crash.

Does the client side of iperf crash, or the server side?  The server
side certainly shouldn't be crashing unless I screwed up the patch...

The client side's connection request might time out, in which case it
would get IB_REJ_TIMEOUT which gets translated to WSAETIMEDOUT.  The
WSD switch should then cause this to retry via IPoIB on the client
side.
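
To make the translation step concrete, here is a minimal sketch in C.
It is not the provider's actual code: the enum, the function name, and
the fallback to a "connection refused" error for every other reject
reason are assumptions made for illustration.  Only the numeric values
10060 and 10061 are the real Winsock values of WSAETIMEDOUT and
WSAECONNREFUSED.

```c
#include <assert.h>

/* Hypothetical stand-ins: the real IBAL and Winsock headers define
 * their own types and constants.  Only the numeric values below match
 * winsock2.h. */
typedef enum {
    SKETCH_REJ_TIMEOUT,   /* stands in for IB_REJ_TIMEOUT */
    SKETCH_REJ_OTHER      /* any other reject reason (assumption) */
} sketch_rej_status_t;

#define SKETCH_WSAETIMEDOUT    10060  /* value of WSAETIMEDOUT */
#define SKETCH_WSAECONNREFUSED 10061  /* value of WSAECONNREFUSED */

/* Map a CM reject status to the error reported to the WSD switch.
 * A timeout is reported as a timed-out connect; the switch can then
 * retry the connection over IPoIB. */
static int sketch_rej_to_wsa_error(sketch_rej_status_t status)
{
    assert(status == SKETCH_REJ_TIMEOUT || status == SKETCH_REJ_OTHER);

    switch (status) {
    case SKETCH_REJ_TIMEOUT:
        return SKETCH_WSAETIMEDOUT;
    default:
        /* Assumed fallback for illustration only. */
        return SKETCH_WSAECONNREFUSED;
    }
}
```

Keeping the mapping in one small function makes it easy to see which
error the switch gets for each reject reason.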

> The immediate reason for the
> crash is definitely in Iperf (division by zero). I didn't have the time
> to debug Iperf to find out what the real problem was.
> Still, the fix that I was thinking about was in the function
> cm_req_callback to simply drop the packets, and therefore forcing a
> retry.

We can't just ignore the cm_req_callback - the REQ was received, and
the kernel CM created a connection end point for it that it is
tracking.  We won't get another callback for that connection end
point.  We'd have to create a queue of requests beyond the backlog,
which seems like the wrong thing to do.  In any case, the end result
shouldn't be much different than what I implemented.  The patch I sent
will tell the CM to destroy its connection end point, but it won't
send a REJ.  This results in the REQ on the remote side being retried,
and when the retry is received, a new connection end point is created
and the REQ callback is invoked.
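
Roughly, the behavior described above can be modeled like this.  It is
a simplified sketch, not the patch itself: the struct, the function
names, and the counters are all hypothetical, and the real code deals
with CEP handles and CM calls rather than plain counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a listening socket; none of these names come
 * from the WSD provider or the kernel CM. */
typedef struct {
    size_t backlog;   /* limit passed to listen() */
    size_t pending;   /* requests queued, waiting for accept() */
    size_t dropped;   /* CEPs destroyed without a REJ (queue was full) */
} sketch_listener_t;

typedef enum {
    SKETCH_REQ_QUEUED,
    SKETCH_REQ_DROPPED_NO_REJ
} sketch_req_result_t;

/* Called once per REQ callback.  When the backlog is full, the CEP is
 * destroyed and no REJ is sent, so the remote side retries its REQ and
 * a later callback gets a fresh CEP. */
static sketch_req_result_t sketch_on_req(sketch_listener_t *l)
{
    assert(l != NULL);

    if (l->pending >= l->backlog) {
        l->dropped++;
        return SKETCH_REQ_DROPPED_NO_REJ;
    }
    l->pending++;
    return SKETCH_REQ_QUEUED;
}

/* Called when the application accepts a connection; frees one slot.
 * Returns false if nothing was pending. */
static bool sketch_on_accept(sketch_listener_t *l)
{
    assert(l != NULL);

    if (l->pending == 0)
        return false;
    l->pending--;
    return true;
}
```

With a backlog of 5, the sixth REQ is dropped without a REJ; once the
application accepts a connection, the retried REQ fits in the queue.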

> What do you think? Do you have a better program to try, or do we want to
> trust Iperf results?

Can netperf be used to test this scenario?  I'm not that familiar with
either of them to tell you the truth.

In any case, do you agree about the CEP leak in the passive side
reject case?  Can I check this in?  I'd like to check it all in
because it *seems* like the right thing to do to me, but I don't want
to introduce too much thrash in WSD at this point without getting some
buy-in.

Thanks,

- Fab



