[Openib-windows] RE: Connection rate of WSD

Erez Cohen erezc at mellanox.co.il
Tue Jun 6 08:28:11 PDT 2006


 
> I've asked about RNR handling in the HCA several times now without
> getting any answer - is RNR broken in the current HCA implementations?

No. The RNR mechanism works according to the spec on all Mellanox HCAs.

Best regards, 

Erez Cohen
Field Application and Support Engineer
Mellanox Technologies Ltd. 
Tel : + 972 - 4 - 9097200 ext 378
Cell : + 972 - 54 - 5468801
Fax : + 972 - 4 - 9593245
www.mellanox.com

-----Original Message-----
From: openib-windows-bounces at openib.org
[mailto:openib-windows-bounces at openib.org] On Behalf Of Fabian Tillier
Sent: Monday, June 05, 2006 10:28 PM
To: Tzachi Dar
Cc: openib-windows at openib.org
Subject: Re: [Openib-windows] RE: Connection rate of WSD

Hi Tzachi,

On 6/5/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> Hi Fab,
> 1) Please see below my answers. I still don't see how playing with 
> the rnr timeout will solve the problem.

RNR handles the case where a send is sent before a matching receive is
posted.  This is exactly the situation we're trying to handle.
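
As a rough illustration of that behaviour, here is a toy model in plain
Python (not the verbs or IBAL API) of a send that arrives before the
receiver has posted a buffer. The names send_with_rnr, rnr_timer_ms,
rnr_retry and recv_posted_at_ms are made up for this sketch, and the
fixed back-off is a simplification. The one detail taken from the
InfiniBand spec is that an RNR retry count of 7 means retry indefinitely.

```python
INFINITE_RNR_RETRY = 7  # in the IB spec, a retry count of 7 means "retry forever"


def send_with_rnr(recv_posted_at_ms, rnr_timer_ms, rnr_retry):
    """Toy model: return (delivered, time_ms, naks) for a single send.

    The send is issued at t=0. Each attempt that finds no posted receive
    is answered with an RNR NAK, and the sender waits rnr_timer_ms before
    trying again.
    """
    now_ms = 0
    naks = 0
    while True:
        if now_ms >= recv_posted_at_ms:
            return True, now_ms, naks      # a receive buffer was waiting
        naks += 1
        if rnr_retry != INFINITE_RNR_RETRY and naks > rnr_retry:
            return False, now_ms, naks     # retries exhausted; the QP would error out
        now_ms += rnr_timer_ms             # back off, then retry
```

With a receive posted 100 ms late and a 40 ms timer, the send goes
through on the fourth attempt in this model; with a finite retry count
that is too small, it fails instead.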

Is there some problem with RNR handling in the HCAs?  The RNR situation
should not be that common, happening only during connection
establishment on a very busy system.  The inefficiency of RNR on the
wire is worth the savings in extra complexity in the code.

I've asked about RNR handling in the HCA several times now without
getting any answer - is RNR broken in the current HCA implementations?

> 2) The way we see it there are two possible answers. A - play with the
> CM. This will slow connection establishment, but gives some more
> freedom (the CM is in software). Please also note that as for timeouts
> the CM has another message, MRA (more processing required), which gives
> us exactly the freedom to do what we want. We answer that we received
> the request and are still thinking what to do. So this is a
> timeout-free solution.
> As for the other solution, posting the first receive: this has the
> advantage that we follow the WSD spec. As for the latency introduced:
> I believe that we can add another variable that will tell if the first
> buffer was already received correctly. On the buffer complete side the
> first action will be to check if the first receive was already handled.
> Only if not, it will take the lock and do the complex thing. As a
> result I believe that the latency introduced on most of the buffers
> will only be an if statement latency, which is quite small.
>
> What do you think?
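
The flag-then-lock idea in the quoted proposal could look roughly like
the sketch below. It is written in Python for brevity and is not the WSD
provider's code: the Socket class, first_recv_handled, slow_path_count
and on_buffer_complete are all hypothetical names. In C the unlocked
read of the flag would also need suitable memory ordering.

```python
import threading


class Socket:
    """Hypothetical per-connection state, for illustration only."""

    def __init__(self):
        self.first_recv_handled = False   # the extra variable from the proposal
        self.lock = threading.Lock()
        self.slow_path_count = 0          # bookkeeping for this example only

    def on_buffer_complete(self, buf):
        # Fast path: once the first receive is handled, only this check runs.
        if self.first_recv_handled:
            return buf
        # Slow path: take the lock and re-check, because another completion
        # may have handled the first receive while we were waiting.
        with self.lock:
            if not self.first_recv_handled:
                self.slow_path_count += 1   # stand-in for "the complex thing"
                self.first_recv_handled = True
        return buf
```

After the first completion, every later call costs one flag check and
never touches the lock.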

I would rather not add code if we can avoid it.  RNR should work for
this, unless RNR is broken in the HCAs.

Maybe I don't understand - you say that a 40ms RNR retry is too long,
yet you follow with saying that it may be seconds before the switch
posts its receive.  If it's seconds, the RNR retry should be just fine.
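
One way to put numbers on this, assuming a fixed RNR timer and an
unlimited retry count (both simplifications, and rnr_overshoot_ms is a
made-up helper): the delay RNR adds on top of the late receive is always
less than one timer interval.

```python
import math


def rnr_overshoot_ms(recv_posted_at_ms, rnr_timer_ms):
    """Return (naks, extra_delay_ms) for a send issued at t=0.

    Assumes the sender retries every rnr_timer_ms until a receive is
    posted, so delivery happens at the first retry at or after that time.
    """
    naks = math.ceil(recv_posted_at_ms / rnr_timer_ms)
    delivered_at_ms = naks * rnr_timer_ms
    return naks, delivered_at_ms - recv_posted_at_ms
```

For a receive posted 2010 ms late with a 40 ms timer this gives 51 NAKs
and 30 ms of extra delay, which is small next to the two seconds already
spent waiting for the receive.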

Anyhow, as you pointed out, sending an MRA doesn't help the connection
rate, so it's not really any better than using RNR.  I'm still
considering the buffering thing, but need to find a solution that will
be streamlined and clean.  I'm wary of making significant changes at
this point, since the RNR solution is functional.

Did you test with the updated code to see if the connection delay is
reduced?  What was the outcome?

Thanks,

- Fab

_______________________________________________
openib-windows mailing list
openib-windows at openib.org
http://openib.org/mailman/listinfo/openib-windows



