[Openib-windows] RE: Connection rate of WSD

Tzachi Dar tzachid at mellanox.co.il
Thu Jun 8 14:52:00 PDT 2006


 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Thursday, June 08, 2006 3:28 AM
> To: Tzachi Dar
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] RE: Connection rate of WSD
> 
> Hi Tzachi,
> 
> On 6/6/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> > Hi Fab,
> >
> > Here are the results of my testing of the fix that you have
> > suggested (using RNR):
> > [Just to give some background, here are some results:
> >        For 100 Mb Ethernet: 1670 CPS (connections per second)
> >        For IPoIB:           1950 CPS
> > ]
> > 1) The patch that you sent for MTHCA works well and will be
> > applied to OpenIB today by Leonid.
> > 2) The patch for the mt23108: the patch seems good, but we don't
> > have the resources to test it. Feel free to do the necessary
> > tests and check it in.
> 
> I checked this in revision 377.
> 
> > 3) The maximum connection rate that I was able to achieve was
> > 50 CPS, in all cases.
> 
> I remember getting around 120 or so CPS with the WHQL 
> connection rate test.  I haven't tried this in a while, but 
> things in connection management haven't changed so it should 
> be similar.  I suspect the SM has a fairly significant part 
> in the CPS rate, as each connection issues two SA queries - 
> one for the service record to resolve the destination IP to 
> GID, the other for the path.
My test has a single thread doing the connecting. Perhaps the WHQL
test uses more than one, and indeed this might be related to OpenSM.
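
For what it's worth, a minimal multi-threaded connect/close loop along
the lines below could show whether adding connecting threads raises the
CPS on a given setup. This is only a sketch in plain Winsock: the peer
address, port, thread count and per-thread connection count are
placeholders (the peer just needs something listening there), and
whether the connections actually go through the WSD provider or fall
back to TCP depends on the environment.

/* connect_rate.c - sketch of a multi-threaded connect/close rate test.
 * Placeholders: SERVER_ADDR, SERVER_PORT, NUM_THREADS, CONNS_PER_THREAD.
 * Link with ws2_32.lib. */
#include <winsock2.h>
#include <stdio.h>
#include <string.h>

#define NUM_THREADS      4
#define CONNS_PER_THREAD 500
#define SERVER_ADDR      "192.168.0.1"   /* placeholder peer address */
#define SERVER_PORT      5001            /* placeholder listening port */

static DWORD WINAPI connect_loop(void *arg)
{
    struct sockaddr_in sa;
    int i;

    (void)arg;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(SERVER_PORT);
    sa.sin_addr.s_addr = inet_addr(SERVER_ADDR);

    for (i = 0; i < CONNS_PER_THREAD; i++) {
        SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (s == INVALID_SOCKET)
            break;
        if (connect(s, (struct sockaddr *)&sa, sizeof(sa)) == SOCKET_ERROR) {
            closesocket(s);
            break;
        }
        closesocket(s);
    }
    return 0;
}

int main(void)
{
    WSADATA wsd;
    HANDLE threads[NUM_THREADS];
    DWORD start, elapsed;
    int t, total = NUM_THREADS * CONNS_PER_THREAD;

    WSAStartup(MAKEWORD(2, 2), &wsd);
    start = GetTickCount();
    for (t = 0; t < NUM_THREADS; t++)
        threads[t] = CreateThread(NULL, 0, connect_loop, NULL, 0, NULL);
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
    elapsed = GetTickCount() - start;
    printf("%d connects in %lu ms -> %.0f CPS\n", total, elapsed,
           elapsed ? total * 1000.0 / elapsed : 0.0);
    WSACleanup();
    return 0;
}
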
> > 4) Using the fix that I suggested, the connection rate drops to
> > 33 CPS.
> > 5) To my understanding, setting the RNR retry count to 7 (retry
> > forever) is not good enough: the remote side might never post the
> > buffers, and then there would be no timeout at the sender, so the
> > retry count should be set to 6.
> 
> Ok.
> 
> > 6) Here are some results that I got from different timeout values:
> > RNR Time Constant               CPS
> > 0x6                             50
> > 0x8                             50
> > 0x10                            45
> > 0x12                            39
> > 0x14                            32
> > 0x18                            20
> > From my experience, setting the value to 6 means that we get RNR
> > NAK retries exhausted errors even in an unstable system.
> >
> > As a result I suggest setting the value to somewhere between 0x8
> > and 0x10. I wasn't able to create a test that loads the system so
> > much that 0x8 wasn't good enough while 0x10 was.
> 
> Ok, I've picked 8.  Did you try with a value of 8 and the RNR 
> retry set to 6?
I'll do some long runs and see whether a value of 8 is good enough or
whether we should pick something bigger.
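
For reference, here is roughly where the two parameters discussed above
live in a QP transition. This sketch uses the Linux libibverbs API
rather than the IBAL code actually being patched, purely as an
illustration, and it omits all the other attributes (path, destination
QPN, PSNs, MTU, ...) that a real INIT->RTR->RTS sequence needs:

/* Sketch only: shows the RNR NAK timer (the encoded constant from the
 * table above, e.g. 0x8) and the RNR retry count (6 rather than 7,
 * since 7 means "retry forever" and the sender would never time out).
 * Not a complete transition - the remaining mandatory attributes are
 * omitted. */
#include <string.h>
#include <infiniband/verbs.h>

static int set_rnr_params(struct ibv_qp *qp)
{
    struct ibv_qp_attr attr;
    int ret;

    /* The RNR NAK timer is requested in the INIT -> RTR transition. */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state = IBV_QPS_RTR;
    attr.min_rnr_timer = 0x8;   /* 5-bit encoded IB constant, not ms */
    ret = ibv_modify_qp(qp, &attr,
                        IBV_QP_STATE | IBV_QP_MIN_RNR_TIMER
                        /* | the other mandatory RTR attributes */);
    if (ret)
        return ret;

    /* The RNR retry count is set in the RTR -> RTS transition. */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state = IBV_QPS_RTS;
    attr.rnr_retry = 6;         /* 7 would retry indefinitely */
    return ibv_modify_qp(qp, &attr,
                         IBV_QP_STATE | IBV_QP_RNR_RETRY
                         /* | the other mandatory RTS attributes */);
}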


> > So please pick a value and check it in.
> 
> Done, in revision 380.
> 
> > To my understanding, the fact that our connection rate is so low
> > means that we must continue analyzing the problem, with a goal of
> > 2000 CPS. Once we are near this target, we will continue looking
> > at this.
> 
> There are a number of things we can do to help the connection rate.
> The maximum connection rate that I've seen at the IB CM level 
> is 1500; this is just the CM protocol + QP transitions (this 
> was CPU bound, so a faster CPU should help - you can use 
> cmtest to see what the rate is on your setup).  I don't know 
> how the MTHCA driver compares to the
> MT23108 driver in terms of QP transition times - I would hope 
> they're identical as the time should come from the hardware.  
> Is this something you could look into?  If MTHCA QP 
> transitions are slower, we should try to find out why.

We are currently trying to deal with performance of the data path and 
correctness of the rest, so I believe that we will not be looking at 
this in the coming future. Once things will get more stable, we will 
try to see how to improve this as well.


> So the WSD connection establishment process is ~30x slower.
> 
> Adding an SA cache for path records would likely help the WSD 
> CPS tremendously, though the ramifications of a false cache 
> hit would need to be analyzed.
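
As a rough illustration of the idea (not IBAL code - the entry layout,
the fixed-size table and the expiry policy here are all assumptions), a
path record cache keyed by the destination GID, with a time-to-live to
bound the damage from a stale entry, could look like this:

/* Sketch of a tiny path record cache with a TTL.  Everything here is
 * hypothetical: a real cache would key on the full (SGID, DGID, pkey)
 * tuple, store the actual SA path record structure, and need locking
 * plus invalidation on SM/port events. */
#include <string.h>
#include <time.h>

#define PR_CACHE_SIZE  64
#define PR_TTL_SECONDS 30          /* bound the impact of a stale hit */
#define GID_LEN        16
#define PATH_REC_LEN   64          /* placeholder for the SA path record */

typedef struct pr_cache_entry {
    unsigned char dgid[GID_LEN];
    unsigned char path_rec[PATH_REC_LEN];
    time_t        fetched;
    int           valid;
} pr_cache_entry_t;

static pr_cache_entry_t pr_cache[PR_CACHE_SIZE];

/* Return 1 and copy the cached record on a fresh hit, 0 on a miss
 * (in which case the caller falls back to the SA path query). */
static int pr_cache_find(const unsigned char *dgid, unsigned char *path_rec)
{
    int i;
    time_t now = time(NULL);

    for (i = 0; i < PR_CACHE_SIZE; i++) {
        if (pr_cache[i].valid &&
            memcmp(pr_cache[i].dgid, dgid, GID_LEN) == 0) {
            if (now - pr_cache[i].fetched > PR_TTL_SECONDS) {
                pr_cache[i].valid = 0;      /* expired - force a requery */
                return 0;
            }
            memcpy(path_rec, pr_cache[i].path_rec, PATH_REC_LEN);
            return 1;
        }
    }
    return 0;
}

/* Insert (or refresh) an entry after a successful SA query. */
static void pr_cache_insert(const unsigned char *dgid,
                            const unsigned char *path_rec)
{
    static int next;
    pr_cache_entry_t *e = &pr_cache[next++ % PR_CACHE_SIZE];

    memcpy(e->dgid, dgid, GID_LEN);
    memcpy(e->path_rec, path_rec, PATH_REC_LEN);
    e->fetched = time(NULL);
    e->valid = 1;
}
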
> 
> Finding a way to do an ARP cache lookup without actually 
> sending the ARP would help too.  Last I looked, in my test 
> environment, using SendARP was actually slowing down CPS as 
> compared to doing the service record query.  If the ARP 
> wasn't actually sent, and we didn't have to wait for a 
> response, I'm sure this solution would be beneficial.  I 
> think there are calls in Vista/Longhorn for doing this, but 
> they're not in 2003.
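
A user-mode illustration of reading the local ARP table without
generating any traffic is below; GetIpNetTable just returns the current
table, unlike SendARP, which may put a request on the wire and wait for
a reply. How (and in which context) the WSD provider would do the
equivalent is a separate question, so treat this purely as a sketch:

/* Sketch: look up an IPv4 address in the local ARP cache without
 * sending an ARP request.  Link with iphlpapi.lib. */
#include <winsock2.h>
#include <iphlpapi.h>
#include <stdlib.h>
#include <string.h>

/* On success, copies the cached physical address into phys_addr
 * (up to MAXLEN_PHYSADDR bytes) and returns its length; returns 0
 * if the address is not in the cache. */
static ULONG arp_cache_lookup(DWORD ipv4_addr, BYTE phys_addr[MAXLEN_PHYSADDR])
{
    ULONG size = 0, len = 0;
    PMIB_IPNETTABLE table;
    DWORD i;

    if (GetIpNetTable(NULL, &size, FALSE) != ERROR_INSUFFICIENT_BUFFER)
        return 0;
    table = (PMIB_IPNETTABLE)malloc(size);
    if (table == NULL)
        return 0;

    if (GetIpNetTable(table, &size, FALSE) == NO_ERROR) {
        for (i = 0; i < table->dwNumEntries; i++) {
            if (table->table[i].dwAddr == ipv4_addr &&
                table->table[i].dwType != MIB_IPNET_TYPE_INVALID &&
                table->table[i].dwPhysAddrLen > 0) {
                len = table->table[i].dwPhysAddrLen;
                if (len > MAXLEN_PHYSADDR)
                    len = MAXLEN_PHYSADDR;
                memcpy(phys_addr, table->table[i].bPhysAddr, len);
                break;
            }
        }
    }
    free(table);
    return len;
}
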
> 
> Decoupling the QP transitions from the CM protocol may help 
> too.  This, coupled with making QP transitions asynchronous,
> would be ideal, as a server could, using a single thread, keep
> more than one connection establishing at a time - right now, when
> ib_cm_rep is called, the QP is transitioned synchronously through
> its states.  There's only one thread per CEP that can process
> incoming requests in UAL, so connection establishment is
> effectively serialized for user-mode clients.
>
Technically speaking you are right. Still, we will have to look at the
amount of work that this requires (actually rewriting IBAL), as well as
the changes to the low-level driver, and decide on the right time to do
this.
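
For reference, a sketch of the kind of decoupling described above -
deferring the synchronous QP transition out of the per-CEP callback
thread so the next request can be processed - might look like the
following. Every name in it other than the QueueUserWorkItem call is a
hypothetical stand-in, not IBAL code:

/* Sketch only: illustrates moving the synchronous INIT->RTR->RTS
 * transition off the CM callback thread.  conn_ctx_t,
 * transition_qp_to_rts() and connection_ready() are hypothetical
 * placeholders for the real IBAL objects and steps. */
#include <windows.h>

typedef struct conn_ctx {
    void *qp;                      /* the QP being connected */
} conn_ctx_t;

static void transition_qp_to_rts(void *qp)
{
    /* placeholder: the existing synchronous QP state transitions */
    (void)qp;
}

static void connection_ready(conn_ctx_t *conn)
{
    /* placeholder: complete the accept/connect for this endpoint */
    (void)conn;
}

static DWORD WINAPI qp_transition_work(void *context)
{
    conn_ctx_t *conn = (conn_ctx_t *)context;

    transition_qp_to_rts(conn->qp);
    connection_ready(conn);
    return 0;
}

/* Called from the CM event handler: instead of transitioning the QP
 * inline (which serializes establishment on the single callback
 * thread), queue the work and return so the next REQ/REP can be
 * handled immediately. */
static void on_cm_passive_reply(conn_ctx_t *conn)
{
    QueueUserWorkItem(qp_transition_work, conn, WT_EXECUTEDEFAULT);
}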




 
> - Fab
> 
> 
> 



