[Openib-windows] Running WSD tests

Tue May 23 12:02:13 PDT 2006

See below.

Thanks
Tzachi

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Monday, May 22, 2006 8:53 PM
> To: Tzachi Dar
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] Running WSD tests
> 
> Hi Tzachi,
> 
> On 5/21/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> >
> > Hi Fab,
> >
> > While running tests on WSD I have come to the conclusion that there
> > are still some problems in different areas of the product. Here are
> > my findings:
> >
> > 1) If opensm is killed and then restarted, WSD won't work. The
> > reason is that the previous registration is not cancelled. Once the
> > new SM is started, there is a call to ipoib_reg_addrs; however, in
> > line 2350 there is a check:
> >
> >   if( p_addr_item->p_reg1)
> >    continue;
> > This check always succeeds, and therefore the IP address is not
> > re-registered with the new opensm.
> 
> You're right - when the SM reregister event comes, we need to 
> deregister all addresses.  I'll put something together for this.
> 
I have experimented with this, and it seems that adding the following
three lines

	cl_obj_lock( &p_port->p_adapter->obj );
	ipoib_dereg_addrs( p_port->p_adapter );
	cl_obj_unlock( &p_port->p_adapter->obj );

to the function ipoib_port_down, just after __endpt_mgr_reset_all( p_port
); (line 4915), solves the problem and doesn't introduce instabilities.
I'd be grateful if you could check this in.
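
To illustrate why the skip check blocks re-registration, and why
deregistering on port down fixes it, here is a minimal stand-alone model.
It is only a sketch: every name in it (model_adapter_t, model_reg_addrs,
model_port_down, and so on) is hypothetical and is not taken from the
ipoib driver, and an integer SM id stands in for the real registration
handle.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ADDRS 4

/* Hypothetical stand-in for one address entry. */
typedef struct {
    int in_use;  /* slot holds an IP address */
    int reg_sm;  /* 0 = not registered, else id of the SM holding the record */
} model_addr_t;

/* Hypothetical stand-in for the adapter's address table. */
typedef struct {
    model_addr_t addrs[MODEL_MAX_ADDRS];
} model_adapter_t;

/* Same shape as the quoted check: anything that still looks registered
 * is skipped. Returns the number of new registrations made. */
int model_reg_addrs(model_adapter_t *a, int sm_id)
{
    int registered = 0;
    assert(a != NULL);
    for (size_t i = 0; i < MODEL_MAX_ADDRS; i++) {
        model_addr_t *p = &a->addrs[i];
        if (!p->in_use)
            continue;
        if (p->reg_sm != 0)
            continue;  /* a stale registration lands here */
        p->reg_sm = sm_id;
        registered++;
    }
    return registered;
}

/* Drop every registration so the next reg pass starts clean. */
void model_dereg_addrs(model_adapter_t *a)
{
    assert(a != NULL);
    for (size_t i = 0; i < MODEL_MAX_ADDRS; i++)
        a->addrs[i].reg_sm = 0;
}

/* Port-down handling, with and without the proposed deregistration. */
void model_port_down(model_adapter_t *a, int apply_fix)
{
    if (apply_fix)
        model_dereg_addrs(a);
}
```

In this model, calling model_port_down with apply_fix set to 0 leaves the
old SM id in place, so a later model_reg_addrs for the new SM registers
nothing. With apply_fix set to 1, the same call re-registers every
address.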

> > 2) The second thing I'm trying to achieve is a mechanism that
> > will prevent my tests from running if they are not running on WSD.
> > Actually, I don't mind if they run, but I want some mechanism that
> > tells me for sure where the tests are running. I thought of some
> > options but didn't arrive at a real solution. Here is what I
> > thought about:
> >
> > a) Use the IBWSD_NO_IPOIB environment variable. The main problem
> > with this is that although it helps in some cases, it doesn't
> > always work. Some examples: the provider is not installed (on our
> > side or the remote side), there is no application running on the
> > remote side, and more. The main problem starts when the connect
> > succeeds but there is an error later. I thought of being somewhat
> > violent and killing the process in this case (using exit).
> 
> Is this something you expect to use in a production environment, or
> just for testing purposes?  It seems that many of the things you want
> to trap are configuration errors on the part of the user, and I'm not
> sure that trapping them in WSD is the right thing to do.  There are
> ways to detect whether the provider is installed (by listing the
> providers like installsp -l).
> 
> > b) Use GetSockOpt or WSAIoctl. I wasn't able to find anything that
> > will help me solve the problem, as these functions never reach the
> > IBWSD dll. Are you aware of any option that is answered in a
> > different way depending on whether we are running on IPOIB or on
> > WSD?
> 
> I am not, but somehow the WHQL tests can figure out if a 
> connected socket is over WSD or not.  I'll see if I can get 
> information about how to do this, as this seems like exactly 
> what you want to do.
> 
> > c) Probably the best way: have two new counters in the performance
> > counters that will tell the user how many sockets have passed to
> > connected mode and how many have passed from connected to
> > disconnected. If the test checks its state at the beginning and at
> > the end, it will be able to tell where it was running. I also think
> > that these counters are needed in any case.
> 
> That's not really a performance counter.  However, adding 
> counters that measure connection rate would be beneficial, 
> and then simple logging using perfmon would show whether 
> connections went over WSD or not.  It sounds like you really 
> want a way to detect if a socket is over WSD or not (b, but 
> cleaner).  If we can do that it would be much better IMO.
I agree that (b) is the best way if we can find a way to do it. I have
(temporarily) added some shared memory to test whether my connections
are going over WSD or IPoIB, and was surprised to get a result like
this: 19,982 of 20,000 connections went through WSD. This means that the
configuration was OK, yet 18 connections still did not go over WSD.
In any case, if we can't do (b), we will have to add another counter.

> > 3) The third issue is about __ipoib_ats_dereg_cb. It seems that
> > although this callback doesn't do much, there is always a chance
> > that the driver will be unloaded before it is issued, and we will
> > get a blue screen.
> 
> IBAL will ensure that this can't happen - ib_close_al is a 
> blocking call and will not return until all callbacks have 
> finished unwinding.
> Reference counting in the callback doesn't help since the 
> driver could be unloaded as soon as the reference count is 
> decremented, but before the callback has unwound.
OK, I see that there is no problem here.

> - Fab
> 
> 
>