[Openib-windows] Running WSD tests
Tzachi Dar
tzachid at mellanox.co.il
Tue May 23 12:02:13 PDT 2006
See below.
Thanks
Tzachi
> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com]
> On Behalf Of Fabian Tillier
> Sent: Monday, May 22, 2006 8:53 PM
> To: Tzachi Dar
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] Running WSD tests
>
> Hi Tzachi,
>
> On 5/21/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> >
> > Hi Fab,
> >
> > While running tests on WSD I have come to the conclusion that there are
> > still some problems in different areas of the product. Here are my findings:
> >
> > 1) If opensm is killed and then restarted, WSD won't work. The
> > reason is that the previous registration is not cancelled. Once the
> > new SM is started, there is a call to ipoib_reg_addrs; however, in
> > line 2350 there is a check:
> >
> > if( p_addr_item->p_reg1)
> > continue;
> > This check always succeeds and therefore the IP is not re-registered
> > with the new opensm.
>
> You're right - when the SM reregister event comes, we need to
> deregister all addresses. I'll put something together for this.
>
I have experimented with this, and it seems that adding the
following 3 lines
cl_obj_lock( &p_port->p_adapter->obj );
ipoib_dereg_addrs( p_port->p_adapter );
cl_obj_unlock( &p_port->p_adapter->obj );
to the function ipoib_port_down, just after
__endpt_mgr_reset_all( p_port ); (line 4915), solves the problem and
doesn't add instabilities. So I'd be thankful if you could check this in.
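
To illustrate why the "already registered, skip" check blocks
re-registration after an SM restart, and why deregistering on port down
avoids it, here is a minimal stand-alone sketch. It is not the IPoIB
driver code: addr_item_t and every mock_* function are hypothetical
stand-ins, a boolean replaces the real registration handle, and the
locking done by cl_obj_lock/cl_obj_unlock is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one IP address entry (not the driver's struct). */
typedef struct addr_item {
    bool registered;     /* plays the role of a non-NULL registration handle */
    int  sm_generation;  /* which SM instance the registration was made with */
} addr_item_t;

/* Mimics the quoted check: entries that look registered are skipped.
 * Returns how many addresses were newly registered. */
static size_t mock_reg_addrs(addr_item_t *items, size_t n, int sm_generation)
{
    size_t newly_registered = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i].registered)
            continue;                 /* stale registration blocks the new one */
        items[i].registered = true;
        items[i].sm_generation = sm_generation;
        newly_registered++;
    }
    return newly_registered;
}

/* Drops every registration so the next registration pass starts clean. */
static void mock_dereg_addrs(addr_item_t *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
        items[i].registered = false;
}

/* Port-down handling; dereg_on_down selects whether the proposed
 * deregistration step is applied. */
static void mock_port_down(addr_item_t *items, size_t n, bool dereg_on_down)
{
    if (dereg_on_down)
        mock_dereg_addrs(items, n);
}
```

Without the deregistration step, a second mock_reg_addrs call for a new
SM generation registers nothing and the entries keep pointing at the old
generation. With it, every address is registered again.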
> > 2) The second thing that I'm trying to achieve is some mechanism
> > that will prevent my tests from running if they are not running on
> > WSD. Actually, I don't mind if they run, but I want to have
> > some mechanism that tells me for sure where the tests are running.
> > I thought of some options but didn't get to a real solution. Here is
> > what I thought about:
> >
> > a) Use the IBWSD_NO_IPOIB environment variable. The main problem with
> > this is that although it helps in some cases, it doesn't always
> > work. Some examples: the provider is not installed (on our side or
> > the remote side), there is no application running on the remote side,
> > and more. The main problem starts when the connect succeeds but there is
> > an error later. I thought of being somewhat violent and killing the
> > process in this case (using exit).
>
> Is this something you expect to use in a production
> environment, or just for testing purposes? It seems that many
> of the things you want to trap are configuration errors on
> the part of the user, and I'm not sure that trapping them in
> WSD is the right thing to do. There are ways to detect
> whether the provider is installed (by listing the providers
> like installsp -l).
>
> > b) Use getsockopt or WSAIoctl. I wasn't able to find anything that will
> > help me solve the problem, as these functions never reach the
> > IBWSD dll. Are you aware of any option that is answered in a different
> > way if we are running on IPOIB or on WSD?
>
> I am not, but somehow the WHQL tests can figure out if a
> connected socket is over WSD or not. I'll see if I can get
> information about how to do this, as this seems like exactly
> what you want to do.
>
> > c) Probably the best way: have two new counters in the performance
> > counters that will tell the user how many sockets have passed to
> > connected mode and how many have passed from connected to
> > disconnected. If the test checks its situation at the beginning
> > and at the end, it will be able to tell where it was running.
> > I also think that these counters are needed in any case.
>
> That's not really a performance counter. However, adding
> counters that measure connection rate would be beneficial,
> and then simple logging using perfmon would show whether
> connections went over WSD or not. It sounds like you really
> want a way to detect if a socket is over WSD or not (b, but
> cleaner). If we can do that it would be much better IMO.
I agree that (b) is the best way if we can find a way to do it. I have
(temporarily) added some shared memory to test whether my connections are
going over WSD or IPoIB and was surprised to get a result like this:
19,982 of 20,000 connections went through WSD. This means that the
configuration was OK, but 18 connections still did not go through WSD.
In any case, if we can't do (b), we will have to add another counter.
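
The counter idea in (c) amounts to taking a snapshot before the test and
another after it, then comparing the difference with the number of
connections the test made. The sketch below shows that comparison only.
The wsd_counters_t layout, the verdict names, and the functions are all
assumptions made for illustration; no such counters exist in the
provider as discussed in this thread, and how a test would read them
(perfmon, shared memory, or something else) is left open.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two proposed counters. */
typedef struct wsd_counters {
    uint64_t connected;     /* sockets that entered WSD connected mode */
    uint64_t disconnected;  /* sockets that left WSD connected mode */
} wsd_counters_t;

typedef enum wsd_verdict {
    RAN_ON_WSD,         /* every expected connection went over WSD */
    RAN_PARTLY_ON_WSD,  /* some, but not all, connections went over WSD */
    RAN_OFF_WSD         /* no connection went over WSD */
} wsd_verdict_t;

/* Compares snapshots taken before and after a test that made
 * `expected` connections. */
static wsd_verdict_t classify_run(const wsd_counters_t *before,
                                  const wsd_counters_t *after,
                                  uint64_t expected)
{
    uint64_t over_wsd = after->connected - before->connected;

    if (over_wsd == 0)
        return RAN_OFF_WSD;
    if (over_wsd < expected)
        return RAN_PARTLY_ON_WSD;
    return RAN_ON_WSD;
}

/* Sockets currently in WSD connected mode according to one snapshot. */
static uint64_t open_wsd_sockets(const wsd_counters_t *c)
{
    return c->connected - c->disconnected;
}
```

With the figures above (19,982 of 20,000 connections over WSD),
classify_run would report RAN_PARTLY_ON_WSD, which is the case a simple
"is the provider installed" check cannot catch.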
> > 3) The third issue is about __ipoib_ats_dereg_cb. It seems that
> > although this callback doesn't do much, there is always a chance that
> > the driver will come down before it is issued, and we will have a
> > blue screen.
>
> IBAL will ensure that this can't happen - ib_close_al is a
> blocking call and will not return until all callbacks have
> finished unwinding.
> Reference counting in the callback doesn't help since the
> driver could be unloaded as soon as the reference count is
> decremented, but before the callback has unwound.
OK, I see that there is no problem here.
> - Fab
>
>
>