[Openib-windows] Running WSD tests
Fabian Tillier
ftillier at silverstorm.com
Mon May 22 10:53:15 PDT 2006
Hi Tzachi,
On 5/21/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
>
> Hi Fab,
>
> While running tests on WSD I have come to the conclusion that there are still
> some problems in different areas of the product. Here are my findings:
>
> 1) If opensm is killed and then restarted, WSD won't work. The reason is
> that the previous registration is not cancelled. Once the new SM is started,
> there is a call to ipoib_reg_addrs; however, in line 2350 there is a check:
>
>     if( p_addr_item->p_reg1 )
>         continue;
> This check always succeeds and therefore the IP is not re-registered with
> the new opensm.
You're right - when the SM reregister event comes, we need to
deregister all addresses. I'll put something together for this.
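
As a rough sketch of that idea (not the actual driver code): only the p_reg1 field and the check itself come from the thread. The struct layout, the helper names, and the placeholder handle below are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-address item. Only the
 * p_reg1 field is named in the thread; the rest is assumed. */
typedef struct addr_item {
    unsigned int ip;   /* illustrative IPv4 address */
    void *p_reg1;      /* non-NULL while a registration is held */
} addr_item_t;

/* Placeholder registration handle; a real driver would obtain one
 * from the registration call instead. */
static int fake_handle;

/* Mirrors the check quoted above: items that already hold a handle are
 * skipped. Returns how many addresses were newly registered. */
size_t reg_addrs(addr_item_t *items, size_t count)
{
    size_t registered = 0;
    for (size_t i = 0; i < count; i++) {
        if (items[i].p_reg1)
            continue;
        items[i].p_reg1 = &fake_handle;
        registered++;
    }
    return registered;
}

/* Drops every held handle so the next reg_addrs() pass starts clean. */
void dereg_addrs(addr_item_t *items, size_t count)
{
    for (size_t i = 0; i < count; i++)
        items[i].p_reg1 = NULL;
}

/* Hypothetical handler for the SM reregister event: deregister
 * everything first, then register again with the new SM. */
size_t on_sm_reregister(addr_item_t *items, size_t count)
{
    dereg_addrs(items, count);
    return reg_addrs(items, count);
}
```

Without the dereg_addrs() step, a second reg_addrs() pass registers nothing, which is the behavior reported in (1).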
> 2) The second thing that I'm trying to achieve is to have some mechanism that
> will prevent my tests from running if they are not running on WSD. Actually,
> I don't care whether they run, but I want some mechanism that tells me
> for sure where the tests are running. I thought of some options
> but didn't get to a real solution. Here is what I thought about:
>
> a) Use the IBWSD_NO_IPOIB environment variable. The main problem with this
> is that although it helps in some cases, it doesn't always work. Some
> examples: the provider is not installed (on our side or the remote
> side), there is no application running on the remote side, and more. The main
> problem starts when the connect succeeds but there is an error later. I
> thought of being somewhat violent and killing the process in this case (using
> exit).
Is this something you expect to use in a production environment, or
just for testing purposes? It seems that many of the things you want
to trap are configuration errors on the part of the user, and I'm not
sure that trapping them in WSD is the right thing to do. There are
ways to detect whether the provider is installed (for example, by
listing the providers with installsp -l).
> b) Use GetSockOpt or WSAIoctl. I wasn't able to find anything that will help
> me solve the problem, as these functions never reach the IBWSD dll. Are
> you aware of any option that is answered in a different way depending on
> whether we are running on IPOIB or on WSD?
I am not, but somehow the WHQL tests can figure out if a connected
socket is over WSD or not. I'll see if I can get information about
how to do this, as this seems like exactly what you want to do.
> c) Probably the best way: have two new counters in the performance counters
> that will tell the user how many sockets have passed to connected mode and how
> many have passed from connected to disconnected. If the test checks its
> state at the beginning and at the end, it will be able to tell where it
> was running. I also think that these counters are needed in any case.
That's not really a performance counter. However, adding counters
that measure connection rate would be beneficial, and then simple
logging using perfmon would show whether connections went over WSD or
not. It sounds like you really want a way to detect if a socket is
over WSD or not (b, but cleaner). If we can do that it would be much
better IMO.
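
For reference, the before/after check described in (c) could look roughly like the sketch below. No such counters exist today, so the struct, its field names, and the helper are all hypothetical. It assumes the counters only ever increase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two counters proposed in (c). */
typedef struct wsd_counters {
    uint64_t connects;     /* sockets that entered connected mode */
    uint64_t disconnects;  /* sockets that went from connected to disconnected */
} wsd_counters_t;

/* Returns true if at least `expected` sockets connected and disconnected
 * over WSD between the two snapshots. */
bool ran_over_wsd(const wsd_counters_t *before,
                  const wsd_counters_t *after,
                  uint64_t expected)
{
    return (after->connects - before->connects) >= expected &&
           (after->disconnects - before->disconnects) >= expected;
}
```

A test would take one snapshot before it starts and one after it finishes, then fail if ran_over_wsd() returns false. Note that this only shows that some sockets went over WSD during that window, not which ones, which is why a per-socket query would be cleaner.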
> 3) The third issue is about __ipoib_ats_dereg_cb. It seems that although
> this callback doesn't do much, there is always a chance that the driver will
> be unloaded before it is issued, and we will get a blue screen.
IBAL will ensure that this can't happen - ib_close_al is a blocking
call and will not return until all callbacks have finished unwinding.
Reference counting in the callback doesn't help since the driver could
be unloaded as soon as the reference count is decremented, but before
the callback has unwound.
- Fab