[ofw] RE: NetworkDirect over WinVerbs

Sean Hefty sean.hefty at intel.com
Wed Feb 11 03:48:38 PST 2009


>ND:Disconnect is supposed to disconnect and flush all outstanding requests.  It
>won't be called by msmpi unless the close handshake has been performed.
>
>With ND, the following logic works on both active and passive sides if the app
>has a close handshake (which msmpi does, hence why it doesn't use
>NotifyDisconnect):
>
>1. user handshake to disconnect
>2. user calls Disconnect()

It sounds like you're trying to combine a peer-to-peer, in-band disconnect
handshake with an out-of-band disconnect.  Winverbs is designed to do the
disconnect handshake using the disconnect calls, rather than by posting sends
to the QP directly, although the latter is supported.

>ND::Disconnect is always called in the context of the user's thread.  In any
>case, it doesn't matter - the output buffer will always be provided in the
>context of the user's thread, but there's no requirement for the modification
>to be done in the context of the user's thread.  I'll even assert that
>requiring QP modification to be done in the user's thread context limits
>scalability and is a lousy design.

The OFED stack has blocking calls, QP modify is done entirely in the user's
thread, every modify operation requires an IOCTL, and it still scales to
hundreds of thousands of connections.  Plus the synchronous operation makes it
easier to program to.  I'll assert that your assertion is bogus. :b

Deferring the modify call to an arbitrary thread greatly complicates the
locking, destruction handling, and device removal.  You're overlooking some
vital implementation details that not only affect scalability, but interface
viability.  The modify call is initiated in the user's thread - it could
complete at a later time if anything supported a non-blocking modify call, but
nothing does, and I don't see any indications that that will change.  The
alternative is to block the CM thread while modify completes, which has a direct
effect on the ability to process connections in a timely manner.
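
To put rough numbers on that last point, here is a toy queueing model in
Python.  It is only a sketch: cm_latencies and its parameters are made up for
the illustration, the costs are in arbitrary time units, and it assumes a
single CM thread that handles events strictly in arrival order.

```python
def cm_latencies(arrivals, cm_cost, modify_cost, modify_in_cm_thread):
    """Latency of each CM event handled by one CM thread (toy model).

    arrivals            -- event arrival times, in ascending order
    cm_cost             -- time the CM thread spends on the event itself
    modify_cost         -- time a blocking QP modify takes
    modify_in_cm_thread -- True if the CM thread also performs the modify
    """
    clock = 0
    latencies = []
    for arrival in arrivals:
        # The CM thread cannot start before the event arrives or before it
        # has finished the previous event.
        start = max(clock, arrival)
        service = cm_cost + (modify_cost if modify_in_cm_thread else 0)
        clock = start + service
        latencies.append(clock - arrival)
    return latencies
```

With three events arriving together, cm_cost=1 and modify_cost=4, the model
gives latencies of 5, 10, 15 when the CM thread blocks on the modify, and
1, 2, 3 when the modify is left to the calling thread.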

>You don't need them to reference one another at all.  In fact you should avoid
>it.  But there's no reason both the QP handle and the CEP handle can't be
>provided in the same IOCTL, that first performs the CEP operation, and when
>that completes, performs the QP operation.  The two objects are still
>independent, the IOCTL just has multiple phases, with each phase operating on a
>different object.

Notification of the CEP operation completing means that winverbs is now in an
arbitrary thread context.  The QP could have been destroyed or be in the process
of being destroyed.  The device could be going away.  Handling this is
non-trivial.  The locking now has to change to accommodate the callback thread
context.  Device removal now has to worry that a CM thread may be running when
it's trying to release hardware resources associated with a QP that may or may
not be associated with a CEP.
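
A toy Python model of that race, for illustration only.  None of this is
winverbs code: QueuePair, two_phase_disconnect and the state names are
hypothetical, and a real driver would also have to deal with reference counts
and device removal, which the sketch ignores.

```python
import threading


class QueuePair:
    """Toy stand-in for a QP; not the winverbs structure."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "RTS"
        self.destroyed = False

    def modify(self, new_state):
        # May be called from any thread, so the destroyed check and the
        # state change have to happen under the same lock.
        with self.lock:
            if self.destroyed:
                return False
            self.state = new_state
            return True

    def destroy(self):
        with self.lock:
            self.destroyed = True
            self.state = None


def two_phase_disconnect(qp, cep_done):
    """Phase 1 waits for the CEP operation; phase 2 modifies the QP.

    Phase 2 runs in a separate thread, standing in for the CM callback
    context.  Returns the thread and a dict that receives the outcome.
    """
    result = {}

    def cm_callback():
        cep_done.wait()                      # phase 1: CEP operation completes
        result["modified"] = qp.modify("ERROR")  # phase 2: QP may be gone

    thread = threading.Thread(target=cm_callback)
    thread.start()
    return thread, result
```

If the user destroys the QP after calling two_phase_disconnect but before
cep_done is set, the callback finds it gone and result["modified"] is False.
The check inside modify is exactly the sort of extra locking that a callback
thread context forces on every path that touches the QP.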

The only issue with winverbs is that it doesn't invoke the IB protocol in the
way that ND is trying to mandate.

What would be nice is a way to direct an operation on a file to use or not use
an I/O completion port on a per-operation basis, or to allow a file to support
multiple I/O completion ports.

- Sean



