[ofw] RE: NetworkDirect over WinVerbs

Sean Hefty sean.hefty at intel.com
Mon Feb 9 17:16:03 PST 2009


>I wanted to start a discussion about how resources would be mapped in order to
>provide NetworkDirect support over WinVerbs.

I can create a code framework for this if it helps.

>In NetworkDirect, the INDAdapter interface represents an underlying hardware
>device.  All further resources are created at this level.  This is different
>from WinVerbs in that IWVConnectEndpoint objects are created at the provider
>level and not the device level.  Similarly, the file object used for overlapped
>operations is at the IWVProvider level and not the INDAdapter level.  Thus, the
>INDAdapter object needs to map to an IWVProvider, an IWVDevice, as well as an
>IWVProtectionDomain (for Register/DeregisterMemory.)

This seems correct.

>The INDProvider object would be at a higher level, though I don't see how one
>gets a list of supported IP addresses from WinVerbs.

You could obtain a list of addresses available in the system and use
IWVProvider::TranslateAddress() to discover which ones map to RDMA devices.
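
As a rough sketch of that filtering step (the exact TranslateAddress()
signature is not shown in this thread, so it is abstracted here as a
hypothetical callback named TranslateFn; FilterRdmaAddresses is likewise an
illustrative name, not part of WinVerbs or NetworkDirect):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a call such as IWVProvider::TranslateAddress():
// returns true when the address resolves to an RDMA device. The real method
// is abstracted as a callback because its signature is an assumption here.
using TranslateFn = std::function<bool(const std::string& address)>;

// Keep only the local addresses that the translation step accepts.
std::vector<std::string> FilterRdmaAddresses(
    const std::vector<std::string>& localAddresses,
    const TranslateFn& translate)
{
    assert(translate && "a translation callback is required");
    std::vector<std::string> rdmaAddresses;
    for (const std::string& address : localAddresses) {
        if (translate(address)) {
            rdmaAddresses.push_back(address);
        }
    }
    return rdmaAddresses;
}
```

An INDProvider implementation could build its address list this way, with the
callback wrapping the real translation call.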

>Beyond this, I believe the mappings are pretty straightforward:
>
>INDListen -> IWVConnectEndpoint
>INDConnector -> IWVConnectEndpoint
>INDEndpoint -> IWVConnectQueuePair
>INDCompletionQueue -> IWVCompletionQueue
>INDMemoryWindow -> no mapping, MWs are emulated

I did not implement MWs, but that could be added.  Are the MW interfaces
sufficient?

>The only obvious issues I see at this point are:
>- that IWVDevice::RegisterMemory fills a user-provided WV_MEMORY_KEYS structure
>upon success.  This works fine on 64-bit systems as the ND_MR_HANDLE is 8 bytes
>and casting will do the trick, but will fail for 32-bit applications, where
>ND_MR_HANDLE is only 4 bytes.  Allocating a structure in RegisterMemory would
>leak that structure if RegisterMemory fails asynchronously to the user.  The
>LKey could be returned as the ND_MR_HANDLE, in which case there would need to
>be a way to get the RKey given an LKey which doesn't seem to exist at this
>point.

Leaking an 8 byte structure on a 32-bit system if registration fails probably
isn't a big deal.  If needed, you could track the structures with the PD and
release them when destroying the PD.
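
A minimal sketch of that tracking idea, assuming a hypothetical MemoryKeys
struct in place of WV_MEMORY_KEYS and a hypothetical PdKeyTracker helper owned
by whatever object wraps the PD (neither name exists in WinVerbs):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical key pair; the real WV_MEMORY_KEYS layout is not shown here.
struct MemoryKeys {
    std::uint32_t lkey = 0;
    std::uint32_t rkey = 0;
};

// Hypothetical per-PD tracker. It owns every key structure handed out as a
// handle, so anything still outstanding is freed when the tracker (and thus
// the PD wrapper that holds it) is destroyed.
class PdKeyTracker {
public:
    // Allocate a key structure before registration is issued; the returned
    // pointer can serve as the opaque 4- or 8-byte handle.
    MemoryKeys* Allocate()
    {
        auto keys = std::make_unique<MemoryKeys>();
        MemoryKeys* handle = keys.get();
        tracked_.emplace(handle, std::move(keys));
        return handle;
    }

    // Free one structure on deregistration. Returns false if the handle is
    // not tracked by this PD.
    bool Release(MemoryKeys* handle)
    {
        assert(handle != nullptr);
        return tracked_.erase(handle) == 1;
    }

    // Number of structures that would be freed if the PD went away now.
    std::size_t Outstanding() const { return tracked_.size(); }

private:
    std::unordered_map<MemoryKeys*, std::unique_ptr<MemoryKeys>> tracked_;
};
```

With this, a registration that fails asynchronously leaves at most one small
structure allocated until the PD is destroyed.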

Does ND ever use the RKey from memory registration?  I thought we discussed the
possibility of having WinVerbs only return the LKey if the memory registration
access rights did not include remote access.  (The WinVerbs implementation
doesn't do this, but it wouldn't be hard to change.)

>- that disconnection mappings are a bit funny.
>IWVConnectEndpoint::NotifyDisconnect will return in error when the endpoint
>gets disconnected (STATUS_CONNECTION_DISCONNECTED), or when the DREQ times out
>(STATUS_TIMEOUT).  It seems a bit unnatural that a user calling Disconnect
>would cause a NotifyDisconnect request to time out.

The error code can change, but the DREQ timing out does indicate that a
disconnect message was not received from the remote side.

>The INDConnector::Disconnect call looks like it should map first to a call to
>IWVConnectEndpoint::NotifyDisconnect, followed by a call to
>IWVConnectEndpoint::Disconnect.

You may want to map INDConnector::NotifyDisconnect() to
IWVConnectEndpoint::NotifyDisconnect().  The INDConnector::Disconnect() call
would then map to EP::Disconnect() and QP::Modify().  I.e. the overlapped
operation in the ND disconnect results from asynchronously modifying the QP to
error, not from the actual disconnect.
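
To make the proposed mapping concrete, here is a sketch using simplified,
hypothetical stand-ins for the endpoint and queue pair (the real WinVerbs
methods take more parameters, such as an OVERLAPPED pointer, and the order of
the two calls inside Disconnect() is an assumption):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the WinVerbs endpoint and queue
// pair. Only the mapping of calls is illustrated, not the real signatures.
struct ConnectEndpoint {
    virtual ~ConnectEndpoint() = default;
    virtual void NotifyDisconnect() = 0;
    virtual void Disconnect() = 0;
};

struct QueuePair {
    virtual ~QueuePair() = default;
    virtual void ModifyToError() = 0;
};

// Sketch of an ND connector layered over the two objects above.
class NdConnectorSketch {
public:
    NdConnectorSketch(ConnectEndpoint& ep, QueuePair& qp) : ep_(ep), qp_(qp) {}

    // ND NotifyDisconnect maps directly to the endpoint's NotifyDisconnect.
    void NotifyDisconnect() { ep_.NotifyDisconnect(); }

    // ND Disconnect maps to the endpoint disconnect plus a QP modify to
    // error, which is what flushes outstanding requests.
    void Disconnect()
    {
        ep_.Disconnect();
        qp_.ModifyToError();
    }

private:
    ConnectEndpoint& ep_;
    QueuePair& qp_;
};

// Test doubles that record the order in which calls arrive.
class RecordingEndpoint : public ConnectEndpoint {
public:
    explicit RecordingEndpoint(std::vector<std::string>* log) : log_(log)
    {
        assert(log_ != nullptr);
    }
    void NotifyDisconnect() override { log_->push_back("EP::NotifyDisconnect"); }
    void Disconnect() override { log_->push_back("EP::Disconnect"); }

private:
    std::vector<std::string>* log_;
};

class RecordingQueuePair : public QueuePair {
public:
    explicit RecordingQueuePair(std::vector<std::string>* log) : log_(log)
    {
        assert(log_ != nullptr);
    }
    void ModifyToError() override { log_->push_back("QP::Modify(error)"); }

private:
    std::vector<std::string>* log_;
};
```

The recording classes exist only to show the call sequence; a real layer would
hold the IWVConnectEndpoint and IWVConnectQueuePair pointers instead.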

>INDConnector::Disconnect is expected to flush all pending requests from the
>associated INDEndpoint when Disconnect completes (QP transition to ERROR).  The
>transition to error can't happen when the DREQ is sent, only when it either
>times out or a DREP is received.

I thought transitioning to error was okay when sending the DREQ, just not when
receiving it.  The user should ensure that all previously posted sends have
completed before calling disconnect.

- Sean