[Openib-windows] RE: Geting remote and locale ip addresses - the functionality

Fab Tillier ftillier at silverstorm.com
Wed Sep 7 15:32:49 PDT 2005


> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Wednesday, September 07, 2005 1:24 PM
> 
> >From: Fab Tillier [mailto:ftillier at silverstorm.com]
> >Sent: Wednesday, September 07, 2005 9:01 PM
> >
> >> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> >> Sent: Wednesday, September 07, 2005 2:46 AM
> >>
> >> The second problem is getting the remote GID from the remote IP. This
> >> information is only known to the IPOIB module. After doing an ARP (which
> >> can be done easily from user mode), we should have the IP translated to
> >> the remote MAC. To my understanding, all information should already be in
> >> the IPOIB driver as an "endpt" (the function __endpt_mgr_ref should
> >> return it immediately).
> >
> >An alternative to adding IP support to IPoIB is to use the network stack to
> >get us from IP to Ethernet MAC, and then perform a lookup based on that.
> >It is a bit more cumbersome, but certainly possible.
> 
> I believe that this is the straightforward way and this is what should be
> used. Since there is a function that converts the remote IP to a MAC
> address, I believe that we should use it and do the remote query based on
> the address ARP returns.

I think this makes sense.  We'll just have to do some work with the headers in
the DDK to allow user-mode clients to use the IP Helper functions.  This is
probably minimal.  I think it's just a single header that's missing.  Now, if we
decide to depend on the IP helper library, should we just use that to enumerate
IP addresses rather than providing our own mechanism?

> >> This also raises the question of doing it synchronously or
> >> asynchronously (which will make things even more complicated).
> >
> >Assuming we're dealing with IRPs here and not a direct call interface, sync
> >or async is trivial - all we need to do is follow IRP processing rules.  If
> >the request is going to take some time we need to mark the IRP pending and
> >return (this is critical to allow clients to call this at DISPATCH_LEVEL).
> >Once the request completes, complete the IRP.  It is then up to the caller
> >to decide whether to block waiting for a completion or use I/O completion
> >notifications.
> >
> >I would expect that for lookups into the IPoIB adapter's internal cache,
> >the operations would pretty much always complete immediately.  If we ever
> >add logic in IPoIB to send ARPs to resolve missing entries, asynchronous
> >processing will likely be necessary.  In any case, the only requirement
> >on us is that we don't perform any blocking operations in any of the IOCTL
> >paths.
> 
> The function that will convert the remote IP to the remote MAC will very
> likely do the ARP by itself, so I don't expect any blocking operation.

I think that's right.  Regardless, any IRP handling we add must support kernel
clients making requests at DISPATCH_LEVEL.  That's really my only firm
requirement.
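
To make the rule concrete, here is a rough user-mode model of the pattern.  It
is not WDK code: request_t, cache_entry_t, and the function names are made up
for illustration.  In a real driver the pending flag and the completion would
correspond to IoMarkIrpPending and IoCompleteRequest, and the pending list
would need a spinlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { REQ_SUCCESS, REQ_PENDING } req_status_t;

/* Hypothetical stand-in for an IRP carrying a MAC -> GID lookup. */
typedef struct request {
    uint8_t dst_mac[6];      /* input */
    uint8_t dst_gid[16];     /* output */
    int     pending;         /* models IoMarkIrpPending */
    int     completed;       /* models IoCompleteRequest */
    struct request *next;    /* link in the pending list */
} request_t;

typedef struct cache_entry {
    uint8_t mac[6];
    uint8_t gid[16];
} cache_entry_t;

static request_t *pending_head;  /* a real driver would guard this with a lock */

static void complete_request(request_t *req, const uint8_t gid[16])
{
    memcpy(req->dst_gid, gid, 16);
    req->completed = 1;
}

/* Dispatch path: completes on a cache hit, otherwise queues and returns
 * at once.  It never waits, so it is safe to model a DISPATCH_LEVEL caller. */
static req_status_t handle_request(request_t *req,
                                   const cache_entry_t *cache, size_t n)
{
    size_t i;

    assert(req != NULL);
    for (i = 0; i < n; i++) {
        if (memcmp(cache[i].mac, req->dst_mac, 6) == 0) {
            complete_request(req, cache[i].gid);
            return REQ_SUCCESS;
        }
    }
    req->pending = 1;
    req->next = pending_head;
    pending_head = req;
    return REQ_PENDING;
}

/* Called later, when resolution finishes: completes every queued request
 * for this MAC and returns how many were completed. */
static int resolve_done(const uint8_t mac[6], const uint8_t gid[16])
{
    request_t **pp = &pending_head;
    int done = 0;

    while (*pp != NULL) {
        request_t *r = *pp;
        if (memcmp(r->dst_mac, mac, 6) == 0) {
            *pp = r->next;           /* unlink before completing */
            complete_request(r, gid);
            done++;
        } else {
            pp = &r->next;
        }
    }
    return done;
}
```

The caller sees either an immediate result or a pending status, and decides
for itself whether to wait on the completion.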

> >> So to summarize:
> >> 1)     Get all local IP addresses. We can do it or not do it; I
> >> recommend doing it in IPOIB. As the information changes, it would
> >> be better to create some notification mechanism, but I suggest not
> >> implementing this mechanism at the start.
> >
> >IPoIB already has this information, so I agree we should provide a way to
> >get to it - the WSD provider certainly needs it.  We don't need to
> >implement any notification mechanisms for when the addresses change,
> >as the existing stack already handles that.  Specifically, it doesn't
> >matter for the DAPL case because DAPL is static.  For WSD, the switch already
> >polls the providers when it detects that an update is needed.
> >
> >I think we probably want two calls here.  The first would return all CA and
> >port GUIDs in use by IPoIB, with no input parameter.  The output would be
> >something like an array of <CA GUID, PORT NUMBER, PORT GUID> tuples.  This
> >allows a client to open the CA (using the CA GUID), create a QP bound to
> >the proper port (using the port number), and query for IP addresses (using
> >the port GUID).
> >
> >The second call would take as input a port GUID, and return an array of IP
> >addresses assigned to that port.  The array entries should probably be 16
> >bytes to accommodate IPv6 addresses, and use the standard method of storing
> >IPv4 addresses (prefix with zero).
> >
> >Does that make sense?
> 
> This does make sense, however there is one thing that we might consider
> changing: it seems to me that the application will first do the first part and
> later do a query on all ports to get their IPs. Therefore I believe that we
> can have one function that returns all this information and save some time
> passing from user mode to kernel. I'm not really sure if that code would look
> better.

I initially thought a single call would work well.  It does complicate the IOCTL
output buffer structure, but is certainly something that can be done.  The
complications come from the fact that the "records" in the output buffer are no
longer fixed size, so the size of each record (and/or offset to the next record)
must be explicitly stated as part of the record:

struct IPOIB_AT_PORT_RECORD
{
	ULONG		Version;
	ULONG		Size;		//!< Total size, including IP addresses.
	ULONG		OffsetNextPort;	//!< From start of structure.
	ULONG		NumIps;
	UINT64		PortGuid;
	int_addr	IpAddr[1];
};

struct IPOIB_AT_CA_RECORD
{
	ULONG			Version;
	ULONG			Size;		//!< Total size, including subrecords.
	ULONG			OffsetNextCa;	//!< From start of structure.
	ULONG			NumPorts;
	UINT64			CaGuid;
	IPOIB_AT_PORT_RECORD	FirstPort;
};

struct IPOIB_AT_GET_LOCAL_IP_OUT
{
	ULONG			Version;
	ULONG			Size;		//!< Total size, including subrecords.
	IPOIB_AT_CA_RECORD	FirstCa;
};

I don't know if you would be able to measure the difference between performing
multiple simple IOCTLs versus performing a single complicated IOCTL.  Further,
the multiple IOCTL approach allows a user to only request IP information for a
particular port - in case they don't care about any of the other ports.  Note
that the version fields can probably be eliminated but won't result in smaller
records due to padding.  The size fields could also be eliminated, but would
result in extra processing when filling the buffer.
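
As a rough illustration of the bookkeeping involved, here is a portable sketch
of filling and walking variable-size port records.  It assumes fixed-width
stand-ins for ULONG/UINT64, a 16-byte address in place of int_addr, and an
offset of zero marking the last record.  None of that is settled, and the
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IP_LEN 16

/* Hypothetical fixed header; num_ips addresses of IP_LEN bytes follow it. */
typedef struct port_hdr {
    uint32_t version;
    uint32_t size;              /* total size, including IP addresses */
    uint32_t offset_next_port;  /* from start of this record; 0 = last */
    uint32_t num_ips;
    uint64_t port_guid;
} port_hdr_t;

/* Appends one record at buf + off.  ips points to num_ips * IP_LEN bytes.
 * Returns the offset just past the record, or 0 if it does not fit. */
static size_t put_port(uint8_t *buf, size_t cap, size_t off, uint64_t guid,
                       const uint8_t *ips, uint32_t num_ips, int last)
{
    size_t size = sizeof(port_hdr_t) + (size_t)num_ips * IP_LEN;
    port_hdr_t hdr;

    assert(buf != NULL);
    if (off > cap || size > cap - off)
        return 0;
    hdr.version = 1;
    hdr.size = (uint32_t)size;
    hdr.offset_next_port = last ? 0 : (uint32_t)size;
    hdr.num_ips = num_ips;
    hdr.port_guid = guid;
    memcpy(buf + off, &hdr, sizeof(hdr));
    if (num_ips != 0)
        memcpy(buf + off + sizeof(hdr), ips, (size_t)num_ips * IP_LEN);
    return off + size;
}

/* Walks the records and returns the total number of IP addresses,
 * or -1 if the sizes and offsets do not fit inside len bytes. */
static long count_ips(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    long total = 0;

    assert(buf != NULL);
    for (;;) {
        port_hdr_t hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));  /* avoids unaligned access */
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        if ((hdr.size - sizeof(hdr)) / IP_LEN < hdr.num_ips)
            return -1;
        total += (long)hdr.num_ips;
        if (hdr.offset_next_port == 0)
            return total;
        if (hdr.offset_next_port < hdr.size)
            return -1;                          /* next record would overlap */
        off += hdr.offset_next_port;
        if (off > len)
            return -1;
    }
}
```

Every consumer has to repeat the bounds checks in count_ips, which is the cost
of records that are no longer fixed size.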

Personally I think the multi-IOCTL method will be simpler to implement while
providing the client with extra flexibility.
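
For comparison, the two-call layout keeps every output entry fixed size.  The
sketch below is only a guess at what the entries could look like: the type
names, the reserved padding, and placing the IPv4 address in the last four
bytes after a zero prefix are my assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry returned by the first call (one per IPoIB port). */
typedef struct ipoib_port_entry {
    uint64_t ca_guid;      /* used to open the CA */
    uint64_t port_guid;    /* used to query for IP addresses */
    uint8_t  port_num;     /* used to bind the QP */
    uint8_t  reserved[7];  /* explicit padding keeps the size at 24 bytes */
} ipoib_port_entry_t;

/* Hypothetical entry returned by the second call: one 16-byte address. */
typedef struct ipoib_ip_entry {
    uint8_t addr[16];
} ipoib_ip_entry_t;

/* Stores an IPv4 address (network byte order) with a zero prefix. */
static void store_ipv4(ipoib_ip_entry_t *e, const uint8_t v4[4])
{
    assert(e != NULL);
    memset(e->addr, 0, 12);
    memcpy(e->addr + 12, v4, 4);
}

/* With fixed-size entries the count falls out of the returned length. */
static size_t port_entry_count(size_t out_len)
{
    return out_len / sizeof(ipoib_port_entry_t);
}
```

The client indexes the output as a plain array, with no per-record size or
offset fields to validate.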

> >> 2)     Get remote IP - we must implement.
> >
> >Assuming this takes a source and destination IP as input and returns a
> >source and destination GID as output, I agree.
> >
> >How does something like this for the IOCTL input and output buffers sound:
> >
> >struct _IPOIB_AT_IN
> >{
> >       UCHAR           SrcIp[16];
> >       UCHAR           DstIp[16];
> >};
> >struct _IPOIB_AT_OUT
> >{
> >       GID             SrcGid;
> >       GID             DstGid;
> >};
>
> As I said before I believe that the functionality that we are really looking
> for is the remote MAC address, translated into a remote LID. (This is the
> information that we have). So the interface should look like:
> struct _IPOIB_AT_IN
> {
>         UCHAR           DstMac[6]; // Do we want this longer ?
> };
> struct _IPOIB_AT_OUT
> {
>         GID             DstGid;
> };
> Does this make sense?

Do we need to specify as part of the input buffer which instance to perform the
lookup for?  Or will the destination MAC address always be unique, regardless of
how the fabric is configured?  This depends on the algorithm used to generate
the LAA MAC addresses reported to the OS.  I think right now they are unique.

Also, how does a client get the source GID?

It might make sense to pass in the local port GUID (or CA GUID and port number)
to better qualify the request and provide the source GID as output.
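
Something along these lines is what I have in mind.  This is a sketch only:
the structure names, the per-port endpoint table, and the lookup function are
hypothetical and use plain byte arrays in place of the GID type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical input: the local port GUID qualifies the destination MAC. */
typedef struct at_in {
    uint64_t port_guid;
    uint8_t  dst_mac[6];
    uint8_t  reserved[2];
} at_in_t;

/* Hypothetical output: both GIDs, so the client need not look up its own. */
typedef struct at_out {
    uint8_t src_gid[16];
    uint8_t dst_gid[16];
} at_out_t;

typedef struct local_port {
    uint64_t port_guid;
    uint8_t  gid[16];
} local_port_t;

/* One known remote endpoint, as seen from a given local port. */
typedef struct endpoint {
    uint64_t port_guid;
    uint8_t  mac[6];
    uint8_t  gid[16];
} endpoint_t;

/* Returns 0 and fills out on success, -1 if the port or the MAC is unknown. */
static int at_lookup(const at_in_t *in, at_out_t *out,
                     const local_port_t *ports, size_t num_ports,
                     const endpoint_t *eps, size_t num_eps)
{
    const local_port_t *port = NULL;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < num_ports; i++) {
        if (ports[i].port_guid == in->port_guid) {
            port = &ports[i];
            break;
        }
    }
    if (port == NULL)
        return -1;
    for (i = 0; i < num_eps; i++) {
        /* Match on the port as well, in case a MAC is not fabric-unique. */
        if (eps[i].port_guid == in->port_guid &&
            memcmp(eps[i].mac, in->dst_mac, 6) == 0) {
            memcpy(out->src_gid, port->gid, 16);
            memcpy(out->dst_gid, eps[i].gid, 16);
            return 0;
        }
    }
    return -1;
}
```

If the same MAC ever shows up behind two ports, the port GUID picks the right
entry, and the source GID comes back at no extra cost.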

Thoughts?

- Fab



