[Openib-windows] RE: Geting remote and locale ip addresses - the functionality

Tzachi Dar tzachid at mellanox.co.il
Thu Sep 8 08:11:41 PDT 2005


I believe that we are closing in on the interface. Please let me know if
there are still things that you don't agree with.

Thanks
Tzachi

>-----Original Message-----
>From: Fab Tillier [mailto:ftillier at silverstorm.com]
>Sent: Thursday, September 08, 2005 1:33 AM
>To: 'Tzachi Dar'; openib-windows at openib.org
>Subject: RE: Geting remote and locale ip addresses - the functionality
>
>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> Sent: Wednesday, September 07, 2005 1:24 PM
>>
>> >From: Fab Tillier [mailto:ftillier at silverstorm.com]
>> >Sent: Wednesday, September 07, 2005 9:01 PM
>> >
>> >> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> >> Sent: Wednesday, September 07, 2005 2:46 AM
>> >>
>> >> The second problem is getting the remote GID from the remote IP. This
>> >> information is only known to the IPOIB module. After doing an ARP (which
>> >> can be done easily from user mode), we should have the IP translated to
>> >> the remote MAC. To my understanding, all information should already be in
>> >> the IPOIB driver as an "endpt" (the function __endpt_mgr_ref should
>> >> return it immediately).
>> >
>> >An alternative to adding IP support to IPoIB is to use the network stack
>> >to get us from IP to Ethernet MAC, and then perform a lookup based on
>> >that. It is a bit more cumbersome, but certainly possible.
>>
>> I believe that this is the straightforward way and this is what should be
>> used. Since there is a function that converts the remote IP to a MAC
>> address, I believe that we should use it and do the remote query based on
>> this ARP address.
>
>I think this makes sense.  We'll just have to do some work with the headers
>in the DDK to allow user-mode clients to use the IP Helper functions.  This
>is probably minimal.  I think it's just a single header that's missing.
>Now, if we decide to depend on the IP helper library, should we just use
>that to enumerate IP addresses rather than providing our own mechanism?
>
It seems that we will have to ask people to install the Platform SDK
(royalty-free), since these headers will also be used by the SDP user-mode
component and MPI. An alternative is to reverse engineer this component and
create a header ourselves.
Unfortunately, the IP Helper API requires a lot of work to get everything
done.



>> >> This also raises the question of doing it sync or async
>> >> (which will make things even more complicated).
>> >
>> >Assuming we're dealing with IRPs here and not a direct call interface,
>> >sync or async is trivial - all we need to do is follow IRP processing
>> >rules.  If the request is going to take some time we need to mark the IRP
>> >pending and return (this is critical to allow clients to call this at
>> >DISPATCH_LEVEL).  Once the request completes, complete the IRP.  It is
>> >then up to the caller to decide whether to block waiting for a completion
>> >or use I/O completion notifications.
>> >
>> >I would expect that for lookups into the IPoIB adapter's internal cache,
>> >the operations would pretty much always complete immediately.  If we
>> >ever add logic in IPoIB to send ARPs to resolve missing entries,
>> >asynchronous processing will likely be necessary.  In any case, the only
>> >requirement on us is that we don't perform any blocking operations in any
>> >of the IOCTL paths.
>>
>> The function that will convert the remote IP to the remote mac will very
>> likely do the arp by itself, so I don't expect any blocking operation.
>
>I think that's right.  Regardless, any IRP handling we add must support
>kernel clients making requests at DISPATCH.  That's really my only firm
>requirement.
>
You will be able to do the lookup at DISPATCH level.
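
The "complete immediately or mark pending" rule discussed above can be modelled in plain user-mode C. This is only a sketch of the pattern, not driver code: the status codes, the `request_t` type, the single-entry cache and the MAC/GID values are all made up for illustration, and a real driver would use IRPs and NTSTATUS values instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status codes for this sketch; a driver would use NTSTATUS. */
typedef enum { REQ_COMPLETED, REQ_PENDING } req_status_t;

typedef struct request {
    uint8_t  dst_mac[6];                        /* input: MAC to resolve      */
    uint8_t  dst_gid[16];                       /* output: set on completion  */
    int      done;                              /* 1 once completed           */
    void   (*on_complete)(struct request *req); /* optional notification      */
    struct request *next;                       /* link in the pending queue  */
} request_t;

static request_t *pending_head = NULL;

/* Stand-in for the endpoint cache: one made-up MAC/GID pair. */
static const uint8_t known_mac[6]  = { 0x02, 0, 0, 0, 0, 0x01 };
static const uint8_t known_gid[16] = { 0xfe, 0x80, 0, 0, 0, 0, 0, 0,
                                       0, 0, 0, 0, 0, 0, 0, 0x01 };

static void complete_request(request_t *req, const uint8_t gid[16])
{
    memcpy(req->dst_gid, gid, 16);
    req->done = 1;
    if (req->on_complete != NULL)
        req->on_complete(req);
}

/* Never blocks: either completes from the cache or queues the request. */
req_status_t submit_lookup(request_t *req)
{
    req->done = 0;
    if (memcmp(req->dst_mac, known_mac, 6) == 0) {
        complete_request(req, known_gid);
        return REQ_COMPLETED;
    }
    req->next = pending_head;   /* the "mark pending and return" path */
    pending_head = req;
    return REQ_PENDING;
}

/* Called later, when a resolution finishes; completes all queued requests. */
void resolve_pending(const uint8_t gid[16])
{
    while (pending_head != NULL) {
        request_t *req = pending_head;
        pending_head = req->next;
        complete_request(req, gid);
    }
}
```

The caller can either poll `done` or supply `on_complete`, which mirrors the choice between blocking on the request and using completion notifications.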

>> >> So to summarize:
>> >> 1)     Get all local IP addresses. We can do it or not do it; I
>> >> recommend doing it in the IPOIB. As the information changes, it would
>> >> be better to create some notification mechanism, but I suggest not
>> >> implementing this mechanism at the start.
>> >
>> >IPoIB already has this information, so I agree we should provide a way
>> >to get to it - the WSD provider certainly needs it.  We don't need to
>> >implement any notification mechanisms for when the addresses change,
>> >as the existing stack already handles that.  Specifically, it doesn't
>> >matter for the DAPL case because DAPL is static.  For WSD, the switch
>> >already polls the providers when it detects that an update is needed.
>> >
>> >I think we probably want two calls here.  The first would return all CA
>> >and port GUIDs in use by IPoIB, with no input parameter.  The output
>> >would be something like an array of <CA GUID, PORT NUMBER, PORT GUID>
>> >tuples.  This allows a client to open the CA (using the CA GUID), create
>> >a QP bound to the proper port (using the port number), and query for IP
>> >addresses (using the port GUID).
>> >
>> >The second call would take as input a port GUID, and return an array of
>> >IP addresses assigned to that port.  The array entries should probably
>> >be 16 bytes to accommodate IPv6 addresses, and use the standard method of
>> >storing IPv4 addresses (prefix with zero).
>> >
>> >Does that make sense?
>>
>> This does make sense, however there is one thing that we might consider
>> changing: it seems to me that the application will first do the first
>> part and later do a query on all ports to get their IPs. Therefore I
>> believe that we can have one function that returns all this information
>> and save some time passing from user mode to kernel. I'm not really sure
>> if that code would look better.
>
>I initially thought a single call would work well.  It does complicate the
>IOCTL output buffer structure, but is certainly something that can be done.
>The complications come from the fact that the "records" in the output
>buffer are no longer fixed size, so the size of each record (and/or offset
>to the next record) must be explicitly stated as part of the record:
>
>struct IPOIB_AT_PORT_RECORD
>{
>	ULONG				Version;
>	ULONG				Size; //!< Total size, including IP addresses.
>	ULONG				OffsetNextPort; //!< From start of structure.
>	ULONG				NumIps;
>	UINT64			PortGuid;
>	int_addr			IpAddr[1];
>};
>
>struct IPOIB_AT_CA_RECORD
>{
>	ULONG				Version;
>	ULONG				Size; //!< Total size, including subrecords.
>	ULONG				OffsetNextCa; //!< From start of structure.
>	ULONG				NumPorts;
>	UINT64			CaGuid;
>	IPOIB_AT_PORT_RECORD	FirstPort;
>};
>
>struct IPOIB_AT_GET_LOCAL_IP_OUT
>{
>	ULONG				Version;
>	ULONG				Size; //!< Total size, including subrecords.
>	IPOIB_AT_CA_RECORD	FirstCa;
>};
>
>I don't know if you would be able to measure the difference between
>performing multiple simple IOCTLs versus performing a single complicated
>IOCTL.  Further, the multiple IOCTL approach allows a user to only request
>IP information for a particular port - in case they don't care about any of
>the other ports.  Note that the version fields can probably be eliminated
>but won't result in smaller records due to padding.  The size fields could
>also be eliminated, but would result in extra processing when filling the
>buffer.
>
>Personally I think the multi-IOCTL method will be simpler to implement
>while providing the client with extra flexibility.
>
We will use the multi-IOCTL method to get the data.
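
With the multi-IOCTL method both output buffers can be plain arrays of fixed-size entries. A minimal sketch of what the entries might look like follows; the type and function names are hypothetical (nothing here is an agreed definition), and the IPv4 layout is one reading of the "prefix with zero" suggestion: twelve zero bytes followed by the four address bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First call: one fixed-size entry per port in use by IPoIB (hypothetical). */
typedef struct {
    uint64_t ca_guid;
    uint64_t port_guid;
    uint8_t  port_num;
} example_port_entry_t;

/* Second call: one 16-byte entry per IP address on the requested port. */
typedef struct {
    uint8_t addr[16];
} example_ip_entry_t;

/* Store an IPv4 address (4 bytes, network order) zero-prefixed in 16 bytes. */
void example_store_ipv4(example_ip_entry_t *entry, const uint8_t ipv4[4])
{
    memset(entry->addr, 0, 12);
    memcpy(entry->addr + 12, ipv4, 4);
}

/* Number of fixed-size entries that fit in an output buffer. */
size_t example_entries_that_fit(size_t buf_size, size_t entry_size)
{
    return entry_size ? buf_size / entry_size : 0;
}
```

Because every entry has the same size, the caller can work out the entry count from the returned byte count, and no Size or Offset fields are needed inside the records.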

>> >> 2)     Get remote IP - we must implement.
>> >
>> >Assuming this takes a source and destination IP as input and returns a
>> >source and destination GID as output, I agree.
>> >
>> >How does something like this for the IOCTL input and output buffers
>> >sound:
>> >
>> >struct _IPOIB_AT_IN
>> >{
>> >       UCHAR           SrcIp[16];
>> >       UCHAR           DstIp[16];
>> >};
>> >struct _IPOIB_AT_OUT
>> >{
>> >       GID             SrcGid;
>> >       GID             DstGid;
>> >};
>>
>> As I said before I believe that the functionality that we are really
>> looking for is the remote MAC address, translated into a remote lid. (This
>> is the information that we have). So the interface should look like:
>> struct _IPOIB_AT_IN
>> {
>>         UCHAR           DstMac[6]; // Do we want this longer ?
>> };
>> struct _IPOIB_AT_OUT
>> {
>>         GID             DstGid;
>> };
>> Does this make sense?
>
>Do we need to specify as part of the input buffer which instance to perform
>the lookup for?  Or will the destination MAC address always be unique,
>regardless of how the fabric is configured?  This depends on the algorithm
>used to generate the LAA MAC addresses reported to the OS.  I think right
>now they are unique.
>
I assume that the MACs are unique. If this is not the case, other things
won't work either.
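
Under that uniqueness assumption the lookup itself needs nothing more than the destination MAC. A rough sketch of the idea, using stand-in types rather than the real IPoIB endpoint structures (the `example_*` names and the flat table are assumptions made only for illustration):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t raw[16]; } example_gid_t;   /* stand-in for GID */

/* Shapes of the proposed input and output buffers. */
typedef struct { uint8_t DstMac[6]; } example_at_in_t;
typedef struct { example_gid_t DstGid; } example_at_out_t;

/* Stand-in for one cached endpoint: a MAC and the GID it maps to. */
typedef struct {
    uint8_t       mac[6];
    example_gid_t gid;
} example_endpt_t;

/* Returns 0 and fills *out on a match, -1 if the MAC is unknown. */
int example_mac_to_gid(const example_endpt_t *table, size_t count,
                       const example_at_in_t *in, example_at_out_t *out)
{
    for (size_t i = 0; i < count; i++) {
        if (memcmp(table[i].mac, in->DstMac, 6) == 0) {
            out->DstGid = table[i].gid;
            return 0;
        }
    }
    return -1;
}
```

If two endpoints could ever share a MAC, the first match would win here, which is exactly the ambiguity that an extra instance or port qualifier in the input buffer would remove.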

>Also, how does a client get the source GID?
>
The GID is needed for the IB_QUERY_PATH_REC_BY_GIDS. I believe that using
IB_DEFAULT_SUBNET_PREFIX should be enough for this query.
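
One way to read this is that the client forms the source GID itself from the default subnet prefix and its local port GUID. The sketch below assumes the usual GID layout (8-byte subnet prefix followed by the 8-byte port GUID, both big-endian) and takes both values in host byte order. The constant and function names are made up here and are not the IBAL definitions, which may store these values in network order.

```c
#include <stdint.h>

/* Default (link-local) subnet prefix, written here in host byte order. */
#define EXAMPLE_DEFAULT_SUBNET_PREFIX 0xFE80000000000000ULL

/* Write a 64-bit value into dst as 8 big-endian bytes. */
static void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Build a 16-byte GID: subnet prefix first, then the port GUID. */
void example_make_gid(uint8_t gid[16], uint64_t subnet_prefix,
                      uint64_t port_guid)
{
    put_be64(gid, subnet_prefix);
    put_be64(gid + 8, port_guid);
}
```

This only works as long as the fabric actually uses the default prefix; on a subnet configured with a different prefix the client would need the real value from a port query.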

>It might make sense to pass in the local port GUID (or CA GUID and port
>number) to better qualify the request and provide the source GID as output.
>
>Thoughts?
This seems to me like an extra thing to check, and it will get checked by
QUERY_PATH_REC in any case, so I don't think there is a reason to bother.

>
>- Fab