[Openib-windows] IPOIB virtualization What was already done, what still has to be done to finish the job.
Hal Rosenstock
halr at voltaire.com
Fri Apr 21 11:53:42 PDT 2006
Hi Fab,
On Fri, 2006-04-21 at 13:31, Fabian Tillier wrote:
> Hi Hal,
>
> On 21 Apr 2006 09:34:27 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > Hi Tzachi,
> >
> > On Fri, 2006-04-21 at 09:23, Tzachi Dar wrote:
> > > Hi Fab,
> > >
> > > The following mail summarizes the state of the work that I did on
> > > IPOIB virtualization, that is, running IPOIB on Microsoft Virtual
> > > Server R2.
> > >
> > > Please note that the current status is that ping works in all
> > > directions, but a lot of work is still needed to bring it to
> > > product quality. The biggest remaining issue is allowing packets
> > > that are bigger than 1500 bytes and smaller than 2048 to pass to
> > > the guest OS. Currently, I have implemented a hack that tells
> > > Windows that we only support an MTU of 1500 bytes (like Ethernet).
> > > My change assumes that all machines are Windows machines and that
> > > all have my changes, but this is not always true. One example that
> > > breaks this assumption is Linux.
> >
> > Is this an issue with Windows or with Linux in terms of interoperation?
> > Can you elaborate on this?
>
> This is a Windows VM issue.
>
> A Windows VM can only receive ~1500 byte packets from the Windows host
> machine. This means that if a packet is sent using the full IPoIB MTU
> to a guest VM, that packet will not ever make it - it gets dropped
> somewhere between being handed off to the host network stack and the
> guest OS. I would expect this to be a bug in the MS virtual server
> network emulation layer, but we haven't confirmed this yet.
>
> So a Linux system sending a full IPoIB MTU packet to a Windows VM
> would not work, through no fault of the Linux machine (the same
> applies if a Windows machine sends a full IPoIB MTU packet).
>
> To prevent Windows from sending a full IPoIB MTU, the IPoIB driver
> must report its MTU to Windows as 1500 bytes, rather than 2044, but
> then any full IPoIB MTU packet (from a Linux host for example) would
> overrun the RQ WQE.
Then it sounds like when interoperation with Linux or any other OS which
uses the standard IPoIB MTU is desired, the IPoIB interface needs to be
ifconfig'd down to 1500 and it would work, right ?
-- Hal
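
For reference, the packet sizes discussed above can be written down as a
short C sketch. It assumes that the 2044-byte IPoIB MTU is the 2048-byte IB
MTU minus the 4-byte IPoIB encapsulation header, and that the guest can only
take Ethernet-sized (1500-byte) packets. The names (classify_ip_packet and
the PKT_* values) are hypothetical and are not taken from the driver.

```c
#include <stddef.h>

enum {
    IB_MTU          = 2048,                     /* IB MTU from the thread */
    IPOIB_ENCAP_HDR = 4,                        /* assumed encapsulation header size */
    IPOIB_MTU       = IB_MTU - IPOIB_ENCAP_HDR, /* 2044, as quoted in the thread */
    GUEST_MTU       = 1500                      /* what the Windows VM accepts */
};

typedef enum {
    PKT_FITS_GUEST,   /* can be handed to the guest unchanged */
    PKT_NEEDS_SPLIT,  /* legal on IPoIB, too big for the guest */
    PKT_TOO_BIG       /* exceeds the IPoIB MTU, should not arrive */
} pkt_class_t;

/* Classify a received IP packet by its length in bytes (hypothetical helper). */
pkt_class_t classify_ip_packet(size_t ip_len)
{
    if (ip_len > (size_t)IPOIB_MTU)
        return PKT_TOO_BIG;
    if (ip_len <= (size_t)GUEST_MTU)
        return PKT_FITS_GUEST;
    return PKT_NEEDS_SPLIT;
}
```

Packets in the PKT_NEEDS_SPLIT range (1501 to 2044 bytes) are the ones that
currently get dropped on the way to the guest.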
> > > It seems that a better solution to this problem is either to talk
> > > to MS and see if they have a solution, or to accept bigger packets
> > > and break them up on demand.
> > >
> > > Due to the time that is needed to complete the work (see also the
> > > problems below) we have decided not to support virtualization for
> > > this release.
> > >
> > > Attached to this mail is the latest version that I created. It should
> > > more or less fit the version of IPOIB.
> > >
> > > The changes that I have made are in the following areas. I'll briefly
> > > describe what the problem was, what I did, and what still has to be
> > > done. Some of the problems described are not really related to
> > > virtualization.
> > >
> > > 1) Checking where to pass the packets. I have implemented code that
> > > sniffs ARPs and builds a table of (IP, MAC) pairs. Packets are later
> > > modified based on that table. The code is almost complete; however,
> > > the correct lock still needs to be taken when writing to the table
> > > (this shouldn't be that complicated).
> > >
> > > 2) DHCP support. A few general comments: 1) The current code
> > > changes DHCP packets on both the receiver side and the sender side.
> > > This works well if we write the software on both sides. If the other
> > > side is Linux, this is not true. I'm not sure that there is a spec
> > > that solves these problems.
> >
> > Please elaborate on this.
>
> Windows IPoIB masquerades itself as an 802.3 device. This means that
> the IPoIB encapsulation for ARP and DHCP packets is implemented
> internally to the IPoIB driver. So when NDIS sends a DHCP packet,
> IPoIB converts it to follow the IETF draft so that on the wire it
> looks like IPoIB, not Ethernet. The receiver then converts back to
> Ethernet.
>
> There's a bug in the driver (that I'm currently fixing) where it
> doesn't recalculate the IP or UDP checksums or update the packet
> lengths, even though it changed the packet payload. This means that
> DHCP packets on the wire will have incorrect lengths and checksums,
> and DHCP packets received after conversion to Ethernet format could
> have the wrong checksum.
>
> It is really useful in Windows (and I can't stress this enough) to
> have IB port GUIDs that can be converted to valid, unique Ethernet
> MACs. The driver currently only has code to handle SilverStorm and
> Mellanox GUIDs, because no other vendor has put forth their algorithm
> (if they have one) to do this. If Voltaire has such an algorithm, it
> would be great to add a handler for it.
>
> - Fab
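
To illustrate the checksum problem described above: whenever the driver
rewrites part of a packet, the IPv4 header checksum has to be recomputed
over the modified header. The following is a minimal C sketch of the
standard Internet checksum (RFC 1071) applied to an IPv4 header. It is not
the driver's code; the function names are hypothetical, and the buffer is
assumed to start at the first byte of the IPv4 header.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over a byte buffer.
 * Bytes are summed as big-endian 16-bit words; an odd trailing byte is
 * padded with zero. Intended for buffers up to 64 KB. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Recompute the IPv4 header checksum in place after the header changed.
 * Returns 0 on success, -1 if the buffer cannot hold a valid header. */
int ipv4_fix_header_checksum(uint8_t *ip, size_t len)
{
    size_t hdr_len;
    uint16_t csum;

    if (len < 20)
        return -1;
    hdr_len = (size_t)(ip[0] & 0x0F) * 4;   /* IHL field, in 32-bit words */
    if (hdr_len < 20 || hdr_len > len)
        return -1;

    ip[10] = 0;                             /* checksum field must be zero */
    ip[11] = 0;                             /* while the sum is computed   */
    csum = inet_checksum(ip, hdr_len);
    ip[10] = (uint8_t)(csum >> 8);
    ip[11] = (uint8_t)(csum & 0xFF);
    return 0;
}
```

The UDP checksum uses the same one's-complement sum, but it also covers a
pseudo-header and the payload, so it would need its own helper. The IP and
UDP length fields have to be updated before either checksum is recomputed.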