[Openib-windows] Re: [openib-general] Microsoft virtual machine and Infiniband

Fabian Tillier ftillier at silverstorm.com
Mon Feb 27 14:48:56 PST 2006


Hi Tzachi,

On 2/27/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> Hi Fab,
>
> When trying to run Windows Server 2003 on a Microsoft virtual machine, we
> found one problem that prevents IPoIB from running.
>
> The problem, as you can guess, is related to the way MAC addresses are
> handled. On such a machine, a fake MAC address is created for each guest OS
> and later used for communication. However, although these packets return
> to the correct computer, IPoIB doesn't restore their correct destination
> MAC, and therefore pinging a remote host is impossible.

Is IPoIB running on the guest OS, or on the host OS?  I'm assuming
host, and the guest sends packets using its guest MAC.  So a packet
gets passed to IPoIB using the guest MAC as source.  The recipient of
such a packet tries to reconstruct the Ethernet header, and ends up
with the sender's host MAC, rather than the sender's guest MAC.  Am I
following this properly?

> In order to solve this problem, we need a mechanism that will allow the
> IPoIB driver to correct the MAC addresses of packets based on their IP
> addresses.

So, the recipient should do an IP lookup on every received IP packet
and restore the MAC based on the IP, rather than just based on the
LID/GID of the source.  This requires adding a mechanism to look up by
IP, which currently doesn't exist (do we need to support duplicate
IPs?)

Currently the receive flow does something like this:

resolve endpoints
discard loopback
switch packet type
{
    case IP:
        handle IP packet; break;

    case ARP:
        handle ARP packet; break;

    default:
        handle generic packet; break;
}

This would have to change to something like this:

resolve source by LID/GID and discard loopback
switch packet type
{
    case IP:
        resolve endpoints by IP;
        handle IP packet; break;

    case ARP:
        process ARP, creating IP mappings; break;

    default:
        resolve destination from WC;
        handle generic packet; break;
}

> It seems that the best way to do this is to have a "static" table of IPs
> and MAC addresses and to check every IP packet as well as every ARP reply.
> We have done such an experiment and it did seem to work.

Why have a static table?  Why not just extend the endpoint lookup
mechanisms to support lookup by IP?

> We are still looking for a way to configure the table of guest OSes and
> their IPs and MACs. One way is simply to have a static table that is
> loaded from a file. Although this is the simplest way, it has an obvious
> disadvantage (the machine must be configured manually). Another way is to
> find some configuration APIs that the remote machine exposes, and the last
> possibility is to discover the information by sniffing packets (the way an
> Ethernet switch does).

We have to sniff the packets, both outbound and inbound, to do IPoIB
encapsulation since we pretend to be a standard 802.3 NIC.  Additional
snooping shouldn't be a big deal.  If it is, we can add a
configuration parameter to turn the IP based MAC resolution on/off.

> One bug that I have already found is that if a broadcast packet is sent,
> for example an ARP request, we send the packet as a multicast, receive the
> packet ourselves, and then indicate it up to NDIS. This is not the correct
> behavior (assuming we are emulating Ethernet), and we should drop these
> packets.

Yes, I have a fix for this in my sandbox already.  Any packet we
receive where we are the sender needs to be discarded.  The existing
check in the code for loopback packets uses the unformatted Ethernet
header, which clearly doesn't work.  Thanks for pointing it out,
though!

> In the next week I'll try to create a patch that will allow the virtual
> machine to work. I just wanted to know your opinion on this issue.

Cool, thanks!  Hopefully my understanding above is correct.  Please
let me know if I've missed something.

Thanks,

- Fab
