[Openib-windows] RE: [openib-general] Microsoft virtual machine and Infiniband

Tzachi Dar tzachid at mellanox.co.il
Wed Mar 1 12:52:15 PST 2006


See below.

Thanks
Tzachi
 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Tuesday, February 28, 2006 12:49 AM
> To: Tzachi Dar
> Cc: openib-general at openib.org; openib-windows at openib.org
> Subject: Re: [openib-general] Microsoft virtual machine and Infiniband
> 
> Hi Tzachi,
> 
> On 2/27/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
> > Hi Fab,
> >
> > When trying to run Windows 2003 Server on a Microsoft virtual machine
> > we have found that there is one problem that prevents IPOIB from
> > running.
> >
> > The problem, as you can guess, is related to the way MAC addresses are
> > being handled. On such a machine, a fake MAC address is
> > created and later used for communication (one MAC per guest OS).
> > However, although these packets return to the correct computer, IPOIB
> > doesn't restore their correct dest MAC, and therefore pinging a
> > remote host is impossible.
> 
> Is IPoIB running on the guest OS, or on the host OS?  I'm 
> assuming host, and the guest sends packets using its guest 
> MAC.  So a packet gets passed to IPoIB using the guest MAC as 
> source.  The recipient of such a packet tries to reconstruct 
> the Ethernet header, and ends up with the sender's host MAC, 
> rather than the sender's guest MAC.  Am I following this properly?
> 
Yes, you are.

> > In order to solve this problem there is a need to create a mechanism
> > that will allow the IPOIB driver to correct the MAC addresses of
> > packets based on their IP addresses.
> 
> So, the recipient should do an IP lookup on every received IP 
> packet and restore the MAC based on the IP, rather than just 
> based on the LID/GID of the source.  This requires adding a 
> mechanism to look up by IP, which currently doesn't exist (do 
> we need to support duplicate IPs?)
> 
We don't have to support multiple IPs, but I don't see an issue here.

> Currently the receive flow does something like this:
> 
> resolve endpoints
> discard loopback
> switch packet type
> {
>     case IP:
>         handle IP packet; break;
> 
>     case ARP:
>         handle ARP packet; break;
> 
>     default:
>         handle generic packet; break;
> }
> 
> This would have to change to something like this:
> 
> resolve source by LID/GID and discard loopback
> switch packet type
> {
>     case IP:
>         resolve endpoints by IP;
>         handle IP packet; break;
> 
>     case ARP:
>         process ARP, creating IP mappings; break;
> 
>     default:
>         resolve destination from WC;
>         handle generic packet; break;
> }
> 
I was thinking of something a little different, see below. I believe
it makes it simpler to see where the virtual machine support is:

 switch packet type
 {
     case IP:
         handle IP packet;
         fix MAC by IP;
         break;

     case ARP:
         handle ARP packet;
         if ARP request, update table;
         if ARP reply, fix its destination as for IP;
         break;

     default:
         handle generic packet; break;
 }
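
As a rough illustration of the "update table" and "fix MAC by IP" steps, here is a minimal sketch in Python. It is not driver code: the class name IpMacTable, the function fix_dest_mac, and the use of dotted-string IPs as keys are all hypothetical. It also assumes the table is filled from the sender fields of ARP requests seen by the driver and is consulted by destination IP on receive.

```python
from typing import Dict, Optional

MAC_LEN = 6  # bytes in an Ethernet MAC address


class IpMacTable:
    """Hypothetical IP -> MAC table, learned from ARP requests."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, bytes] = {}

    def learn(self, ip: str, mac: bytes) -> None:
        # Record (or refresh) the MAC seen for this IP.
        if len(mac) != MAC_LEN:
            raise ValueError("MAC address must be 6 bytes")
        self._by_ip[ip] = mac

    def lookup(self, ip: str) -> Optional[bytes]:
        # Returns None when the IP has never been seen.
        return self._by_ip.get(ip)


def fix_dest_mac(table: IpMacTable, dest_ip: str, reconstructed_mac: bytes) -> bytes:
    # Prefer the MAC learned for dest_ip (assumed to be the guest MAC);
    # otherwise keep the MAC that was reconstructed for the packet.
    learned = table.lookup(dest_ip)
    return learned if learned is not None else reconstructed_mac
```

With this shape, a packet for an unknown IP passes through unchanged, so hosts that do not run a virtual machine would see no difference in behavior.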


> > It seems that the best way to do this is to have a "static" table of
> > IPs and MAC addresses and to check every IP packet as well as every
> > ARP reply.
> > We have done such an experiment and it did seem to work.
> 
> Why have a static table?  Why not just extend the endpoint 
> lookup mechanisms to support lookup by IP?
> 
> > We are still looking for a way to configure the table of guest OSes and
> > their IPs and MACs. One way to achieve this is simply having a static
> > table that will be entered through some file. Although this is the
> > simplest way, it has an obvious disadvantage (the need to manually
> > configure the machine). A different way is to find some configuration
> > APIs that the remote machine has, while the last possibility is
> > trying to find the information by sniffing for packets (the way that
> > an Ethernet switch does things).
> 
> We have to sniff the packets, both outbound and inbound, to 
> do IPoIB encapsulation since we pretend to be a standard 
> 802.3 NIC.  Additional snooping shouldn't be a big deal.  If 
> it is, we can add a configuration parameter to turn the IP 
> based MAC resolution on/off.
> 
> > One bug that I have already found is that if a broadcast packet is 
> > sent, for example an ARP request, we send the packet as a multicast, 
> > and we also receive the packet ourselves, and later we send this 
> > packet to NDIS. This is not the correct behavior (assuming we are 
> > emulating Ethernet behavior) and we should remove these packets.
> 
> Yes, I have a fix for this in my sandbox already.  Any packet 
> we receive where we are the sender needs to be discarded.  
> The existing check in the code for loopback packets uses the 
> unformatted ethernet header, which clearly doesn't work.  
> Thanks for pointing it out, though!
> 
It would be nice if you could check the fix in to winib.


> > In the next week I'll try to create a patch that will allow the 
> > virtual machine to work; I just wanted to know your opinion about
> > this issue.
> 
> Cool, thanks!  Hopefully my understanding above is correct.  
> Please let me know if I've missed something.
> 
> Thanks,
> 
> - Fab
> 
> 
> 


