[Openib-windows] IPOIB virtualization What was already done, what still has to be done to finish the job.
Tzachi Dar
tzachid at mellanox.co.il
Fri Apr 21 06:23:33 PDT 2006
Hi Fab,
The following mail summarizes the work that I did on IPOIB
virtualization, that is, running IPOIB on Microsoft Virtual Server R2.
Please note that the current status is that ping works in all
directions, but a lot of work is still needed to bring it to product
quality. The biggest open issue is allowing packets that are bigger
than 1500 bytes and smaller than 2048 bytes to pass to the guest OS.
Currently, I have implemented a hack that tells Windows that we only
support an MTU of 1500 bytes (like Ethernet). My change assumes that
all machines are Windows machines and that all of them have my changes,
but this is not always true. One example that breaks this assumption is
Linux. It seems that a better solution is either to talk to MS and see
whether they have a solution to this problem, or to accept bigger
packets and break them up on demand.
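To illustrate what "break them up on demand" could involve, here is a
minimal sketch of the standard IPv4 fragmentation arithmetic. It is not
taken from the attached code; the function name, the 20-byte header
(no IP options) and the 2044-byte example size are assumptions made for
illustration only. Packets with the Don't Fragment bit set could not be
handled this way.

```python
def fragment_plan(total_len, mtu=1500, header_len=20):
    """Plan how to split one IPv4 packet so each piece fits in `mtu`.

    Returns a list of (offset_in_8_byte_units, payload_len, more_fragments).
    Assumes a fixed header of `header_len` bytes on every fragment.
    """
    if mtu - header_len < 8:
        raise ValueError("MTU too small to carry any fragment payload")
    payload = total_len - header_len
    if payload < 0:
        raise ValueError("total_len is smaller than the IP header")

    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts 8-byte units.
    max_chunk = (mtu - header_len) // 8 * 8

    plan = []
    offset = 0
    while True:
        chunk = min(max_chunk, payload - offset)
        more = offset + chunk < payload
        plan.append((offset // 8, chunk, more))
        offset += chunk
        if not more:
            break
    return plan


# Hypothetical example: a 2044-byte IP packet headed for a 1500-byte MTU.
for frag in fragment_plan(2044):
    print(frag)
```

With these assumptions a packet in the problematic size range always
splits into exactly two fragments, so the cost is one extra packet per
oversized packet.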
Due to the time that is needed to complete the work (see also the
problems below), we have decided not to support virtualization for this
release. Attached to this mail is the latest version that I created. It
should more or less match the version of IPOIB.
The changes that I have made are in the following areas. I'll briefly
describe what the problem was, what I did, and what still has to be done.
Some of the problems described are not really related to virtualization.
1) Checking where to pass the packets. I have implemented the code that
sniffs ARPs and creates a table of (IP, MAC) pairs. Packets are later
changed based on that table. The code is almost complete; however, the
correct lock still has to be taken when writing the table (this
shouldn't be that complicated).
2) DHCP support. A few general comments: (a) The current code changes
DHCP packets on the receiver side as well as on the sender side. This
works well if we write the software on both sides. If the other side is
Linux, this is not the case. I'm not sure that there is a spec that
solves these problems. (b) If we get a message that we cannot format but
that looks like a DHCP packet, we silently drop it. I believe that we
should let it pass; maybe someone else will know what to do with it.
Another somewhat problematic issue is keeping the old address that was
once assigned to us. Currently, we use GID+QP as our unique identifier;
however, the QP changes, and therefore the IP address changes as well.
As for virtualization: I was using GUID+QP+base MAC as the unique
identifier, and changed the receiver side to do almost nothing, and
things seem to work.
3) Support for packets that are bigger than 1500 bytes but smaller than
2048 bytes; see the description above.
4) Multicast requests of the guest OS: In the current implementation,
NDIS notifies the driver which multicast groups to join. We register to
them, and we are also notified when the host leaves a group. When a
guest OS asks to be registered to a multicast group, there is no NDIS
query that tells us what has happened, and it seems that the only way
around that is to sniff IGMP packets. I have implemented the code that
sniffs the packets and registers to such a group, but not the code that
removes us from the multicast group. Moreover, Windows allows us to
register to a certain group and accept packets only from some hosts,
while another virtual machine (or even another application on the same
machine?) might ask to accept packets from other machines (or even to
block them). If we want to follow the spec exactly, there is a lot of
work that has to be done.
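For item 1, the ARP-derived table and its locking could look roughly
like the sketch below. It is an illustration only, not the driver code:
the class and method names are hypothetical, and it assumes the guest
sends ordinary Ethernet/IPv4 ARP payloads (28 bytes).

```python
import socket
import struct
import threading

# Ethernet/IPv4 ARP payload: htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP.
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FORMAT)  # 28 bytes


class ArpTable:
    """IP -> MAC table filled from sniffed ARP packets (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ip_to_mac = {}

    def learn(self, arp_payload):
        """Record the sender (IP, MAC) pair; return True if it was used."""
        if len(arp_payload) < ARP_LEN:
            return False
        htype, ptype, hlen, plen, _oper, sha, spa, _tha, _tpa = struct.unpack(
            ARP_FORMAT, arp_payload[:ARP_LEN]
        )
        # Only accept Ethernet (1) / IPv4 (0x0800) with 6- and 4-byte addresses.
        if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
            return False
        # Writers take the lock so readers never see a half-updated table.
        with self._lock:
            self._ip_to_mac[socket.inet_ntoa(spa)] = sha
        return True

    def lookup(self, ip):
        """Return the MAC learned for `ip`, or None if it is unknown."""
        with self._lock:
            return self._ip_to_mac.get(ip)
```

A single lock around both the write and the lookup is the simplest
choice; a reader/writer lock would only matter if lookups on the data
path turned out to contend with updates.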
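For item 4, the missing "leave" handling could be approached by
counting which guests are members of each group, so that the driver
leaves a group only when the last guest has left it. The sketch below
is an assumption-laden illustration, not the attached code: the names
are hypothetical, only the fixed 8-byte IGMPv1/v2 message layout is
parsed, the checksum is not verified, and IGMPv3 reports (with their
per-source include/exclude lists) are ignored.

```python
import socket
import struct

IGMP_V1_REPORT = 0x12
IGMP_V2_REPORT = 0x16
IGMP_V2_LEAVE = 0x17


class GuestGroups:
    """Tracks which guests joined which multicast groups (hypothetical)."""

    def __init__(self):
        self._members = {}  # group address -> set of guest ids

    def handle_igmp(self, guest_id, igmp):
        """Return ("join", group) or ("leave", group) when the driver's own
        membership should change, otherwise None."""
        if len(igmp) < 8:
            return None
        # type, max response time, checksum (not verified here), group.
        msg_type, _max_resp, _checksum, group_raw = struct.unpack(
            "!BBH4s", igmp[:8]
        )
        group = socket.inet_ntoa(group_raw)

        if msg_type in (IGMP_V1_REPORT, IGMP_V2_REPORT):
            guests = self._members.setdefault(group, set())
            first = not guests
            guests.add(guest_id)
            # Join on the fabric only for the first interested guest.
            return ("join", group) if first else None

        if msg_type == IGMP_V2_LEAVE:
            guests = self._members.get(group)
            if guests is None or guest_id not in guests:
                return None
            guests.discard(guest_id)
            if not guests:
                # Last guest left, so the driver can leave as well.
                del self._members[group]
                return ("leave", group)
        return None
```

This does not cover guests that disappear without sending a leave
message, nor groups that the host itself joined through NDIS; both
would need extra bookkeeping.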
Thanks
Tzachi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib.zip
Type: application/x-zip-compressed
Size: 182427 bytes
Desc: ipoib.zip
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20060421/6900127e/attachment.bin>