[libfabric-users] Linux verbs / Windows netdir Interoperability

Hefty, Sean sean.hefty at intel.com
Wed Jan 5 08:24:56 PST 2022


> I am certain that I am going to want to have a lot of back-and-forth discussions about
> this task. How would you like to handle that? I imagine it could be:
> 
> 1.	just messages on this list; or
> 2.	a dedicated issue in the GitHub repository; or
> 3.	something else entirely.
>
> My thinking was #2, but I’m happy to do whatever makes sense for the project.

Option 2 makes sense.  We can add a note to the mailing list to point people to the discussion, so that they can follow it as desired.


> -- Required Testing
> 
> I am aware of the existence of fabtests, but those seem more like a set of tools for
> end users to familiarize themselves with the library and check for correct
> installation/configuration. I’m curious to know if you have any requirements and/or
> guidelines for what testing should be done for new code. Unit tests? Regression tests?
> Integration tests? Etc.

Fabtests is primarily a verification test suite.  Those are the first level of tests that we use.  The next step in our CI is middleware level tests -- MPI, SHMEM, DAOS, AI frameworks, etc.  That said, our CI testing on Windows is minimal; not even all of the fabtests work there.


> Obviously, code I submit will work for my use case, however that seems like a bare
> minimum and I feel like I should aspire to more than just that. Just looking for some
> guidance about expectations for testing.

The main consumer of the Windows code that I'm aware of is Intel MPI.  I can't break that app.


> -- Earliest Notions of an Approach to the Problem
>
> I could imagine this effort being directed in one of several directions and was
> interested in your opinion on the pros and cons of them and what direction you think I
> ought to head. We could:
> 
> 1.	completely scrap the existing netdir provider and start over with something new;
> 2.	keep the existing netdir provider and augment it with some sort of verbs
> compatibility setting via an environment variable;
> 3.	keep the existing netdir provider and create a new one alongside it, e.g.
> ndverbs, that will play well with the verbs provider running on Linux;
> 4.	port libibverbs to Windows/NetworkDirect and theoretically use the libfabric
> verbs provider on both Linux and Windows;
> 5.	something I haven’t even contemplated.

I'm not familiar enough with the current netdir provider implementation to provide much guidance here.  From the little code I have looked at, netdir appears to need significant cleanup.  My initial thought is to pick option 3.  Once the new provider is established, it can replace the existing one.

Option 4 is appealing because it would improve code reuse.  Or, if there is a clean way to branch the verbs provider at the lowest level to switch between libibverbs/librdmacm and ND, that could be an option.  However, I'm skeptical that this route could be made more maintainable than a dedicated Windows RDMA provider without introducing unnecessary overhead.
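
To make the "branch at the lowest level" idea a bit more concrete, here is a rough sketch of what such a split might look like.  This is only an illustration under stated assumptions, not a proposal for actual code: every name below (rdma_backend_ops, provider_send, and so on) is hypothetical and does not exist in libfabric, libibverbs, or NetworkDirect, and the two backends are stubs that return fixed values rather than calling any real API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical low-level interface the provider would code against.
 * None of these names are real libfabric or verbs/ND symbols. */
struct rdma_backend_ops {
    const char *name;
    int (*open_device)(const char *dev_name);
    int (*post_send)(int handle, const void *buf, size_t len);
};

/* Stub standing in for a libibverbs/librdmacm backend. */
static int verbs_open_device(const char *dev_name) { (void)dev_name; return 1; }
static int verbs_post_send(int handle, const void *buf, size_t len)
{ (void)handle; (void)buf; return (int)len; }

/* Stub standing in for a NetworkDirect backend. */
static int nd_open_device(const char *dev_name) { (void)dev_name; return 2; }
static int nd_post_send(int handle, const void *buf, size_t len)
{ (void)handle; (void)buf; return (int)len; }

/* Both stubs are compiled here so the sketch runs anywhere; a real
 * build would presumably compile only the backend for its platform. */
static const struct rdma_backend_ops backends[] = {
    { "verbs", verbs_open_device, verbs_post_send },
    { "nd",    nd_open_device,    nd_post_send },
};

/* Look up a backend by name; returns NULL if the name is unknown. */
static const struct rdma_backend_ops *rdma_backend_by_name(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}

/* Pick the platform default: ND on Windows, verbs elsewhere. */
static const struct rdma_backend_ops *rdma_backend_default(void)
{
#ifdef _WIN32
    return rdma_backend_by_name("nd");
#else
    return rdma_backend_by_name("verbs");
#endif
}

/* Upper-layer code sees only the ops table, never the backend itself.
 * Returns the number of bytes posted, or -1 if the device did not open. */
static int provider_send(const struct rdma_backend_ops *ops,
                         const char *dev_name, const void *buf, size_t len)
{
    int handle;

    assert(ops != NULL);
    handle = ops->open_device(dev_name);
    if (handle < 0)
        return -1;
    return ops->post_send(handle, buf, len);
}
```

A caller would do something like provider_send(rdma_backend_default(), "dev0", buf, len).  The extra function-pointer hop on every operation is one example of the kind of overhead such a layer can add, and a real interface would need far more than two entry points.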

- Sean

