[libfabric-users] Linux verbs / Windows netdir Interoperability
Hefty, Sean
sean.hefty at intel.com
Wed Jan 5 08:24:56 PST 2022
> I am certain that I am going to want to have a lot of back and forth discussions about
> this task. How would you like to handle that? I imagine it could be:
>
> 1. just messages on this list; or
> 2. a dedicated issue in the GitHub repository; or
> 3. something else entirely.
>
> My thinking was #2, but I’m happy to do whatever makes sense for the project.
Option 2 makes sense. We can add a note to the mailing list to point people to the discussion, so that they can follow it as desired.
> -- Required Testing
>
> I am aware of the existence of fabtests, but those seem like more a set of tools for
> end users to familiarize themselves with the library and check for correct
> installation/configuration. I’m curious to know if you have any requirements and/or
> guidelines for what testing should be done for new code. Unit tests? Regression tests?
> Integration tests? Etc.
Fabtests is primarily a verification test suite. Those are the first level of tests that we use. The next step in our CI is middleware-level tests -- MPI, SHMEM, DAOS, AI frameworks, etc. That said, our CI testing on Windows is minimal; not even all of the fabtests work there.
> Obviously, code I submit will work for my use case, however that seems like a bare
> minimum and I feel like I should aspire to more than just that. Just looking for some
> guidance about expectations for testing.
The main consumer of the Windows code that I'm aware of is Intel MPI. I can't break that app, so changes need to keep it working.
> -- Earliest Notions of an Approach to the Problem
>
> I could imagine this effort being directed in one of several directions and was
> interested in your opinion on the pros and cons of them and what direction you think I
> ought to head. We could:
>
> 1. completely scrap the existing netdir provider and start over with something new;
> 2. keep the existing netdir provider and augment it with some sort of verbs
> compatibility setting via an environment variable;
> 3. keep the existing netdir provider and create a new one alongside it, e.g.
> ndverbs, that will play well with the verbs provider running on Linux;
> 4. port libibverbs to Windows/NetworkDirect and theoretically use the libfabric
> verbs provider on both Linux and Windows;
> 5. something I haven’t even contemplated.
I'm not familiar enough with the current netdir provider implementation to provide much guidance here. From the little code I have looked at, netdir appears to need significant cleanup. My initial thought is to pick option 3. Once established, it can replace the existing provider.
Option 4 is appealing because it improves code re-use. Or, if there is a clean way to branch the verbs provider at the lowest level to switch between libibverbs/librdmacm and ND, that could be an option. I'm skeptical that this route could be made more maintainable than a dedicated Windows RDMA provider without introducing unnecessary overhead.
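As a rough illustration of what "branching at the lowest level" could look like, here is a minimal sketch in C of a table of function pointers chosen by backend name. Everything in it is an assumption made for illustration: the struct, the function names, and the stub return values are hypothetical and do not exist in libfabric, libibverbs, librdmacm, or NetworkDirect. The stubs only stand in for where real calls into each library would go.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lowest-level interface. These names are illustrative only
 * and are not part of libfabric, libibverbs, librdmacm, or NetworkDirect. */
struct rdma_backend_ops {
    const char *name;
    int (*open_device)(const char *dev_name);
    int (*close_device)(int handle);
};

/* Stubs standing in for calls into libibverbs/librdmacm. */
static int verbs_open_device(const char *dev_name)
{
    (void)dev_name;
    return 1; /* placeholder handle */
}

static int verbs_close_device(int handle)
{
    (void)handle;
    return 0;
}

/* Stubs standing in for calls into NetworkDirect. */
static int nd_open_device(const char *dev_name)
{
    (void)dev_name;
    return 2; /* placeholder handle */
}

static int nd_close_device(int handle)
{
    (void)handle;
    return 0;
}

static const struct rdma_backend_ops backends[] = {
    { "verbs", verbs_open_device, verbs_close_device },
    { "nd",    nd_open_device,    nd_close_device },
};

/* Return the ops table for the named backend, or NULL if it is unknown. */
const struct rdma_backend_ops *rdma_backend_select(const char *name)
{
    size_t i;

    if (!name)
        return NULL;

    for (i = 0; i < sizeof(backends) / sizeof(backends[0]); i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}
```

In a layout like this, the provider code above the table would call rdma_backend_select() once and then go through the returned pointers. Every operation then becomes an indirect call, and the two APIs would have to be squeezed into one common signature, which is the sort of overhead and maintenance cost the skepticism above is about.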
- Sean