[ofiwg] noob questions
Don Fry
DFry at lightfleet.com
Wed Nov 13 14:35:45 PST 2019
Here is another run with the output suggested by James Swaro.
Don
________________________________________
From: Don Fry
Sent: Wednesday, November 13, 2019 2:26 PM
To: Barrett, Brian; Hefty, Sean; Byrne, John (Labs); ofiwg at lists.openfabrics.org
Subject: Re: [ofiwg] noob questions
Attached is the output of mpirun with some of my debugging printfs.
Don
________________________________________
From: Barrett, Brian <bbarrett at amazon.com>
Sent: Wednesday, November 13, 2019 2:05 PM
To: Don Fry; Hefty, Sean; Byrne, John (Labs); ofiwg at lists.openfabrics.org
Subject: Re: [ofiwg] noob questions
That likely means that something failed while initializing the OFI provider. Without seeing the debugging output John mentioned, it's really hard to say *why* it failed to initialize. There are many possible reasons, including the provider not conforming to the assumptions Open MPI makes about its providers.
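One way to get the provider-side view of that failure is to turn up libfabric's own logging alongside Open MPI's verbosity. A minimal sketch, assuming the provider registers under the name "lf" as mentioned elsewhere in the thread:
    # FI_LOG_LEVEL and FI_LOG_PROV are standard libfabric logging variables;
    # use mpirun's -x option instead if ranks run on remote nodes
    FI_LOG_LEVEL=debug FI_LOG_PROV=lf \
        mpirun --mca pml cm --mca mtl ofi ./mpi_latency
The debug log should show roughly whether fi_getinfo returned anything usable and where endpoint or domain setup went wrong.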
Brian
-----Original Message-----
From: Don Fry <DFry at lightfleet.com>
Date: Wednesday, November 13, 2019 at 2:01 PM
To: "Barrett, Brian" <bbarrett at amazon.com>, "Hefty, Sean" <sean.hefty at intel.com>, "Byrne, John (Labs)" <john.l.byrne at hpe.com>, "ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
Subject: Re: [ofiwg] noob questions
When I tried --mca pml cm, it complained that "PML cm cannot be selected". Maybe I needed to enable cm when I configured Open MPI? I didn't specifically enable or disable it. It could also be that my getinfo routine doesn't have a capability set properly.
my latest command line was:
mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include "lf;ofi_rxm" ./mpi_latency (where lf is my provider)
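A quick first check (a sketch, assuming ompi_info comes from the same Open MPI install as mpirun) is whether the cm PML and ofi MTL components were built at all; cm is also refused when no MTL can initialize, so a missing capability in getinfo shows up the same way:
    # list the PML and MTL components in this build; "MCA pml: cm" and
    # "MCA mtl: ofi" should appear if they were compiled in
    ompi_info | grep -E "MCA (pml|mtl)"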
Thanks for the pointers, I will do some more debugging on my end.
Don
________________________________________
From: Barrett, Brian <bbarrett at amazon.com>
Sent: Wednesday, November 13, 2019 12:53 PM
To: Hefty, Sean; Byrne, John (Labs); Don Fry; ofiwg at lists.openfabrics.org
Subject: Re: [ofiwg] noob questions
You can force Open MPI to use libfabric as its transport by adding "-mca pml cm -mca mtl ofi" to the mpirun command line.
Brian
-----Original Message-----
From: ofiwg <ofiwg-bounces at lists.openfabrics.org> on behalf of "Hefty, Sean" <sean.hefty at intel.com>
Date: Wednesday, November 13, 2019 at 12:52 PM
To: "Byrne, John (Labs)" <john.l.byrne at hpe.com>, Don Fry <DFry at lightfleet.com>, "ofiwg at lists.openfabrics.org" <ofiwg at lists.openfabrics.org>
Subject: Re: [ofiwg] noob questions
My guess is that Open MPI is using an internal socket transport of its own. You likely need to force MPI to use libfabric, but I don't know enough about OMPI to say how.
Jeff (copied) likely knows the answer here, but you may need to create a new meme for him in exchange for his assistance.
- Sean
> -----Original Message-----
> From: ofiwg <ofiwg-bounces at lists.openfabrics.org> On Behalf Of Byrne, John (Labs)
> Sent: Wednesday, November 13, 2019 11:26 AM
> To: Don Fry <DFry at lightfleet.com>; ofiwg at lists.openfabrics.org
> Subject: Re: [ofiwg] noob questions
>
> You only mention the dgram and msg types and the mtl_ofi component wants rdm. If you
> don't support rdm, I would have expected your getinfo routine to return error -61. You
> can try using the ofi_rxm provider with your provider to add rdm support, replacing
> verbs in "--mca mtl_ofi_provider_include verbs;ofi_rxm" with your provider.
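>
> As a sanity check outside of MPI (a sketch; "<your_prov>" is just a placeholder for your provider's name), fi_info can show whether the rxm layering actually produces an rdm endpoint:
>
>     # list only RDM-capable endpoints; a working layering is reported with a
>     # provider name like "<your_prov>;ofi_rxm"
>     fi_info -t FI_EP_RDM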
>
>
>
> Open MPI transport selection is complex. Adding insane levels of verbosity can help you
> understand what is happening. I tend to use: --mca mtl_base_verbose 100 --mca
> btl_base_verbose 100 --mca pml_base_verbose 100
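>
> For example, pulled together into one run (a sketch; selection.log is an arbitrary file name and ./mpi_latency stands in for your test binary):
>
>     mpirun --mca mtl_base_verbose 100 --mca btl_base_verbose 100 \
>            --mca pml_base_verbose 100 \
>            -np 2 ./mpi_latency 2>&1 | tee selection.log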
>
>
>
> John Byrne
>
>
>
> From: ofiwg [mailto:ofiwg-bounces at lists.openfabrics.org] On Behalf Of Don Fry
> Sent: Wednesday, November 13, 2019 10:54 AM
> To: ofiwg at lists.openfabrics.org
> Subject: [ofiwg] noob questions
>
>
>
> I have written a libfabric provider for our hardware and it passes all the fabtests I
> expect it to (dgram and msg). I am trying to run some MPI tests using libfabric under
> Open MPI (4.0.2). When I run a simple ping-pong test using mpirun, it sends and receives
> the messages over TCP/IP. It does call my fi_getinfo routine, but doesn't use my
> provider's send/receive routines. I have rebuilt the libfabric library with sockets
> disabled, then again with --disable-tcp, then --disable-udp, and fi_info reports fewer
> and fewer providers until it lists only my provider, but each time I run the MPI test
> it still uses IP to exchange messages.
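>
> For concreteness, the rebuild-and-check loop described above looks roughly like this (a sketch from the libfabric source tree; configure flags and prefix are illustrative):
>
>     # disable the IP-based providers, reinstall, then see what still registers
>     ./configure --prefix=/usr/local --disable-sockets --disable-tcp --disable-udp
>     make && sudo make install
>     fi_info | grep 'provider:'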
>
>
>
> When I configured Open MPI I specified --with-libfabric=/usr/local/ and the libfabric
> library is being loaded and executed.
>
>
>
> I am probably doing something obviously wrong, but I don't know enough about MPI, or
> maybe libfabric, so I need some help. If this is the wrong list, please redirect me.
>
>
>
> Any suggestions?
>
> Don
_______________________________________________
ofiwg mailing list
ofiwg at lists.openfabrics.org
https://lists.openfabrics.org/mailman/listinfo/ofiwg
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: launch.txt
URL: <http://lists.openfabrics.org/pipermail/ofiwg/attachments/20191113/cfb8faf2/attachment-0001.txt>