[libfabric-users] A couple more questions
Biddiscombe, John A.
biddisco at cscs.ch
Mon Feb 27 12:12:41 PST 2017
Erratum: I just realized I hadn’t pushed my last proper commit (now done), so the links below point to the wrong line numbers, but you’ll easily find the functions as they are nearby.
JB
On 27.02.17, 20:43, "Libfabric-users on behalf of Biddiscombe, John A." <libfabric-users-bounces at lists.openfabrics.org on behalf of biddisco at cscs.ch> wrote:
I’ve checked and double-checked, and the endpoint is really not the passive one.
If you have a spare moment and want to look through some code: here is where I call fi_read
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/parcelport_libfabric.cpp#L663
and here is where the endpoint is created
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/libfabric_controller.hpp#L296
those two files contain all the fabric specific code – (the main structure is identical to a verbs version with the endpoint stuff tweaked a bit and the calls to ibv_xxx replaced with fi_yyy etc etc)
I will experiment some more – maybe you can spot a simple mistake in the fabric calls/hints/info etc in the fabric_controller code, since all the setup is done in there.
Thanks for any feedback (NB. The code is still a bit messy as I am experimenting with options and I’ve not pushed the actual version I’m using here – as it’s full of extra log messages etc and flag settings – but the majority is the same)
JB
On 27.02.17, 17:52, "Hefty, Sean" <sean.hefty at intel.com> wrote:
> I have a new problem – I am sending and receiving messages using endpoints
> of type FI_EP_MSG (verbs+gni) and all is well. However, when I attempt
> an fi_read(….) I get a segfault. Looking at the code I see that inside
> fi_rma.h the ep->rma field is uninitialized. In my hints I use
Is there any chance you're treating a passive endpoint as an active endpoint?
Typically, if a feature is not supported, the provider is supposed to set all function pointers to 'enosys' versions of the calls that simply return a failure if called. There may be cases where the provider isn't doing this. Installing those stubs would avoid the segfault, but would not actually get RMA working.
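For illustration only, here is a minimal sketch of that 'enosys' pattern in plain C, with no libfabric dependency. Every name in it (demo_ep, demo_ops_rma, demo_no_rma_read, demo_ep_init, demo_read) is a hypothetical stand-in rather than a libfabric type or function, and the real fi_read takes more arguments than this simplified version. It only shows why a populated ops table turns a crash into an error return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-ins for an endpoint and its RMA ops table.
 * These are NOT the real libfabric structures. */
struct demo_ep;

struct demo_ops_rma {
    ssize_t (*read)(struct demo_ep *ep, void *buf, size_t len);
};

struct demo_ep {
    struct demo_ops_rma *rma;
};

/* Stub used when RMA is unsupported: it fails cleanly with -ENOSYS. */
static ssize_t demo_no_rma_read(struct demo_ep *ep, void *buf, size_t len)
{
    (void)ep;
    (void)buf;
    (void)len;
    return -ENOSYS;
}

static struct demo_ops_rma demo_no_rma_ops = {
    .read = demo_no_rma_read,
};

/* Endpoint setup: always install an ops table, even for unsupported
 * features, so the pointer is never left uninitialized. */
static void demo_ep_init(struct demo_ep *ep)
{
    assert(ep != NULL);
    ep->rma = &demo_no_rma_ops;
}

/* Thin wrapper that dispatches through the ops table without checking
 * it, which is why an uninitialized table pointer would crash here. */
static inline ssize_t demo_read(struct demo_ep *ep, void *buf, size_t len)
{
    return ep->rma->read(ep, buf, len);
}
```

With the stub installed, calling demo_read on a freshly initialized demo_ep returns -ENOSYS instead of dereferencing garbage. The caller gets a clear "not supported" error, but the read itself still does nothing.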
> fabric_hints_->caps = FI_MSG | FI_RMA;
This is correct
> so I assumed that I’d get support for both, but I must need to ask for
> more somewhere to get rma support working.
> What might I be missing that causes the ep->rma internals to be unset?
We need more details to figure this out.
> One clue might be from the fabric logging that says this
> libfabric:verbs:fabric:fi_ibv_getifaddrs():854<debug> Found active
> interface for verbs device: mlx5_0 with address: 192.168.3.37
> libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
> Invalid argument(22)
> libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
> Invalid argument(22)
> when I call getinfo, but I am not able to tell which argument (hint?)
> this refers to. I mention it ‘just in case’ it helps.
These errors are coming from lower libraries, and may or may not indicate a problem. Not very helpful, I know...