[libfabric-users] A couple more questions
Biddiscombe, John A.
biddisco at cscs.ch
Mon Feb 27 11:43:19 PST 2017
I’ve checked and double checked and the endpoint is really not the passive one
If you have a spare moment and want to look through some code, here is where I call fi_read:
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/parcelport_libfabric.cpp#L663
and here is where the endpoint is created
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/libfabric_controller.hpp#L296
those two files contain all the fabric specific code – (the main structure is identical to a verbs version with the endpoint stuff tweaked a bit and the calls to ibv_xxx replaced with fi_yyy etc etc)
I will experiment some more. Maybe you can spot a simple mistake in the fabric calls/hints/info etc. in the fabric_controller code, since all the setup is done in there.
Thanks for any feedback. (NB: the code is still a bit messy as I am experimenting with options, and I've not pushed the actual version I'm using here, as it's full of extra log messages and flag settings, but the majority is the same.)
JB
On 27.02.17, 17:52, "Hefty, Sean" <sean.hefty at intel.com> wrote:
> I have a new problem – I am sending and receiving messages using endpoints
> of type FI_EP_MSG (verbs+gni) and all is well. However, when I attempt
> an fi_read(….) I get a segfault. Looking at the code I see that inside
> fi_rma.h the ep->rma field is uninitialized. In my hints I use
Is there any chance you're treating a passive endpoint as an active endpoint?
Typically, if a feature is not supported, the provider is supposed to set all function pointers to 'enosys' versions of the calls that simply return a failure if called. There may be cases where the provider isn't doing this. Fixing that in the provider would remove the segfault, but would not actually get RMA working.
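
To illustrate the 'enosys' idea, here is a minimal, self-contained sketch in C. It does not use the real libfabric headers: the demo_ep and demo_ops_rma structures and all demo_* functions are hypothetical stand-ins that only model an endpoint holding a pointer to a table of RMA functions. It shows the two cases discussed above: an ops table filled with a stub that returns -ENOSYS, and a NULL ops pointer that would crash if dereferenced without a check.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a table of RMA operations. */
struct demo_ops_rma {
    ssize_t (*read)(void *buf, size_t len);
};

/* Hypothetical stand-in for an endpoint. A NULL rma pointer models
 * the uninitialized ep->rma field described in the question. */
struct demo_ep {
    struct demo_ops_rma *rma;
};

/* "enosys" style stub: performs no I/O and reports that the
 * operation is not implemented. */
static ssize_t demo_no_rma_read(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -ENOSYS;
}

/* Ops table a provider could install when RMA is unsupported. */
static struct demo_ops_rma demo_no_rma_ops = {
    .read = demo_no_rma_read,
};

/* Defensive wrapper: returns -ENOSYS instead of dereferencing a
 * missing ops table or a missing function pointer. */
static ssize_t demo_read(struct demo_ep *ep, void *buf, size_t len)
{
    if (ep == NULL || ep->rma == NULL || ep->rma->read == NULL)
        return -ENOSYS;
    return ep->rma->read(buf, len);
}
```

With the stub table installed, a call fails cleanly with -ENOSYS; the NULL check in the wrapper gives the same result when the table was never set. Either way the read still does not happen, which is the point made above: this only turns the crash into an error code.
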
> fabric_hints_->caps = FI_MSG | FI_RMA;
This is correct
> so I assumed that I’d get support for both, but I must need to ask for
> more somewhere to get rma support working.
> What might I be missing that causes the ep->rma internals to be unset?
We need more details to figure this out.
> One clue might be from the fabric logging that says this
> libfabric:verbs:fabric:fi_ibv_getifaddrs():854<debug> Found active
> interface for verbs device: mlx5_0 with address: 192.168.3.37
> libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
> Invalid argument(22)
> libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
> Invalid argument(22)
> when I call getinfo, but I am not able to tell which argument (hint?)
> this refers to. I mention it ‘just in case’ it helps.
These errors are coming from lower libraries, and may or may not indicate a problem. Not very helpful, I know...