[libfabric-users] A couple more questions

Biddiscombe, John A. biddisco at cscs.ch
Mon Feb 27 11:43:19 PST 2017


I’ve checked and double-checked, and the endpoint really is not the passive one.

If you have a spare moment and want to look through some code, here is where I call fi_read:
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/parcelport_libfabric.cpp#L663

and here is where the endpoint is created
https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/libfabric_controller.hpp#L296

Those two files contain all the fabric-specific code (the main structure is identical to a verbs version, with the endpoint stuff tweaked a bit and the calls to ibv_xxx replaced with fi_yyy, etc.).

I will experiment some more. Maybe you can spot a simple mistake in the fabric calls/hints/info etc. in the fabric_controller code? All the setup is done in there.

Thanks for any feedback. (NB: the code is still a bit messy as I am experimenting with options, and I’ve not pushed the actual version I’m using here, as it’s full of extra log messages and flag settings, but the majority is the same.)

JB



On 27.02.17, 17:52, "Hefty, Sean" <sean.hefty at intel.com> wrote:

    > I have a new problem – I am sending and receiving messages using endpoints
    > of type FI_EP_MSG  (verbs+gni) and all is well. However, when I attempt
    > an fi_read(….) I get a segfault. Looking at the code I see that inside
    > fi_rma.h the ep->rma field is uninitialized. In my hints I use
    
    Is there any chance you're treating a passive endpoint as an active endpoint?
    
    Typically if a feature is not supported, the provider is supposed to set all function pointers to 'enosys' versions of the calls that simply return a failure if called.  There may be cases where the provider isn't doing this.  This would fix the segfault, but not actually get things working.
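
To make the 'enosys' convention above concrete, here is a minimal self-contained sketch of the pattern. It does not use the libfabric headers: the names model_ep, model_ops_rma, model_no_rma_read, model_read and model_read_checked are hypothetical stand-ins invented for illustration, and the layout is only an assumption about how such an operation table might look, not the provider's actual code.

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical stand-ins for an endpoint and its RMA operation table.
// These are NOT the libfabric definitions; they only model the pattern.
struct model_ep;

struct model_ops_rma {
    long (*read)(model_ep* ep, void* buf, std::size_t len);
};

struct model_ep {
    model_ops_rma* rma;  // stays null if the provider never fills it in
};

// An "enosys" stub: a valid function that just reports "not supported".
static long model_no_rma_read(model_ep*, void*, std::size_t) {
    return -ENOSYS;
}

// A table a provider could install when it does not support RMA.
static model_ops_rma model_ops_rma_enosys = { model_no_rma_read };

// Thin wrapper that calls straight through the table with no checks.
// With ep->rma == nullptr this dereferences a null pointer (the segfault case).
inline long model_read(model_ep* ep, void* buf, std::size_t len) {
    return ep->rma->read(ep, buf, len);
}

// Defensive variant an application could use while debugging: it turns a
// missing table into an error code instead of a crash.
inline long model_read_checked(model_ep* ep, void* buf, std::size_t len) {
    if (ep == nullptr || ep->rma == nullptr || ep->rma->read == nullptr) {
        return -ENOSYS;
    }
    return ep->rma->read(ep, buf, len);
}
```

With the stub table installed, model_read returns -ENOSYS rather than crashing; with a null table only model_read_checked is safe to call. As noted above, either way the read still fails, so this only converts the segfault into a diagnosable error.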
    
    >             fabric_hints_->caps = FI_MSG | FI_RMA;
    
    This is correct
    
    > so I assumed that I’d get support for both, but I must need to ask for
    > more somewhere to get rma support working.
    > What might I be missing that causes the ep->rma internals to be unset?
    
    We need more details to figure this out.
    
    > One clue might be from the fabric logging that says this
    > libfabric:verbs:fabric:fi_ibv_getifaddrs():854<debug> Found active
    > interface for verbs device: mlx5_0 with address: 192.168.3.37
    > libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
    > Invalid argument(22)
    > libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
    > Invalid argument(22)
    > when I call getinfo, but I am not able to tell which argument (hint?)
    > this refers to. I mention it ‘just in case’ it helps.
    
    These errors are coming from lower libraries, and may or may not indicate a problem.  Not very helpful, I know...
    


