[libfabric-users] A couple more questions

Biddiscombe, John A. biddisco at cscs.ch
Mon Feb 27 12:12:41 PST 2017


Erratum: I just realized I hadn’t pushed my latest commit (now done), so the links below point to the wrong line numbers, but the functions are nearby and easy to find.

JB

On 27.02.17, 20:43, "Libfabric-users on behalf of Biddiscombe, John A." <libfabric-users-bounces at lists.openfabrics.org on behalf of biddisco at cscs.ch> wrote:

    I’ve checked and double-checked, and the endpoint is really not the passive one.
    
    If you have a spare moment and want to look through some code: here is where I call fi_read
    https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/parcelport_libfabric.cpp#L663
    
    and here is where the endpoint is created
    https://github.com/biddisco/hpx/blob/libfabric/plugins/parcelport/libfabric/libfabric_controller.hpp#L296
    
    those two files contain all the fabric specific code – (the main structure is identical to a verbs version with the endpoint stuff tweaked a bit and the calls to ibv_xxx replaced with fi_yyy etc etc)
    
    I will experiment some more – maybe you can spot a simple mistake in the fabric calls/hints/info etc in the fabric_controller code – all the setup is done in there.
    
    Thanks for any feedback (NB. The code is still a bit messy as I am experimenting with options, and I’ve not pushed the actual version I’m using here – as it’s full of extra log messages etc and flag settings – but the majority is the same)
    
    JB
    
    
    
    On 27.02.17, 17:52, "Hefty, Sean" <sean.hefty at intel.com> wrote:
    
        > I have a new problem – I am sending and receiving messages using endpoints
        > of type FI_EP_MSG  (verbs+gni) and all is well. However, when I attempt
        > an fi_read(….) I get a segfault. Looking at the code I see that inside
        > fi_rma.h the ep->rma field is uninitialized. In my hints I use
        
        Is there any chance you're treating a passive endpoint as an active endpoint?
        
        Typically if a feature is not supported, the provider is supposed to set all function pointers to 'enosys' versions of the calls that simply return a failure if called.  There may be cases where the provider isn't doing this.  This would fix the segfault, but not actually get things working.
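To illustrate the 'enosys' pattern described above, here is a minimal sketch in plain C. It does not use the real libfabric headers: the `demo_*` names and the simplified `read` signature are hypothetical stand-ins, invented only for this example. The idea is that every endpoint gets a valid operations table, and unsupported operations point at a stub that returns -ENOSYS, so the inline wrapper never dereferences an unset pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-ins; NOT the real libfabric structures. */
struct demo_ops_rma {
    ssize_t (*read)(void *ep, void *buf, size_t len);
};

struct demo_ep {
    struct demo_ops_rma *rma;   /* must never be left uninitialized */
};

/* Stub used when the provider does not support RMA reads. */
static ssize_t demo_no_read(void *ep, void *buf, size_t len)
{
    (void)ep; (void)buf; (void)len;
    return -ENOSYS;
}

/* Shared table in which every entry is an 'enosys' stub. */
static struct demo_ops_rma demo_rma_unsupported = {
    .read = demo_no_read,
};

/* Endpoint setup: default to the stub table instead of leaving rma unset. */
static void demo_ep_init(struct demo_ep *ep)
{
    ep->rma = &demo_rma_unsupported;
}

/* Inline wrapper in the style of a header-only call that dispatches
 * through the endpoint's operations table. */
static inline ssize_t demo_read(struct demo_ep *ep, void *buf, size_t len)
{
    return ep->rma->read(ep, buf, len);
}
```

With this arrangement, calling `demo_read` on an endpoint that lacks RMA support returns -ENOSYS to the caller rather than crashing. As noted above, that only turns the segfault into an error code; it does not make the read work.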
        
        >             fabric_hints_->caps = FI_MSG | FI_RMA;
        
        This is correct
        
        > so I assumed that I’d get support for both, but I must need to ask for
        > more somewhere to get rma support working.
        > What might I be missing that causes the ep->rma internals to be unset?
        
        We need more details to figure this out.
        
        > One clue might be from the fabric logging that says this
        > libfabric:verbs:fabric:fi_ibv_getifaddrs():854<debug> Found active
        > interface for verbs device: mlx5_0 with address: 192.168.3.37
        > libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
        > Invalid argument(22)
        > libfabric:verbs:fabric:fi_ibv_create_ep():183<info> rdma_create_ep:
        > Invalid argument(22)
        > when I call getinfo, but I am not able to tell which argument (hint?)
        > this refers to. I mention it ‘just in case’ it helps.
        
        These errors are coming from lower libraries, and may or may not indicate a problem.  Not very helpful, I know...
        
    
    _______________________________________________
    Libfabric-users mailing list
    Libfabric-users at lists.openfabrics.org
    http://lists.openfabrics.org/mailman/listinfo/libfabric-users
    


