[libfabric-users] fi_read verbs ENODATA

Arne arnestruck at astruck.de
Fri Oct 23 09:06:38 PDT 2020


Hello, it's me again.


Now that I have a way to run the application with the sockets provider, 
I wanted to switch to the verbs provider.

I know the verbs man page has troubleshooting information for this, but 
I can't find my error after following the steps described there. I'm 90% 
sure my hints are faulty, but I don't see why, hence the question.

The problem: on the server, fi_getinfo returns "No data available" 
(FI_ENODATA) with the following call (hints is an allocated struct fi_info *):


hints->caps = FI_RMA | FI_MSG;
hints->ep_attr->type = FI_EP_MSG;
hints->addr_format = FI_SOCKADDR_IN;
hints->fabric_attr->prov_name = g_strdup("verbs");
hints->mode = FI_LOCAL_MR;

error = fi_getinfo(FI_VERSION(1, 11), "10.0.10.2", "4711",
                   FI_SOURCE, hints, &info);

(Setting the service to "0" gives the same result.)


fi_info -p verbs -P 4711 -n 10.0.10.2 -t FI_EP_MSG -a FI_SOCKADDR_IN -c FI_RMA -c FI_MSG -v

called on the same node returns one fi_info entry:

fi_info:
     caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_LOCAL_COMM, FI_REMOTE_COMM ]
     mode: [ FI_RX_CQ_DATA ]
     addr_format: FI_SOCKADDR_IN
     src_addrlen: 16
     dest_addrlen: 16
     src_addr: fi_sockaddr_in://10.0.10.2:0
     dest_addr: fi_sockaddr_in://10.0.10.2:4711
     handle: (nil)
     fi_tx_attr:
         caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_SEND ]
         mode: [  ]
         op_flags: [  ]
         msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAW, FI_ORDER_SAS, FI_ORDER_RMA_RAR, FI_ORDER_RMA_RAW, FI_ORDER_RMA_WAW, FI_ORDER_ATOMIC_RAR, FI_ORDER_ATOMIC_RAW, FI_ORDER_ATOMIC_WAW ]
         comp_order: [ FI_ORDER_STRICT ]
         inject_size: 256
         size: 384
         iov_limit: 4
         rma_iov_limit: 1
     fi_rx_attr:
         caps: [ FI_MSG, FI_RMA, FI_RECV, FI_REMOTE_READ, FI_REMOTE_WRITE ]
         mode: [ FI_RX_CQ_DATA ]
         op_flags: [  ]
         msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAW, FI_ORDER_SAS, FI_ORDER_RMA_RAR, FI_ORDER_RMA_RAW, FI_ORDER_RMA_WAW, FI_ORDER_ATOMIC_RAR, FI_ORDER_ATOMIC_RAW, FI_ORDER_ATOMIC_WAW ]
         comp_order: [ FI_ORDER_STRICT, FI_ORDER_DATA ]
         total_buffered_recv: 0
         size: 384
         iov_limit: 4
     fi_ep_attr:
         type: FI_EP_MSG
         protocol: FI_PROTO_RDMA_CM_IB_RC
         protocol_version: 1
         max_msg_size: 1073741824
         msg_prefix_size: 0
         max_order_raw_size: 1073741824
         max_order_war_size: 0
         max_order_waw_size: 1073741824
         mem_tag_format: 0x0000000000000000
         tx_ctx_cnt: 1
         rx_ctx_cnt: 1
         auth_key_size: 0
     fi_domain_attr:
         domain: 0x0
         name: mlx4_0
         threading: FI_THREAD_SAFE
         control_progress: FI_PROGRESS_AUTO
         data_progress: FI_PROGRESS_AUTO
         resource_mgmt: FI_RM_ENABLED
         av_type: FI_AV_UNSPEC
         mr_mode: [ FI_MR_LOCAL, FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, FI_MR_PROV_KEY ]
         mr_key_size: 4
         cq_data_size: 4
         cq_cnt: 65408
         ep_cnt: 163768
         tx_ctx_cnt: 1024
         rx_ctx_cnt: 1024
         max_ep_tx_ctx: 1024
         max_ep_rx_ctx: 1024
         max_ep_stx_ctx: 0
         max_ep_srx_ctx: 65472
         cntr_cnt: 0
         mr_iov_limit: 1
     caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
     mode: [  ]
         auth_key_size: 0
         max_err_data: 255
         mr_cnt: 524032
     fi_fabric_attr:
         name: IB-0xfe80000000000000
         prov_name: verbs
         prov_version: 111.0
         api_version: 1.11
     fid_nic:
         fi_device_attr:
             name: mlx4_0
             device_id: 0x1003
             device_version: 1
             vendor_id: 0x02c9
             driver: (null)
             firmware: 2.31.5050
         fi_bus_attr:
             fi_bus_type: FI_BUS_UNKNOWN
         fi_link_attr:
             address: (null)
             mtu: 4096
             speed: 32000000000
             state: FI_LINK_UP
             network_type: InfiniBand


So, as I understand it, the verbs provider should work.

Running with FI_LOG_LEVEL="info" prints a lot of messages saying that 
certain variables are not set (such as rx_size), which shouldn't be a 
problem, right?


So where is the problem with my call? Isn't the call I make in the 
application the same as the one on the command line?

The only difference I can spot is the FI_SOURCE flag in the fi_getinfo 
call, and I was told here before that the verbs provider supports that 
flag.
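A hedged observation on one more difference between the two calls: the provider entry above reports mode: [ FI_RX_CQ_DATA ], while the application sets hints->mode = FI_LOCAL_MR only. Mode bits are requirements a provider places on the application, so a provider entry that requires a bit the application did not set in hints->mode is not expected to be returned. It is an assumption here (not checked against the fi_info source in this post) that the fi_info utility fills hints->mode with all bits, which would explain why the command line succeeds. The sketch models only that subset test; DEMO_LOCAL_MR and DEMO_RX_CQ_DATA are hypothetical stand-ins, not the real values from <rdma/fabric.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for two libfabric mode bits. The real
 * FI_LOCAL_MR and FI_RX_CQ_DATA are defined in <rdma/fabric.h> with
 * different values; only the relationship between the bits matters. */
#define DEMO_LOCAL_MR   (1ULL << 0)
#define DEMO_RX_CQ_DATA (1ULL << 1)

/* Assumed rule: a provider entry is usable only if every mode bit it
 * requires (prov_mode) is also present in the application's hints->mode. */
static int mode_satisfied(uint64_t hints_mode, uint64_t prov_mode)
{
    /* Bits the provider requires but the application did not offer. */
    return (prov_mode & ~hints_mode) == 0;
}
```

Under this model, mode_satisfied(DEMO_LOCAL_MR, DEMO_RX_CQ_DATA) is 0 (the hints from the application), whereas mode_satisfied(~0ULL, DEMO_RX_CQ_DATA) is 1. In the real code, the untested counterpart would be adding FI_RX_CQ_DATA to hints->mode.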


Or does hints->domain_attr->mr_mode need to be set to FI_MR_BASIC, even 
though it is deprecated for libfabric 1.5 and later?
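On the mr_mode question, a hedged sketch: with FI_VERSION(1, 11), the deprecated FI_LOCAL_MR mode bit is, as far as I understand, only a substitute for FI_MR_LOCAL, and the other bits the provider lists under mr_mode (FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, FI_MR_PROV_KEY) would still have to appear in hints->domain_attr->mr_mode. The code is a simplified, assumed model of that check, not libfabric's actual implementation; all DEMO_* names are hypothetical stand-ins for the real flags in <rdma/fabric.h> and <rdma/fi_domain.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mr_mode bits fi_info reported for
 * mlx4_0. The real FI_MR_* values are defined in <rdma/fi_domain.h>. */
#define DEMO_MR_LOCAL     (1 << 0)
#define DEMO_MR_VIRT_ADDR (1 << 1)
#define DEMO_MR_ALLOCATED (1 << 2)
#define DEMO_MR_PROV_KEY  (1 << 3)
#define DEMO_VERBS_MR_MODE \
    (DEMO_MR_LOCAL | DEMO_MR_VIRT_ADDR | DEMO_MR_ALLOCATED | DEMO_MR_PROV_KEY)

/* Hypothetical stand-in for the deprecated FI_LOCAL_MR mode bit. */
#define DEMO_MODE_LOCAL_MR (1ULL << 0)

/* Simplified, assumed model for API version >= 1.5:
 *  - the provider's local-MR requirement is met by either the old
 *    FI_LOCAL_MR mode bit or the FI_MR_LOCAL mr_mode bit;
 *  - every other required mr_mode bit must be set in the hints. */
static int mr_mode_satisfied(uint64_t hints_mode, int hints_mr_mode,
                             int prov_mr_mode)
{
    int other_bits = prov_mr_mode & ~DEMO_MR_LOCAL;

    if ((prov_mr_mode & DEMO_MR_LOCAL) &&
        !((hints_mode & DEMO_MODE_LOCAL_MR) || (hints_mr_mode & DEMO_MR_LOCAL)))
        return 0;

    return (hints_mr_mode & other_bits) == other_bits;
}
```

Under this model, the hints from the post (FI_LOCAL_MR in mode, mr_mode left at 0) fail, while passing all four bits reported by fi_info succeeds. The untested real-code equivalent would be hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_VIRT_ADDR | FI_MR_ALLOCATED | FI_MR_PROV_KEY;.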


Greetings, Arne.


