[libfabric-users] fi_read verbs ENODATA
Arne
arnestruck at astruck.de
Fri Oct 23 09:06:38 PDT 2020
Hello, it's me again.
Now that the application runs with the sockets provider, I wanted to
switch the provider to verbs.
I know the verbs man page has troubleshooting information for this, but
after following the steps described there I still can't find my error. I am
90% sure my hints are faulty, but I don't see why, hence the question.
The problem: on the server, fi_getinfo returns "no data
available" (FI_ENODATA) with the following hints
(hints is an allocated struct fi_info *):
    hints->caps = FI_RMA | FI_MSG;
    hints->ep_attr->type = FI_EP_MSG;
    hints->addr_format = FI_SOCKADDR_IN;
    hints->fabric_attr->prov_name = g_strdup("verbs");
    hints->mode = FI_LOCAL_MR;
    error = fi_getinfo(FI_VERSION(1, 11),
                       "10.0.10.2",
                       "4711",
                       FI_SOURCE,
                       hints,
                       &info);
(Setting service to "0" gives the same result.)
    fi_info -p verbs -P 4711 -n 10.0.10.2 -t FI_EP_MSG -a FI_SOCKADDR_IN \
        -c FI_RMA -c FI_MSG -v
called on the same node, returns one fi_info entry:
fi_info:
    caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND,
        FI_REMOTE_READ, FI_REMOTE_WRITE, FI_LOCAL_COMM, FI_REMOTE_COMM ]
    mode: [ FI_RX_CQ_DATA ]
    addr_format: FI_SOCKADDR_IN
    src_addrlen: 16
    dest_addrlen: 16
    src_addr: fi_sockaddr_in://10.0.10.2:0
    dest_addr: fi_sockaddr_in://10.0.10.2:4711
    handle: (nil)
    fi_tx_attr:
        caps: [ FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_SEND ]
        mode: [ ]
        op_flags: [ ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS,
            FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAW, FI_ORDER_SAS,
            FI_ORDER_RMA_RAR, FI_ORDER_RMA_RAW, FI_ORDER_RMA_WAW,
            FI_ORDER_ATOMIC_RAR, FI_ORDER_ATOMIC_RAW, FI_ORDER_ATOMIC_WAW ]
        comp_order: [ FI_ORDER_STRICT ]
        inject_size: 256
        size: 384
        iov_limit: 4
        rma_iov_limit: 1
    fi_rx_attr:
        caps: [ FI_MSG, FI_RMA, FI_RECV, FI_REMOTE_READ, FI_REMOTE_WRITE ]
        mode: [ FI_RX_CQ_DATA ]
        op_flags: [ ]
        msg_order: [ FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS,
            FI_ORDER_WAW, FI_ORDER_WAS, FI_ORDER_SAW, FI_ORDER_SAS,
            FI_ORDER_RMA_RAR, FI_ORDER_RMA_RAW, FI_ORDER_RMA_WAW,
            FI_ORDER_ATOMIC_RAR, FI_ORDER_ATOMIC_RAW, FI_ORDER_ATOMIC_WAW ]
        comp_order: [ FI_ORDER_STRICT, FI_ORDER_DATA ]
        total_buffered_recv: 0
        size: 384
        iov_limit: 4
    fi_ep_attr:
        type: FI_EP_MSG
        protocol: FI_PROTO_RDMA_CM_IB_RC
        protocol_version: 1
        max_msg_size: 1073741824
        msg_prefix_size: 0
        max_order_raw_size: 1073741824
        max_order_war_size: 0
        max_order_waw_size: 1073741824
        mem_tag_format: 0x0000000000000000
        tx_ctx_cnt: 1
        rx_ctx_cnt: 1
        auth_key_size: 0
    fi_domain_attr:
        domain: 0x0
        name: mlx4_0
        threading: FI_THREAD_SAFE
        control_progress: FI_PROGRESS_AUTO
        data_progress: FI_PROGRESS_AUTO
        resource_mgmt: FI_RM_ENABLED
        av_type: FI_AV_UNSPEC
        mr_mode: [ FI_MR_LOCAL, FI_MR_VIRT_ADDR, FI_MR_ALLOCATED,
            FI_MR_PROV_KEY ]
        mr_key_size: 4
        cq_data_size: 4
        cq_cnt: 65408
        ep_cnt: 163768
        tx_ctx_cnt: 1024
        rx_ctx_cnt: 1024
        max_ep_tx_ctx: 1024
        max_ep_rx_ctx: 1024
        max_ep_stx_ctx: 0
        max_ep_srx_ctx: 65472
        cntr_cnt: 0
        mr_iov_limit: 1
        caps: [ FI_LOCAL_COMM, FI_REMOTE_COMM ]
        mode: [ ]
        auth_key_size: 0
        max_err_data: 255
        mr_cnt: 524032
    fi_fabric_attr:
        name: IB-0xfe80000000000000
        prov_name: verbs
        prov_version: 111.0
        api_version: 1.11
    fid_nic:
        fi_device_attr:
            name: mlx4_0
            device_id: 0x1003
            device_version: 1
            vendor_id: 0x02c9
            driver: (null)
            firmware: 2.31.5050
        fi_bus_attr:
            fi_bus_type: FI_BUS_UNKNOWN
        fi_link_attr:
            address: (null)
            mtu: 4096
            speed: 32000000000
            state: FI_LINK_UP
            network_type: InfiniBand
So, as I understand it, the verbs provider should work.
The output with FI_LOG_LEVEL="info" contains many messages about
variables that are not set (like rx_size), which shouldn't be a problem, right?
So where is the problem with my call? Isn't the call I make in the
application the same as the one on the command line?
The only difference I can spot is the FI_SOURCE flag in the
fi_getinfo call, and I was told here before that the verbs provider
supports that flag.
Or does hints->domain_attr->mr_mode need to be set to FI_MR_BASIC,
even though it is deprecated in libfabric 1.5 and later?
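One possible explanation for the difference between the two calls, sketched
as a toy model: as far as I understand the fi_getinfo man page, with API
version 1.5 or later the bits in hints->domain_attr->mr_mode state which
memory registration requirements the application can handle, and a provider
may be filtered out if it needs a bit that the hints do not offer. The
fi_info output above lists FI_MR_LOCAL, FI_MR_VIRT_ADDR, FI_MR_ALLOCATED and
FI_MR_PROV_KEY for verbs, whereas the hints in the application leave mr_mode
at 0. The DEMO_ names and values below are made up for illustration and are
not the libfabric definitions, and the subset rule is an assumption, not a
copy of libfabric's matching code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mr_mode bits shown in the fi_info output.
 * These are NOT the values from <rdma/fabric.h>. */
enum {
    DEMO_MR_LOCAL     = 1u << 0,
    DEMO_MR_VIRT_ADDR = 1u << 1,
    DEMO_MR_ALLOCATED = 1u << 2,
    DEMO_MR_PROV_KEY  = 1u << 3
};

/* The four bits the verbs provider reports in the output above. */
static const uint32_t demo_verbs_required =
    DEMO_MR_LOCAL | DEMO_MR_VIRT_ADDR | DEMO_MR_ALLOCATED | DEMO_MR_PROV_KEY;

/* Assumed rule: the hints match only if they cover every bit the
 * provider requires (extra bits in the hints are harmless). */
static bool demo_mr_mode_matches(uint32_t hinted, uint32_t required)
{
    return (hinted & required) == required;
}
```

Under that assumption, demo_mr_mode_matches(0, demo_verbs_required) is false,
which would correspond to the FI_ENODATA result, while hints carrying all
four bits would match. If this is indeed the cause, setting the four FI_MR_*
bits listed above in hints->domain_attr->mr_mode, rather than FI_MR_BASIC,
might be the thing to try.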
Greetings, Arne.