[libfabric-users] connection-less send/recv with verbs
Maurizio Drocco
drocco at di.unito.it
Thu Jul 13 13:22:42 PDT 2017
Thank you Sean,
I forgot to mention the version of libfabric I was using: git master, commit 8d192f2.
The backtrace is as follows (buf points to a 32-byte memory chunk):
#0  0x00007ffff798dd7e in fi_ibv_rdm_init_recv_request ()
    from /archive/home/mdrocco/usr/lib/libfabric.so.1
#1  0x00007ffff798c728 in fi_ibv_rdm_recvmsg ()
    from /archive/home/mdrocco/usr/lib/libfabric.so.1
#2  0x00007ffff798c8bf in fi_ibv_rdm_recv ()
    from /archive/home/mdrocco/usr/lib/libfabric.so.1
#3  0x0000000000402937 in fi_recv (ep=0x6303d0, buf=0x633810, len=32,
    desc=0x0, src_addr=18446744073709551615, context=0x0)
    at /archive/home/mdrocco/usr/include/rdma/fi_endpoint.h:263
I double-checked that the endpoint is enabled before calling fi_recv.
As soon as possible, I will try with 1.5.0rc1 as suggested.
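
A note on reading frame #3: src_addr=18446744073709551615 is the all-ones 64-bit value, which is how libfabric's rdma/fabric.h defines FI_ADDR_UNSPEC, so the receive was posted without a specific source address. The sketch below only illustrates that arithmetic. It does not include libfabric; SKETCH_ADDR_UNSPEC, is_unspecified_source and describe_src_addr are hypothetical names introduced here, and the macro is assumed to mirror FI_ADDR_UNSPEC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: mirrors libfabric's FI_ADDR_UNSPEC (an all-ones 64-bit
 * value). Redefined under a different name so that this sketch builds
 * without libfabric installed. */
#define SKETCH_ADDR_UNSPEC ((uint64_t) -1)

/* Hypothetical helper: nonzero when src_addr means "no specific source". */
int is_unspecified_source(uint64_t src_addr)
{
    return src_addr == SKETCH_ADDR_UNSPEC;
}

/* Hypothetical helper: prints a src_addr as gdb shows it (decimal),
 * in hex, and whether it is the unspecified-source marker. */
void describe_src_addr(uint64_t src_addr)
{
    printf("src_addr = %llu (0x%llx): %s\n",
           (unsigned long long) src_addr,
           (unsigned long long) src_addr,
           is_unspecified_source(src_addr) ? "unspecified source"
                                           : "specific peer address");
}

/* Self-check against the value shown in frame #3 of the backtrace. */
void check_backtrace_value(void)
{
    assert(SKETCH_ADDR_UNSPEC == UINT64_MAX);
    assert(is_unspecified_source(18446744073709551615ULL));
}
```

Calling describe_src_addr(18446744073709551615ULL) from a main function decodes the value from the backtrace. This says nothing about the cause of the crash; it only rules out the reading that src_addr is a corrupted value.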
---
Maurizio Drocco
PhD Candidate
University of Torino, department of Computer Science
Via Pessinetto 12, 10149 Torino - Italy
> On 13 Jul 2017, at 19:00, Hefty, Sean <sean.hefty at intel.com> wrote:
>
>> The scenario:
>> The code is an all-to-all network of processes, with connection-less
>> send/recv communication.
>> All addresses and services are known statically at start time.
>> Each process has an endpoint, to which it posts both send and recv
>> requests (via fi_send/fi_recv); the endpoint is created from a fabric
>> that is created by passing its address, its service and the FI_SOURCE flag
>> to fi_getinfo.
>> Then each process fills an AV table with address/service of all the
>> other nodes.
>>
>> The problem:
>> With verbs, the code crashes on the first call to fi_recv, with the
>> following call stack:
>> fi_recv - fi_ibv_rdm_recv - fi_ibv_rdm_recvmsg -
>> fi_ibv_rdm_init_recv_request
>>
>> Do you have any idea about what is going on? If it helps, I can
>> recompile libfabric with some options for debugging.
>
> Do you have a backtrace available? This sounds like a possible null pointer dereference.
>
> If you have access to 1.5.0rc1, you can try using the "ofi-rxm:verbs" provider combination instead of the verbs rdm support. Verbs rdm support has limited testing and specifically targets Intel MPI use.
>
> The only other idea I have without more details is to ensure that the endpoint has been enabled prior to posting receive buffers.
>
> - Sean