[libfabric-users] Heap use after free from completion queue data fi_cq_sread()

Arne Struck arnestruck at astruck.de
Fri Feb 7 09:00:05 PST 2020


Hello,


I am pretty new to libfabric, so I have most likely made a mistake, but 
I cannot figure out what the problem is. Hence this post to the users 
mailing list.

I have been tasked with integrating libfabric into an existing 
university project. To be precise, into the project's lower-level 
functions that manage data transfer, while keeping their interfaces as 
intact as possible.

Since the implementation is still in the testing stage, the sockets 
provider on a local machine is used. And since the future target servers 
run an older OS version, libfabric 1.5.4 will be used until the OS is 
upgraded.

Now to my problem at hand:

I use fi_cq_sread() to find out whether data has arrived for a 
previously posted receive (fi_recv). Since the function is called on the 
server side, it is possible that no data has been received (the server 
loops over the project's receive function to look for new data).

Data is split into two parts: a header, which contains information about 
the data to come, and the larger chunks of actual data. If no header is 
received, the calls for the actual data are skipped.
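
The control flow described above can be sketched roughly as follows. This 
is not the project's actual code. All names here (wait_result, 
recv_result, cq_wait_fn, receive_once and the two stubs) are hypothetical, 
and the completion queue wait is hidden behind a function pointer so the 
sketch compiles without libfabric. In a real build that function would 
wrap fi_cq_sread() and report its -FI_EAGAIN return value as a timeout. 
The separate timeout for the data part, and treating a missing data 
completion as a failure, are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one bounded wait on the completion queue. */
enum wait_result { WAIT_DONE, WAIT_TIMEOUT, WAIT_ERROR };

/* Hypothetical outcome of one pass through the receive function. */
enum recv_result { RECV_NONE, RECV_OK, RECV_FAILED };

/*
 * Hypothetical stand-in for a wrapper around fi_cq_sread(): wait up to
 * timeout_ms for one completion on cq. A real wrapper would map the
 * -FI_EAGAIN return value of fi_cq_sread() to WAIT_TIMEOUT.
 */
typedef enum wait_result (*cq_wait_fn)(void *cq, int timeout_ms);

/*
 * One pass of the server-side receive function: wait for the header with
 * a bounded timeout; only if a header arrived, wait for the data part.
 */
enum recv_result receive_once(cq_wait_fn wait, void *cq,
                              int header_timeout_ms, int data_timeout_ms)
{
    assert(wait != NULL);

    switch (wait(cq, header_timeout_ms)) {
    case WAIT_TIMEOUT:
        return RECV_NONE;    /* no header: skip the calls for the data */
    case WAIT_ERROR:
        return RECV_FAILED;
    case WAIT_DONE:
        break;               /* header arrived, data is expected next */
    }

    /* Assumption: a header without data in time counts as a failure. */
    return wait(cq, data_timeout_ms) == WAIT_DONE ? RECV_OK : RECV_FAILED;
}

/* Stub waits, only for trying out the control flow without a fabric. */
enum wait_result stub_timeout(void *cq, int timeout_ms)
{
    (void)cq; (void)timeout_ms;
    return WAIT_TIMEOUT;
}

enum wait_result stub_done(void *cq, int timeout_ms)
{
    (void)cq; (void)timeout_ms;
    return WAIT_DONE;
}
```

With the stubs, receive_once(stub_timeout, NULL, 100, 1000) takes the 
"no header" path and receive_once(stub_done, NULL, 100, 1000) takes the 
header-then-data path.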

When larger amounts of data are received, a segfault ("heap use after 
free") occurs, involving the completion queue entry structure in use.

The problem happens at the read for the data header.

It can be worked around by increasing the timeout, but you will surely 
agree that guessing a good timeout and hoping that no additional time 
is ever required cannot be the solution. And due to the program's 
structure explained above, it is impossible to set the timeout to -1 
(wait indefinitely).

Any idea what I did wrong? Do you need additional information?


Greetings and thanks in advance,

Arne.



