[libfabric-users] Heap use after free from completion queue data fi_cq_sread()
Arne Struck
arnestruck at astruck.de
Fri Feb 7 09:00:05 PST 2020
Hello,
I am pretty new to libfabric, so most likely I made a mistake, but I
cannot figure out what the problem is. Hence this post to the users
mailing list.
I was tasked with integrating libfabric into an existing university
project. To be precise, into the lower-level functions of the project
that manage data transfer, while keeping their interfaces as intact as
possible.
Since the implementation is still in the testing stage, the sockets
provider is used on a local machine. And since the future target servers
run an older OS version, I am using libfabric 1.5.4 until the OS is
upgraded.
Now to my problem at hand:
I use fi_cq_sread() to find out whether data has been received for a
previously posted receive (fi_recv). Since the function is called on the
server side, it is possible that no data has been received (the server
loops over the project's receive function to look for new data).
Data is split into 2 parts: a header, which contains information about
the data to come, and the bigger chunks of actual data. If no header is
received, the calls for the actual data are skipped.
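
A minimal sketch of the control flow described above, not the project's
actual code. Everything in it is hypothetical: struct msg_header,
struct fake_cq, poll_completion() and receive_once() are made-up names,
and poll_completion() only stands in for fi_cq_sread() so the example
runs without libfabric. It mirrors one documented behaviour of the real
call: a timeout with nothing to report is signalled by a negative
"try again" code (-FI_EAGAIN in libfabric, -EAGAIN here).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header: announces how many payload bytes follow. */
struct msg_header {
    uint32_t payload_len;
};

/* Hypothetical stand-in for a completion queue: a fixed list of the
 * sizes of receives that have already completed. */
struct fake_cq {
    const size_t *done;   /* completed receive sizes, in order */
    size_t count;         /* number of entries in done */
    size_t next;          /* index of the next unread entry */
};

/* Stand-in for fi_cq_sread(): returns 1 and fills *len when a
 * completion is available, -EAGAIN when the wait ends with nothing.
 * The timeout is ignored here because nothing arrives asynchronously. */
static int poll_completion(struct fake_cq *cq, size_t *len, int timeout_ms)
{
    (void)timeout_ms;
    if (cq->next >= cq->count)
        return -EAGAIN;
    *len = cq->done[cq->next++];
    return 1;
}

/* One pass of the server's receive function.
 * Returns the payload size, 0 if no header arrived (payload step is
 * skipped), or a negative errno value on error. */
static long receive_once(struct fake_cq *cq, int timeout_ms)
{
    size_t len = 0;
    int rc = poll_completion(cq, &len, timeout_ms);

    if (rc == -EAGAIN)
        return 0;                 /* no header yet: nothing to do */
    if (rc < 0)
        return rc;
    if (len != sizeof(struct msg_header))
        return -EINVAL;           /* first completion must be a header */

    /* Header arrived, so now wait for the payload completion. */
    rc = poll_completion(cq, &len, timeout_ms);
    if (rc < 0)
        return rc;                /* includes -EAGAIN: payload not done */
    return (long)len;
}
```

In this sketch a timeout while waiting for the payload is reported as
-EAGAIN and is distinct from "no header" (0), so the caller can tell
that a receive is still outstanding. In real libfabric code a buffer
posted with fi_recv() has to stay valid until its completion has been
reported.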
When bigger amounts of data are received, a segfault ("heap use after
free") occurs, involving the completion queue entry structure in use.
The problem happens at the read for the data header.
It can be circumvented by increasing the timeout, but you will surely
agree that guessing a good timeout and hoping that no additional time
is ever required cannot be the solution. And due to the program's
structure explained above, it is impossible to set the timeout to -1.
Any idea what I did wrong? Do you need additional information?
Greetings and thanks in advance,
Arne.