[libfabric-users] Heap use after free from completion queue data fi_cq_sread()

Hefty, Sean sean.hefty at intel.com
Mon Feb 10 11:15:35 PST 2020


> I am pretty new to libfabric, so I most likely made a mistake. I just
> cannot figure out what the problem is, so I am asking on the users
> mailing list.
> 
> I was tasked with integrating libfabric into an existing university
> project, specifically into the project's lower-level functions that
> manage data transfer, while keeping their interfaces as intact as
> possible.
> 
> Since the implementation is still in the testing stage, I am using the
> sockets provider on a local machine. Because the future target servers
> run an older OS version, I will be using libfabric version 1.5.4 until
> the OS is upgraded.
> 
> Now to my problem at hand:
> 
> I use fi_cq_sread() to check whether data has been received for a
> previously posted receive (fi_recv). Since the function is called on
> the server side, it is possible that no data has been received (the
> server loops over the project's receive function to look for new data).
> 
> The data is split into two parts: a header, which contains information
> about the data to come, and the larger chunks of actual data. If no
> header is received, the calls for the actual data are skipped.
> 
> When larger amounts of data are received, a "heap use after free"
> segfault occurs involving the completion queue entry structure in use.
> 
> The problem happens at the read for the data header.
> 
> It can be circumvented by increasing the timeout, but you will surely
> agree that guessing at a good timeout and hoping that no additional
> time is required can't be the solution. And due to the program's
> structure explained above, it is impossible to set the timeout to -1.
> 
> Any idea what I did wrong? Do you need additional information?

Trying to debug a segfault based on email is challenging.  The only guess I have is to examine the lifetime of the application structures that might be passed as the context into receive operations, or that receive completion processing will want to access.  Ensure that the structure is not re-used until the receive it is associated with completes.
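
The lifetime rule above can be sketched in C, under stated assumptions. struct pending_recv and its helper functions are hypothetical application-side bookkeeping, not part of libfabric, and the libfabric calls appear only in comments. One way the described symptom could arise is that fi_cq_sread() times out (returning -FI_EAGAIN) while the receive posted with fi_recv() is still outstanding. A timeout does not cancel the receive, so releasing its buffer or context at that point would let a later completion touch freed memory. Whether this is what happens in the original program cannot be confirmed from the email.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-receive bookkeeping; NOT a libfabric type.
 * A pointer to this structure would be passed as the `context`
 * argument of fi_recv(). */
struct pending_recv {
    void  *buf;     /* buffer handed to fi_recv() */
    size_t len;     /* size of buf in bytes */
    bool   posted;  /* true from fi_recv() until its completion is read */
};

/* Allocate the receive buffer together with its tracking structure. */
struct pending_recv *pending_recv_create(size_t len)
{
    struct pending_recv *pr = calloc(1, sizeof(*pr));
    if (!pr)
        return NULL;
    pr->buf = malloc(len);
    if (!pr->buf) {
        free(pr);
        return NULL;
    }
    pr->len = len;
    return pr;
}

/* Call right after fi_recv(..., pr) has returned 0. */
void pending_recv_mark_posted(struct pending_recv *pr)
{
    assert(pr != NULL);
    pr->posted = true;
}

/* Call when a completion read from the CQ carries
 * op_context == pr, i.e. the provider is done with pr->buf. */
void pending_recv_mark_completed(struct pending_recv *pr)
{
    assert(pr != NULL);
    pr->posted = false;
}

/* Release the buffer and the structure only if no receive is
 * outstanding. Returns false (and frees nothing) while the receive
 * is still posted, e.g. after fi_cq_sread() returned -FI_EAGAIN. */
bool pending_recv_try_release(struct pending_recv *pr)
{
    assert(pr != NULL);
    if (pr->posted)
        return false;
    free(pr->buf);
    free(pr);
    return true;
}
```

On a timeout, pending_recv_try_release() returns false and the caller keeps the structure for the next polling pass. The memory is released only after the matching completion has been read.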

- Sean

