[ofa-general] ibv_post_send fails when using malloc in a special way

Asmund Ostvold aostvold at platform.com
Wed Dec 3 04:58:15 PST 2008


Hi again,

We realize that our last issue report was a bit minimalistic.
We hope this report will be of more interest. If you have ANY
questions, please send me an email and I will answer!

We have run into a strange problem:

     * RDMA (ibv_post_send) fails when using malloc in a special way
     * We have an issue which we have reduced to the enclosed program.
        It is not neat, but it is able to demonstrate what we think is a
        problem with either ibverbs, libc, or the kernel
     * The issue is related to the sending process
     * Receiver sees incorrect data
     * The receiver will use "private" receive buffers for all RDMAs,
        i.e., each RDMA "put" will place data into a distinct receive
        memory area
     * The sender will re-use the memory area
     * To trigger the problem, we need a malloc() that attempts to allocate
        a huge amount of memory but fails. Without this failing malloc(),
        everything is OK. Please note that malloc changes its allocation
        policy after this failing malloc (see below); this is the behavior
        we observed in the pthreads program where we first discovered the
        issue. It must also be noted that if we allocate the buffers with
        malloc instead of valloc, it works fine...
     * Someone reviewing this would probably say: "The problem comes from
        a potential munmap()+mmap() or an mremap()". We acknowledge that the
        failing program is vulnerable in this context, but strace does not
        reveal any such change in the virtual-to-physical mapping. (Also be
        aware that this is a stripped-down example of a much more
        complicated scenario.)
     * We are concerned that the call to ibv_reg_mr() does not imply a
        call to madvise()
     * We have tested with
            rhel4.6,
            kernel: 2.6.9-67.ELsmp, x86_64
            libibverbs-1.1.1-1.ofed1.3.1,
            Mellanox Technologies MT23108 InfiniHost (rev a1)
        and with:
            rhel5.2,
            kernel: 2.6.18-92.el5, x86_64
            libibverbs-1.1.2-1.ofed1.4.rc6,
            Mellanox Technologies MT25418 (rev a0)
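
To make the allocation side of this easier to discuss, here is a minimal
sketch in C (no ibverbs involved) of the malloc/valloc pattern described
above. The helper names provoke_failed_malloc() and count_address_reuse()
are hypothetical and not taken from bug.c. Whether the huge malloc()
actually fails depends on the machine's memory and overcommit settings,
and address reuse by valloc() is typical glibc behaviour, not a guarantee.

```c
#define _DEFAULT_SOURCE       /* expose valloc() and sysconf() in glibc headers */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: request far more memory than the machine can
 * provide, mimicking the failing ~40 GB malloc() in our strace output.
 * Returns 1 if the request failed (the case that triggers the issue),
 * 0 if overcommit allowed it to succeed. */
static int provoke_failed_malloc(size_t huge)
{
    void *p = malloc(huge);
    if (p == NULL)
        return 1;
    free(p);
    return 0;
}

/* Hypothetical helper: valloc(), fill and free `rounds` buffers of `len`
 * bytes, counting how often valloc() returns the previous round's address. */
static int count_address_reuse(size_t len, int rounds)
{
    uintptr_t prev = 0;
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    int reused = 0;

    for (int i = 0; i < rounds; i++) {
        unsigned char *buf = valloc(len);
        assert(buf != NULL);
        assert((uintptr_t)buf % page == 0);  /* valloc() is page aligned */
        memset(buf, i & 0xff, len);          /* "unique" data per round */
        if ((uintptr_t)buf == prev)
            reused++;
        prev = (uintptr_t)buf;               /* keep the address, not the freed pointer */
        free(buf);
    }
    return reused;
}
```

A driver would call provoke_failed_malloc() once with a request of
roughly the 39999000576 bytes seen in the strace output and then
count_address_reuse(); running that under strace should show whether the
PROT_NONE/MAP_NORESERVE mmap() appears in the same way.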



Here is the special malloc behaviour:

[      3afb6c40bc] mmap(NULL, 39999000576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[      3afb6bfb0a] brk(0x95073b000)     = 0x526000
[      3afb6c40bc] mmap(NULL, 39999135744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[      3afb6c40bc] mmap(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x46e8632000
[      3afb6c40e9] munmap(0x46e8632000, 843776) = 0
[      3afb6c40e9] munmap(0x46e8800000, 204800) = 0
[      3afb6c4119] mprotect(0x46e8700000, 135168, PROT_READ|PROT_WRITE) = 0
[      3afb6c40bc] mmap(NULL, 39999000576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

Please note the mmap() returning 0x46e8632000. It uses PROT_NONE and
MAP_NORESERVE; we are not sure whether that is related. The RDMA send
buffer that fails comes from this mmap.
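
The munmap() and mprotect() sizes in the trace can be checked with a
little arithmetic. This sketch assumes the pattern is glibc setting up a
heap for a secondary (non-main) malloc arena: reserve twice a maximum
heap size with PROT_NONE, trim the reservation to one aligned region,
then make the start of it read/write. The 1 MB maximum heap size
(ASSUMED_HEAP_MAX_SIZE) is our assumption, chosen because it reproduces
the numbers in the trace.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 MB maximum heap size for secondary arenas, which is
 * what makes the strace numbers line up. */
#define ASSUMED_HEAP_MAX_SIZE ((uint64_t)1 << 20)

struct heap_trim {
    uint64_t heap_start;  /* aligned start of the region that is kept */
    uint64_t head_len;    /* bytes unmapped before heap_start */
    uint64_t tail_start;  /* first byte unmapped after the kept region */
    uint64_t tail_len;    /* bytes unmapped after the kept region */
};

/* Given the address returned by the 2 * HEAP_MAX_SIZE PROT_NONE mmap(),
 * compute the two munmap() calls that trim it down to one
 * HEAP_MAX_SIZE-aligned region. */
static struct heap_trim compute_heap_trim(uint64_t mmap_addr)
{
    struct heap_trim t;
    uint64_t mask = ASSUMED_HEAP_MAX_SIZE - 1;

    t.heap_start = (mmap_addr + mask) & ~mask;   /* round up to alignment */
    t.head_len = t.heap_start - mmap_addr;
    assert(t.head_len < ASSUMED_HEAP_MAX_SIZE);
    t.tail_start = t.heap_start + ASSUMED_HEAP_MAX_SIZE;
    t.tail_len = ASSUMED_HEAP_MAX_SIZE - t.head_len;
    return t;
}
```

With 0x46e8632000 as input this gives a kept region at 0x46e8700000, a
leading munmap() of 843776 bytes and a trailing munmap() of 204800 bytes
at 0x46e8800000, matching the trace; the mprotect() that follows then
makes the first 135168 bytes (132 KB) of that region readable and
writable. If the assumption holds, the failing send buffer sits in such
an arena heap rather than in the main brk() heap.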


Here is a shortform description of the way we allocate/free buffers:

     * Buffers are allocated using valloc
     * Buffers are registered using ibv_reg_mr if they are not already
        registered
     * Buffers are initialized with unique data
     * Data is copied to the receiver with ibv_post_send
     * We wait with ibv_poll_cq
     * Buffers are freed using free
     * When valloc starts returning buffer addresses we have seen before,
        and we therefore skip registration, the data at the receiver side
        becomes wrong: we get partial data from the previous buffer.
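
The "register only if not already registered" step amounts to a
registration cache keyed by virtual address. This sketch models only
that bookkeeping: fake_mr and cache_get() are hypothetical stand-ins (in
the real program the entry would be the struct ibv_mr * returned by
ibv_reg_mr()), and no libibverbs call is made. It shows that once valloc()
hands back a previously used address, the sender reuses the old
registration; whether the pages behind that address are still the ones
that were registered is exactly the open question in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a registered memory region. */
struct fake_mr {
    uintptr_t addr;
    size_t len;
    int id;          /* which "registration" produced this entry */
};

#define CACHE_SLOTS 16

/* Hypothetical registration cache keyed only by virtual address range. */
struct reg_cache {
    struct fake_mr slots[CACHE_SLOTS];
    int used;
    int next_id;
};

/* Return a cached entry covering [addr, addr + len), or "register" a new
 * one. *was_hit tells the caller whether registration was skipped. */
static struct fake_mr *cache_get(struct reg_cache *c, uintptr_t addr,
                                 size_t len, int *was_hit)
{
    for (int i = 0; i < c->used; i++) {
        struct fake_mr *m = &c->slots[i];
        if (addr >= m->addr && addr + len <= m->addr + m->len) {
            *was_hit = 1;   /* registration skipped: same address, old entry */
            return m;
        }
    }
    assert(c->used < CACHE_SLOTS);
    struct fake_mr *m = &c->slots[c->used++];
    m->addr = addr;         /* the real code would call ibv_reg_mr() here */
    m->len = len;
    m->id = c->next_id++;
    *was_hit = 0;
    return m;
}
```

Nothing in such a lookup can notice that a buffer was freed and
reallocated at the same address; only the address range is compared.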


strace/ltrace are available if anyone is interested.


Name: bug.c
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20081203/e0352d33/attachment.c>

