[ofa-general] Race condition in userspace libraries with create/destroy qp

Jack Morgenstein jackm at dev.mellanox.co.il
Thu Nov 20 02:11:45 PST 2008


Roland,

Mazal Tov again on the birth of your son.  I hope all is well.
Has your latency improved (by some miracle)?

There seems to be a race in libmlx4 (which our regression testing found).

mlx4_create_qp and mlx4_destroy_qp are not atomic WRT each other. If one thread is
destroying a QP while another is creating a qp, there is a race hole.  The destroying thread
can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared
it from userspace store (mlx4_clear_qp).
If the other thread creates a qp during this break, it gets the same QP base number and overwrites
the destroyed QPs entry with mlx4_store_qp().

When the destroying thread resumes, it clears the new entry from the userspace store via
mlx4_clear_qp.

I'm debating between a couple of options:
1. move the mlx4_qp_clear to precede ibv_cmd_destroy_qp. However, what if we're still getting
   completions for this qp? Ouch.

2. Create a mutex for this purpose, and use it to force the create and destroy qp operations
   to be atomic WRT  the ibv_cmd_xxx_qp operations and the store/clear qp operations.

3. Force kernel space to avoid allocating a just-deleted qp number (this is my least favorite option).

My preference is for #2, as being the simplest to implement and having no side-effects.

What do you think?

- Jack

(BTW libmthca has the same issue).

========================================================

From file libmlx4/src/verbs.c:

mlx4_create_qp snippet:

        ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
                                &resp, sizeof resp);
        if (ret)
                goto err_rq_db;

        ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp);
        if (ret)
                goto err_destroy;

mlx4_destroy_qp snippet:

        ret = ibv_cmd_destroy_qp(ibqp);
        if (ret)
                return ret;
==> CAN LOSE TIME SLICE HERE!!!
        mlx4_lock_cqs(ibqp);

        __mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
                        ibqp->srq ? to_msrq(ibqp->srq) : NULL);
        if (ibqp->send_cq != ibqp->recv_cq)
                __mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num, NULL);

        mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num);

        mlx4_unlock_cqs(ibqp);



More information about the general mailing list