[ofa-general] Race condition in userspace libraries with create/destroy qp
Jack Morgenstein
jackm at dev.mellanox.co.il
Thu Nov 20 02:11:45 PST 2008
Roland,
Mazal Tov again on the birth of your son. I hope all is well.
Has your latency improved (by some miracle)?
There seems to be a race in libmlx4 (which our regression testing found).
mlx4_create_qp and mlx4_destroy_qp are not atomic WRT each other. If one thread is
destroying a QP while another is creating a qp, there is a race hole. The destroying thread
can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared
it from userspace store (mlx4_clear_qp).
If the other thread creates a qp during this break, it gets the same QP base number and overwrites
the destroyed QPs entry with mlx4_store_qp().
When the destroying thread resumes, it clears the new entry from the userspace store via
mlx4_clear_qp.
I'm debating between a couple of options:
1. move the mlx4_qp_clear to precede ibv_cmd_destroy_qp. However, what if we're still getting
completions for this qp? Ouch.
2. Create a mutex for this purpose, and use it to force the create and destroy qp operations
to be atomic WRT the ibv_cmd_xxx_qp operations and the store/clear qp operations.
3. Force kernel space to avoid allocating a just-deleted qp number (this is my least favorite option).
My preference is for #2, as being the simplest to implement and having no side-effects.
What do you think?
- Jack
(BTW libmthca has the same issue).
========================================================
From file libmlx4/src/verbs.c:
mlx4_create_qp snippet:
ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd, sizeof cmd,
&resp, sizeof resp);
if (ret)
goto err_rq_db;
ret = mlx4_store_qp(to_mctx(pd->context), qp->ibv_qp.qp_num, qp);
if (ret)
goto err_destroy;
mlx4_destroy_qp snippet:
ret = ibv_cmd_destroy_qp(ibqp);
if (ret)
return ret;
==> CAN LOSE TIME SLICE HERE!!!
mlx4_lock_cqs(ibqp);
__mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
ibqp->srq ? to_msrq(ibqp->srq) : NULL);
if (ibqp->send_cq != ibqp->recv_cq)
__mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num, NULL);
mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num);
mlx4_unlock_cqs(ibqp);
More information about the general
mailing list