[ofa-general] Re: Race condition in userspace libraries with create/destroy qp
Roland Dreier
rdreier at cisco.com
Thu Nov 20 14:50:51 PST 2008
> mlx4_create_qp and mlx4_destroy_qp are not atomic WRT each other. If one thread is
> destroying a QP while another is creating a qp, there is a race hole. The destroying thread
> can lose its timeslice after it has deleted the QP from kernel space, but before it has cleared
> it from userspace store (mlx4_clear_qp).
> If the other thread creates a QP during this break, it gets the same QP base number and overwrites
> the destroyed QP's entry with mlx4_store_qp().
Yes, looks like a real bug.
> 2. Create a mutex for this purpose, and use it to force the create and destroy qp operations
> to be atomic WRT the ibv_cmd_xxx_qp operations and the store/clear qp operations.
This looks like the best solution.
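A minimal sketch of that approach, for illustration only: one mutex covers both the kernel command and the userspace table update, so a create cannot slip in between a destroy's kernel call and its table clear. Everything here is a hypothetical stand-in (toy_qp, qp_table, kernel_create_qp and so on are not the real libmlx4 or libibverbs code); the fake "kernel" allocator just hands out the lowest free number to mimic QP number reuse.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_QPS 64

/* Hypothetical stand-in for a driver QP; not the real struct mlx4_qp. */
struct toy_qp {
    int qpn;
};

/* One lock serializes create and destroy against each other. */
static pthread_mutex_t qp_lock = PTHREAD_MUTEX_INITIALIZER;
static int kernel_in_use[MAX_QPS];        /* stand-in for the kernel's QP numbers */
static struct toy_qp *qp_table[MAX_QPS];  /* stand-in for the userspace QP store */

/* Stand-in for the ibv_cmd create call: hands out the lowest free number. */
static int kernel_create_qp(void)
{
    for (int i = 0; i < MAX_QPS; i++) {
        if (!kernel_in_use[i]) {
            kernel_in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Stand-in for the ibv_cmd destroy call: the number becomes reusable. */
static void kernel_destroy_qp(int qpn)
{
    kernel_in_use[qpn] = 0;
}

struct toy_qp *toy_create_qp(void)
{
    struct toy_qp *qp = malloc(sizeof *qp);
    if (!qp)
        return NULL;

    pthread_mutex_lock(&qp_lock);
    qp->qpn = kernel_create_qp();
    if (qp->qpn < 0) {
        pthread_mutex_unlock(&qp_lock);
        free(qp);
        return NULL;
    }
    /* With the lock held, a reused number always finds an empty slot. */
    assert(qp_table[qp->qpn] == NULL);
    qp_table[qp->qpn] = qp;               /* the "store" step */
    pthread_mutex_unlock(&qp_lock);
    return qp;
}

void toy_destroy_qp(struct toy_qp *qp)
{
    pthread_mutex_lock(&qp_lock);
    kernel_destroy_qp(qp->qpn);           /* number is free from here on... */
    qp_table[qp->qpn] = NULL;             /* ...but nobody can reuse it before the clear */
    pthread_mutex_unlock(&qp_lock);
    free(qp);
}

struct toy_qp *toy_find_qp(int qpn)
{
    struct toy_qp *qp = NULL;

    if (qpn < 0 || qpn >= MAX_QPS)
        return NULL;
    pthread_mutex_lock(&qp_lock);
    qp = qp_table[qpn];
    pthread_mutex_unlock(&qp_lock);
    return qp;
}
```

Without the lock, the window between kernel_destroy_qp() and the table clear is exactly where a concurrent create could pick up the same number and have its freshly stored entry wiped. Holding a single lock across a kernel call is coarse, but create and destroy are slow-path operations.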
I wonder if we should just add this synchronization in libibverbs rather
than individual drivers? I notice that libcxgb3 seems to have the same
bug AFAICS. But maybe it's better to just keep the simple rule that
driver libraries are responsible for locking their own data structures.
- R.