[ofa-general] Re: [PATCH] - [resend] Corrects a race in ipoib_cm_post_receive_nonsrq()
Hoang-Nam Nguyen
HNGUYEN at de.ibm.com
Thu Jun 26 03:24:16 PDT 2008
> > - struct ib_recv_wr *bad_wr;
> > + struct ib_recv_wr *bad_wr, rx_wr;
> > + struct ib_sge rx_sge[IPOIB_CM_RX_SG];
>
> I worry about this putting an extra 300 bytes or so on the stack... I
> think it would be nicer if there were a way to just make sure receive
> posting was single-threaded, but I still don't know which contexts are
> racing against each other...
The race basically occurs between ipoib_poll() and
ipoib_cm_nonsrq_init_rx()
as Shirley mentioned. Below is sort of seq diagram for a possible race:
ipoib_poll()
ipoib_cm_req_handler()
ipoib_cm_handle_rx_wc() ipoib_cm_nonsrq_init_rx()
ipoib_cm_post_receive_nonsrq()
ipoib_cm_post_receive_nonsrq()
priv->cm.rx_wr.wr_id = id <<<<
>>>> priv->cm.rx_wr.wr_id = id
ib_post_recv()
ib_post_recv()
Makes sense?
Yes, we should avoid allocating those vars on the stack.
Either we use same pattern as done for SRQ, ie rx_wr_arr[conn]
or something like this:
- enhance ipoib_cm_post_receive_nonsrq() with wr and sge passed to by
the caller, ie ipoib_poll() -> ipoib_cm_handle_rx_wc() resp.
ipoib_cm_nonsrq_init_rx()
- ipoib_cm_handle_rx_wc() passes rx_wr and rx_sge as already defined
in struct ipoib_cm_dev_priv to ipoib_cm_post_receive_nonsrq()
since we use one cq, there should not be a race in this path
- ipoib_cm_nonsrq_init_rx() allocates wr and sge on the heap or slabcache,
passes them to ipoib_cm_post_receive_nonsrq() and frees them thereafter
If you want me to make a patch, glad to do.
Nam
More information about the general
mailing list