[ewg] rping/cxgb3 regression

Hefty, Sean sean.hefty at intel.com
Tue Feb 15 10:18:08 PST 2011


> I'm wondering if pulling the rping changes for ofed-1.5.3 would be ok?  I
> guess to do this you would have to push a 1-off librdmacm without those
> changes?  Or maybe back up what is in OFED-1.5.3 to the previous release
> without this rping change?
> 
> Thoughts?

Is the commit below (93635fa33b41d356fa096242fec4ce788194b42f) the issue?  (Btw, the author listed in my git tree is wrong.)

I don't think I want to drop back to 1.0.13 for 1.5.3, so maybe reverting this change and pushing out 1.0.14.1 would work.  There's just one other change after 1.0.14 at the moment, and it's to the build, so I'd skip a full release for now.

Let me know if you think this would work.

- Sean

---

    librdmacm/rping: Make sure CQ event thread exits before destroying the CQ

    It is possible for the CQ event thread to poll the CQ after it has been
    destroyed which can result in a seg fault on T3 interfaces.  This patch
    waits for the thread to exit before destroying the CQ.

    Signed-off-by: Steve Wise <swise at opengridcomputing.com>
    Signed-off-by: Sean Hefty <sean.hefty at intel.com>

diff --git a/examples/rping.c b/examples/rping.c
index 2d4c2de..ee292ec 100644
--- a/examples/rping.c
+++ b/examples/rping.c
@@ -280,12 +280,11 @@ static int rping_cq_event_handler(struct rping_cb *cb)
                ret = 0;

                if (wc.status) {
-                       if (wc.status != IBV_WC_WR_FLUSH_ERR) {
+                       if (wc.status != IBV_WC_WR_FLUSH_ERR)
                                fprintf(stderr,
                                        "cq completion failed status %d\n",
                                        wc.status);
-                               ret = -1;
-                       }
+                       ret = -1;
                        goto error;
                }

@@ -802,10 +801,9 @@ static void *rping_persistent_server_thread(void *arg)

        rping_test_server(cb);
        rdma_disconnect(cb->child_cm_id);
+       pthread_join(cb->cqthread, NULL);
        rping_free_buffers(cb);
        rping_free_qp(cb);
-       pthread_cancel(cb->cqthread);
-       pthread_join(cb->cqthread, NULL);
        rdma_destroy_id(cb->child_cm_id);
        free_cb(cb);
        return NULL;
@@ -890,6 +888,7 @@ static int rping_run_server(struct rping_cb *cb)

        rping_test_server(cb);
        rdma_disconnect(cb->child_cm_id);
+       pthread_join(cb->cqthread, NULL);
        rdma_destroy_id(cb->child_cm_id);
 err2:
        rping_free_buffers(cb);
@@ -1057,6 +1056,7 @@ static int rping_run_client(struct rping_cb *cb)

        rping_test_client(cb);
        rdma_disconnect(cb->cm_id);
+       pthread_join(cb->cqthread, NULL);
 err2:
        rping_free_buffers(cb);
 err1:
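
For reference, the ordering the patch is after can be sketched outside of rping.  This is only an illustration under stated assumptions, not code from librdmacm: struct demo_cb, the DEMO_* status values, demo_cq_thread() and demo_run() are all hypothetical stand-ins, and a heap-allocated int plays the role of the CQ.  The worker exits on any non-zero status (logging only when it is not a flush), and the caller joins the worker before freeing what the worker was using.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes; these are NOT the ibv_wc_status values. */
enum { DEMO_OK = 0, DEMO_FLUSH_ERR = 1 };

/* Hypothetical stand-in for rping's control block. */
struct demo_cb {
	pthread_t cqthread;	/* plays the role of cb->cqthread */
	atomic_int status;	/* last "completion" status seen by the worker */
	int *cq;		/* plays the role of the CQ */
};

static void *demo_cq_thread(void *arg)
{
	struct demo_cb *cb = arg;
	int status;

	/* Keep touching the resource while completions succeed. */
	while ((status = atomic_load(&cb->status)) == DEMO_OK)
		(*cb->cq)++;

	/* A flush is expected at teardown, so only other errors are logged. */
	if (status != DEMO_FLUSH_ERR)
		fprintf(stderr, "completion failed status %d\n", status);

	/* Any error, flush included, ends the thread. */
	return NULL;
}

/* Returns 0 on success, -1 if setup fails. */
int demo_run(void)
{
	struct demo_cb cb;
	int rc;

	atomic_init(&cb.status, DEMO_OK);
	cb.cq = calloc(1, sizeof(*cb.cq));
	if (!cb.cq)
		return -1;

	if (pthread_create(&cb.cqthread, NULL, demo_cq_thread, &cb)) {
		free(cb.cq);
		return -1;
	}

	/* Stands in for the disconnect, which flushes outstanding work. */
	atomic_store(&cb.status, DEMO_FLUSH_ERR);

	/* Wait for the worker to exit ... */
	rc = pthread_join(cb.cqthread, NULL);
	assert(rc == 0);

	/* ... and only then destroy the resource it was using. */
	free(cb.cq);
	return 0;
}
```

Calling demo_run() from a main() and building with something like "cc -pthread" should be enough to try it.  Moving the free() above the pthread_join() would recreate the kind of use-after-destroy the commit message describes.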


