[ewg] rping/cxgb3 regression

Hefty, Sean sean.hefty at intel.com
Tue Feb 15 17:46:44 PST 2011


I placed a 1.0.14.1 package on the OFA server in the downloads/rdmacm section.  Can you verify that it works?  If so, I'll ask for it to be pulled into 1.5.3.

> -----Original Message-----
> From: Steve Wise [mailto:swise at opengridcomputing.com]
> Sent: Tuesday, February 15, 2011 10:37 AM
> To: Hefty, Sean
> Cc: OpenFabrics EWG; Tziporet Koren
> Subject: Re: rping/cxgb3 regression
> 
> 
> On 02/15/2011 12:18 PM, Hefty, Sean wrote:
> >> I'm wondering if pulling the rping changes for ofed-1.5.3 would be ok?
> >> I guess to do this you would have to push a 1-off librdmacm without
> >> those changes?  Or maybe back up what is in OFED-1.5.3 to the previous
> >> release without this rping change?
> >>
> >> Thoughts?
> > Is the commit (93635fa33b41d356fa096242fec4ce788194b42f) below the issue?
> > (Btw, the author listed in my git tree is wrong.)
> >
> 
> Yes.
> 
> > I don't think I want to drop back to 1.0.13 for 1.5.3, so maybe reverting
> > this change and pushing out 1.0.14.1 would work.  There's just one other
> > change after 1.0.14 at the moment, and it's to the build, so I'd skip a
> > full release for now.
> >
> > Let me know if you think this would work.
> >
> 
> I just tested, and removing this patch from 1.0.14 resolves the issue
> for 1.5.3.
> 
> 
> > - Sean
> >
> > ---
> >
> >      librdmacm/rping: Make sure CQ event thread exits before destroying the CQ
> >
> >      It is possible for the CQ event thread to poll the CQ after it has been
> >      destroyed which can result in a seg fault on T3 interfaces.  This patch
> >      waits for the thread to exit before destroying the CQ.
> >
> >      Signed-off-by: Steve Wise<swise at opengridcomputing.com>
> >      Signed-off-by: Sean Hefty<sean.hefty at intel.com>
> >
> > diff --git a/examples/rping.c b/examples/rping.c
> > index 2d4c2de..ee292ec 100644
> > --- a/examples/rping.c
> > +++ b/examples/rping.c
> > @@ -280,12 +280,11 @@ static int rping_cq_event_handler(struct rping_cb *cb)
> >                  ret = 0;
> >
> >                  if (wc.status) {
> > -                       if (wc.status != IBV_WC_WR_FLUSH_ERR) {
> > +                       if (wc.status != IBV_WC_WR_FLUSH_ERR)
> >                                  fprintf(stderr,
> >                                          "cq completion failed status %d\n",
> >                                          wc.status);
> > -                               ret = -1;
> > -                       }
> > +                       ret = -1;
> >                          goto error;
> >                  }
> >
> > @@ -802,10 +801,9 @@ static void *rping_persistent_server_thread(void *arg)
> >
> >          rping_test_server(cb);
> >          rdma_disconnect(cb->child_cm_id);
> > +       pthread_join(cb->cqthread, NULL);
> >          rping_free_buffers(cb);
> >          rping_free_qp(cb);
> > -       pthread_cancel(cb->cqthread);
> > -       pthread_join(cb->cqthread, NULL);
> >          rdma_destroy_id(cb->child_cm_id);
> >          free_cb(cb);
> >          return NULL;
> > @@ -890,6 +888,7 @@ static int rping_run_server(struct rping_cb *cb)
> >
> >          rping_test_server(cb);
> >          rdma_disconnect(cb->child_cm_id);
> > +       pthread_join(cb->cqthread, NULL);
> >          rdma_destroy_id(cb->child_cm_id);
> >   err2:
> >          rping_free_buffers(cb);
> > @@ -1057,6 +1056,7 @@ static int rping_run_client(struct rping_cb *cb)
> >
> >          rping_test_client(cb);
> >          rdma_disconnect(cb->cm_id);
> > +       pthread_join(cb->cqthread, NULL);
> >   err2:
> >          rping_free_buffers(cb);
> >   err1:
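The two hunks of the patch depend on each other: once pthread_cancel() is removed, the CQ event thread has to leave its loop on its own, which is presumably why flushed completions (the IBV_WC_WR_FLUSH_ERR status that outstanding work requests get once rdma_disconnect() moves the QP to the error state) now make rping_cq_event_handler() return -1. Below is a minimal sketch of that teardown ordering using a mock completion queue built from pthreads. It does not use libibverbs, and every name in it (struct mock_cq, mock_post(), mock_cq_thread(), run_teardown_demo(), the MOCK_WC_* values) is hypothetical. It assumes, as the removal of pthread_cancel() implies, that the real thread exits whenever the handler returns nonzero.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status values; NOT the libibverbs enum ibv_wc_status. */
enum { MOCK_WC_SUCCESS = 0, MOCK_WC_WR_FLUSH_ERR = 5 };

#define MOCK_CQ_DEPTH 8  /* demo posts fewer than 8 entries; no overflow check */

/* Hypothetical mock CQ: a small FIFO of statuses guarded by a mutex/condvar. */
struct mock_cq {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             status[MOCK_CQ_DEPTH];
    int             head, tail;
    int             polled;  /* completions consumed by the event thread */
};

static void mock_post(struct mock_cq *cq, int status)
{
    pthread_mutex_lock(&cq->lock);
    cq->status[cq->tail++ % MOCK_CQ_DEPTH] = status;
    pthread_cond_signal(&cq->ready);
    pthread_mutex_unlock(&cq->lock);
}

/* Block until a completion is queued, then dequeue and return its status. */
static int mock_get_completion(struct mock_cq *cq)
{
    int status;

    pthread_mutex_lock(&cq->lock);
    while (cq->head == cq->tail)
        pthread_cond_wait(&cq->ready, &cq->lock);
    status = cq->status[cq->head++ % MOCK_CQ_DEPTH];
    cq->polled++;
    pthread_mutex_unlock(&cq->lock);
    return status;
}

/* Mirrors the patched handler: any error, including a flush, returns -1. */
static int mock_cq_event_handler(struct mock_cq *cq)
{
    int status = mock_get_completion(cq);

    if (status) {
        if (status != MOCK_WC_WR_FLUSH_ERR)
            fprintf(stderr, "cq completion failed status %d\n", status);
        return -1;
    }
    return 0;
}

/* Event thread: loops until the handler reports an error, then exits itself. */
static void *mock_cq_thread(void *arg)
{
    struct mock_cq *cq = arg;

    while (mock_cq_event_handler(cq) == 0)
        ;
    return NULL;
}

/* Returns the number of completions the thread consumed, or -1 on setup failure. */
int run_teardown_demo(void)
{
    struct mock_cq *cq = calloc(1, sizeof(*cq));
    pthread_t tid;
    int polled;

    if (!cq)
        return -1;
    pthread_mutex_init(&cq->lock, NULL);
    pthread_cond_init(&cq->ready, NULL);
    if (pthread_create(&tid, NULL, mock_cq_thread, cq) != 0) {
        pthread_cond_destroy(&cq->ready);
        pthread_mutex_destroy(&cq->lock);
        free(cq);
        return -1;
    }

    mock_post(cq, MOCK_WC_SUCCESS);       /* normal traffic */
    mock_post(cq, MOCK_WC_SUCCESS);
    mock_post(cq, MOCK_WC_WR_FLUSH_ERR);  /* stands in for the flush after disconnect */

    /* Join first: once this returns, the thread can no longer touch cq. */
    pthread_join(tid, NULL);
    polled = cq->polled;

    /* Only now is it safe to destroy the "CQ". */
    pthread_cond_destroy(&cq->ready);
    pthread_mutex_destroy(&cq->lock);
    free(cq);
    return polled;
}
```

Called from a main() built with -pthread, run_teardown_demo() returns 3: the two successful completions plus the flush that ends the thread. Moving the free() above pthread_join() would reintroduce the use-after-free the commit message describes, while in this mock, returning 0 on a flush (the pre-patch behavior) would leave pthread_join() blocked forever because nothing cancels the thread anymore.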