[openfabrics-ewg] OFED 1.1

Betsy Zeller betsy at pathscale.com
Thu Sep 21 19:52:38 PDT 2006


On RC6, all of our SDP testing passed. The bug we encountered is
intermittent, but it is true we were debugging with code from SVN, which
doesn't yet have this fix. Unless this shows up again, I'm OK with the
current solution for 233.

- Betsy

On Thu, 2006-09-21 at 09:11 +0300, Michael S. Tsirkin wrote:
> Quoting r. Betsy Zeller <betsy at pathscale.com>:
> > We've added some details to bug 233, which should help you in tracking it
> > down.
> 
> Betsy, seems like you are running the wrong code:
> I don't understand what's causing the panic reported by 233:
> you report kernel BUG at sdp_bcopy.c line 230, but there's
> no BUG_ON at sdp_bcopy.c line 230 when I install -pre2 here.
> 
> Here's how it looks:
> 
> 230:    ssk->rx_wr.sg_list = ssk->ibsge;
>         ssk->rx_wr.num_sge = frags + 1;
>         rc = ib_post_recv(ssk->qp, &ssk->rx_wr, &bad_wr);
>         ++ssk->rx_head;
>         if (unlikely(rc)) {
>                 sdp_dbg(&ssk->isk.sk, "ib_post_recv failed with status %d\n",
>                         rc);
>                 sdp_reset(&ssk->isk.sk);
>         }
> 
> 
> We did have a BUG_ON there before pre1 which makes it look to me
> like you got the wrong SDP somehow.
> 
> I have just verified that the code in RC6 pre2 tarball in SVN matches my copy,
> by downloading it from here:
> https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-rc6-pre2.tgz
> 
> opening up
> OFED-1.1-rc6-pre2/SOURCES/openib-1.1.tgz
> 
> and looking at openib-1.1/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> 
> I'd recommend rebooting after swapping pre-release OFED revisions,
> to make sure no old module stays in memory.
> Please comment.
> 
> > I'd be very interested to see this fixed for OFED 1.1.
> 
> You'll have to debug this - it does not happen for me on mthca at all,
> and I don't see what it could be - it looks like a low-level driver issue
> at the moment.
> 




