[ofa-general] SDP and iWARP

Felix Marti felix at chelsio.com
Wed Jan 23 08:57:23 PST 2008


Hi Craig,

Can you please dump not only the last, but the last 4 WRs?

Thanks,
felix
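
For reference, a dump of the last few WRs in the style discussed in the quoted thread (8-byte flits, converted to host byte order) could look roughly like the user-space sketch below. It is not the driver's cxio_dump_wqe(). The names flit_to_host() and dump_last_wrs() are hypothetical, and the sketch assumes the WQEs are stored as big-endian 64-bit words in a flat array of fixed-size slots with the most recent slot last, which a real work queue ring need not satisfy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one 8-byte flit that is stored big-endian
 * (an assumption about the WQE layout) and return it in host order. */
uint64_t flit_to_host(const unsigned char *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: print the last `count` WQE slots of `queue`,
 * one flit per line, in the same style as the dump in the thread.
 * Assumes a flat array of `nslots` slots, newest slot last. */
void dump_last_wrs(const char *tag, const unsigned char *queue,
                   size_t nslots, size_t flits_per_slot, size_t count)
{
    assert(tag != NULL && queue != NULL);
    if (count > nslots)
        count = nslots;

    for (size_t s = nslots - count; s < nslots; s++) {
        const unsigned char *wqe = queue + s * flits_per_slot * 8;

        for (size_t f = 0; f < flits_per_slot; f++)
            printf("%s: WQE %p: %016llx\n", tag,
                   (const void *)(wqe + f * 8),
                   (unsigned long long)flit_to_host(wqe + f * 8));
    }
}
```

Calling dump_last_wrs("iwch_post_receive", queue, nslots, 13, 4) would print the last four slots at 13 flits each, matching the number of flits shown in the dump quoted below.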

> -----Original Message-----
> From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Craig Prescott
> Sent: Wednesday, January 23, 2008 8:05 AM
> To: Steve Wise
> Cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] SDP and iWARP
> 
> Steve Wise wrote:
> > Craig Prescott wrote:
> >> Steve Wise wrote:
> >>> Craig Prescott wrote:
> >>>> Steve Wise wrote:
> >>>>>
> >>>>> Craig Prescott wrote:
> >>>>>>
> >>>>>> The above call also emits a couple of messages
> >>>>>> into the listener's syslog now :
> >>>>>>
> >>>>>> Jan  9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
> >>>>>> Jan  9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
> >>>>>>
> >>>>> This is an async event generated due to a failure processing a
> >>>>> SQ WR, I think. opcodes and status codes for iw_cxgb3 are in
> >>>>> cxio_wr.h.
> >>>>> type 1 means it was an egress (SQ) failure
> >>>>> status 0x6 is a base/bounds violation,
> >>>>> but 14 seems incorrect.  That's not a valid T3 opcode. ????
> >>>>>
> >>>>
> >>>> Ok, thanks!  I guess I'm not sure what to make of that yet, though.
> >>>>
> >>>
> >>> See where in iwch_accept_cr() the failure is happening.  It doesn't
> >>> look like send_mpa_reply() is being called.
> >>>
> >>
> >> The ECONNRESET is coming from here in iwch_accept_cr():
> >>
> >> ...
> >>         /* wait for wr_ack */
> >>         wait_event(ep->com.waitq, ep->com.rpl_done);
> >>         err = ep->com.rpl_err;
> >> ...
> >>
> >> Is that what you thought was happening?
> >
> > I don't know exactly what is going on!  But the code above means that
> > the firmware never successfully sent the last streaming message (the
> > mpa-start reply) and never transitioned the connection into rdma mode.
> > And the async error might indicate that some WR was posted prior to
> > doing the rdma_accept() and that WR had problems.
> 
> Ok.  I'm sorry for such a slow response.
> 
> > a few questions:
> >
> > What firmware are you running?  ethtool -i will tell you.
> 
> [root at tebow1 ~]# ethtool -i eth4
> driver: cxgb3
> version: 1.0-ko
> firmware-version: T 5.0.0 TP 1.1.0
> bus-info: 0000:86:00.0
> 
> > What ofed version exactly?
> 
> OFED 1.3 daily from a few weeks back now: OFED-1.3-20080107-0942
> 
> > Does sdp post a SQ or RQ WR prior to doing the rdma_accept()?  Can you
> > dump that work request?  Maybe in iwch_post_send and iwch_post_recv,
> > dump the work request after it is built and before the code rings the
> > doorbell.  You can dump it as 8B flits, and be sure to put the flits in
> > host byte order.  See cxio_dump_wqe() in cxio_dbg.c...
> 
> The following is the last work request seen before rdma_accept():
> 
> iwch_post_receive: Dumping built work request before ring_doorbell:
> iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d
> iwch_post_receive: WQE ffff810241d59f88: 0000000000000000
> iwch_post_receive: WQE ffff810241d59f90: 0000000000000001
> iwch_post_receive: WQE ffff810241d59f98: 000002ff00000810
> iwch_post_receive: WQE ffff810241d59fa0: 000000044eac6000
> iwch_post_receive: WQE ffff810241d59fa8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fb0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fb8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fc0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fc8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fd0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fd8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fe0: 0000000000000000
> iwch_post_receive: returning 0
> 
> This comes from sdp_init_qp(), via sdp_connect_handler().
> There are a total of 64 work requests (all from
> iwch_post_receive()) generated while the netserver is
> trying to handle the RDMA_CM_EVENT_CONNECT_REQUEST.
> 
> Can you help me decode the above work request?
> 
> Thanks,
> Craig
> 
> 
> 
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
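
As a reading aid for the two syslog lines quoted in the thread, the sketch below pulls the fields out of such a line and attaches the only two meanings given in the discussion (type 1 is an egress/SQ failure, status 0x6 is a base/bounds violation). All names here are hypothetical. It assumes the opcode and type fields are printed in decimal and the other fields in hex, which the thread does not confirm. The full tables are in cxio_wr.h and are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one logged error event. */
struct t3_err_event {
    unsigned int qpid;
    int opcode;
    unsigned int status;
    int type;
    unsigned int wrid_hi;
    unsigned int wrid_lo;
};

/* Parse a "... qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0
 * wrid.lo 0x80000000" line. Returns 1 on success, 0 otherwise.
 * Assumption: opcode and type are decimal, the rest are hex. */
int parse_err_event(const char *line, struct t3_err_event *ev)
{
    const char *p;

    assert(line != NULL && ev != NULL);
    p = strstr(line, "qpid ");
    if (p == NULL)
        return 0;
    return sscanf(p,
                  "qpid %x opcode %d status %x type %d wrid.hi %x wrid.lo %x",
                  &ev->qpid, &ev->opcode, &ev->status, &ev->type,
                  &ev->wrid_hi, &ev->wrid_lo) == 6;
}

/* Only the value explained in the thread is decoded. */
const char *describe_type(int type)
{
    return type == 1 ? "egress (SQ) failure" : "not described in the thread";
}

/* Only the value explained in the thread is decoded. */
const char *describe_status(unsigned int status)
{
    return status == 0x6 ? "base/bounds violation"
                         : "not described in the thread";
}
```

Feeding either of the two logged lines to parse_err_event() and then passing ev.type and ev.status to the describe helpers yields the interpretation given in the thread. The opcode value is reported as-is, since the thread itself questions whether 14 is a valid T3 opcode.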


