[ofa-general] SDP and iWARP

Craig Prescott prescott at hpc.ufl.edu
Wed Jan 23 08:59:14 PST 2008


Hi Felix,

Here are the last 4 WRs:

...
Entering iwch_post_receive
iwch_post_receive: Dumping built work request before ring_doorbell:
iwch_post_receive: WQE ffff810241d59e00: 17c001008000000d
iwch_post_receive: WQE ffff810241d59e08: 0000000000000000
iwch_post_receive: WQE ffff810241d59e10: 0000000000000001
iwch_post_receive: WQE ffff810241d59e18: 000002ff00000810
iwch_post_receive: WQE ffff810241d59e20: 000000044eac3000
iwch_post_receive: WQE ffff810241d59e28: 0000000000000000
iwch_post_receive: WQE ffff810241d59e30: 0000000000000000
iwch_post_receive: WQE ffff810241d59e38: 0000000000000000
iwch_post_receive: WQE ffff810241d59e40: 0000000000000000
iwch_post_receive: WQE ffff810241d59e48: 0000000000000000
iwch_post_receive: WQE ffff810241d59e50: 0000000000000000
iwch_post_receive: WQE ffff810241d59e58: 0000000000000000
iwch_post_receive: WQE ffff810241d59e60: 0000000000000000
iwch_post_receive: returning 0
Entering iwch_post_receive
iwch_post_receive: Dumping built work request before ring_doorbell:
iwch_post_receive: WQE ffff810241d59e80: 17c001008000000d
iwch_post_receive: WQE ffff810241d59e88: 0000000000000000
iwch_post_receive: WQE ffff810241d59e90: 0000000000000001
iwch_post_receive: WQE ffff810241d59e98: 000002ff00000810
iwch_post_receive: WQE ffff810241d59ea0: 000000044eac4000
iwch_post_receive: WQE ffff810241d59ea8: 0000000000000000
iwch_post_receive: WQE ffff810241d59eb0: 0000000000000000
iwch_post_receive: WQE ffff810241d59eb8: 0000000000000000
iwch_post_receive: WQE ffff810241d59ec0: 0000000000000000
iwch_post_receive: WQE ffff810241d59ec8: 0000000000000000
iwch_post_receive: WQE ffff810241d59ed0: 0000000000000000
iwch_post_receive: WQE ffff810241d59ed8: 0000000000000000
iwch_post_receive: WQE ffff810241d59ee0: 0000000000000000
iwch_post_receive: returning 0
Entering iwch_post_receive
iwch_post_receive: Dumping built work request before ring_doorbell:
iwch_post_receive: WQE ffff810241d59f00: 17c001008000000d
iwch_post_receive: WQE ffff810241d59f08: 0000000000000000
iwch_post_receive: WQE ffff810241d59f10: 0000000000000001
iwch_post_receive: WQE ffff810241d59f18: 000002ff00000810
iwch_post_receive: WQE ffff810241d59f20: 000000044eac5000
iwch_post_receive: WQE ffff810241d59f28: 0000000000000000
iwch_post_receive: WQE ffff810241d59f30: 0000000000000000
iwch_post_receive: WQE ffff810241d59f38: 0000000000000000
iwch_post_receive: WQE ffff810241d59f40: 0000000000000000
iwch_post_receive: WQE ffff810241d59f48: 0000000000000000
iwch_post_receive: WQE ffff810241d59f50: 0000000000000000
iwch_post_receive: WQE ffff810241d59f58: 0000000000000000
iwch_post_receive: WQE ffff810241d59f60: 0000000000000000
iwch_post_receive: returning 0
Entering iwch_post_receive
iwch_post_receive: Dumping built work request before ring_doorbell:
iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d
iwch_post_receive: WQE ffff810241d59f88: 0000000000000000
iwch_post_receive: WQE ffff810241d59f90: 0000000000000001
iwch_post_receive: WQE ffff810241d59f98: 000002ff00000810
iwch_post_receive: WQE ffff810241d59fa0: 000000044eac6000
iwch_post_receive: WQE ffff810241d59fa8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fb0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fb8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fc0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fc8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fd0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fd8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fe0: 0000000000000000
iwch_post_receive: returning 0
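
In the four dumps above, consecutive WQEs sit 0x80 bytes apart, and the only flit that changes from one WR to the next is the fifth one (offset 0x20), which steps by 0x1000 from 000000044eac3000 to 000000044eac6000. To check that kind of thing mechanically rather than by eye, here is a minimal userspace sketch in C. It is not part of iw_cxgb3; the names parse_wqe_line, diff_mask and FLITS_PER_WR are made up for illustration, and it assumes the log lines look exactly like the ones printed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Each dump in this thread shows 13 flits per work request. */
#define FLITS_PER_WR 13

/*
 * Parse one log line of the form
 *   "iwch_post_receive: WQE <hex address>: <hex flit>"
 * Returns 1 and fills *addr / *flit on success, 0 for any other line
 * (for example "Entering iwch_post_receive" or "returning 0").
 */
static int parse_wqe_line(const char *line, uint64_t *addr, uint64_t *flit)
{
    const char *p = strstr(line, "WQE ");
    unsigned long long a, f;

    if (p == NULL)
        return 0;
    if (sscanf(p, "WQE %llx: %llx", &a, &f) != 2)
        return 0;
    *addr = (uint64_t)a;
    *flit = (uint64_t)f;
    return 1;
}

/*
 * Compare two work requests flit by flit.
 * Bit i of the result is set when flit i differs between the two.
 */
static unsigned diff_mask(const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned mask = 0;
    size_t i;

    assert(n <= 32); /* the mask is a 32-bit value */
    for (i = 0; i < n; i++)
        if (a[i] != b[i])
            mask |= 1u << i;
    return mask;
}
```

Feeding each "WQE" line through parse_wqe_line(), collecting FLITS_PER_WR flits per request, and calling diff_mask() on neighbouring requests shows which flits vary. What the varying flit means (it looks like a per-buffer address) is a guess until it is checked against the WR layout in cxio_wr.h.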

Thanks,
Craig


Felix Marti wrote:
> Hi Craig,
> 
> Can you please dump not only the last, but the last 4 WRs?
> 
> Thanks,
> felix
> 
>> -----Original Message-----
>> From: general-bounces at lists.openfabrics.org [mailto:general-
>> bounces at lists.openfabrics.org] On Behalf Of Craig Prescott
>> Sent: Wednesday, January 23, 2008 8:05 AM
>> To: Steve Wise
>> Cc: general at lists.openfabrics.org
>> Subject: Re: [ofa-general] SDP and iWARP
>>
>> Steve Wise wrote:
>>> Craig Prescott wrote:
>>>> Steve Wise wrote:
>>>>> Craig Prescott wrote:
>>>>>> Steve Wise wrote:
>>>>>>> Craig Prescott wrote:
>>>>>>>> The above call also emits a couple of messages
>>>>>>>> into the listener's syslog now:
>>>>>>>>
>>>>>>>> Jan  9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid
>>>>>>>> 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>>>> Jan  9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode
>>>>>>>> 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>>>>
>>>>>>> This is an async event generated due to a failure processing a SQ
>>>>>>> WR, I think. opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
>>>>>>> type 1 means it was an egress (SQ) failure
>>>>>>> status 0x6 is a base/bounds violation,
>>>>>>> but 14 seems incorrect.  That's not a valid T3 opcode. ????
>>>>>>>
>>>>>> Ok, thanks!  I guess I'm not sure what to make of that yet, though.
>>>>> See where in iwch_accept_cr() the failure is happening.  It doesn't
>>>>> look like send_mpa_reply() is being called.
>>>>>
>>>> The ECONNRESET is coming from here in iwch_accept_cr():
>>>>
>>>> ...
>>>>         /* wait for wr_ack */
>>>>         wait_event(ep->com.waitq, ep->com.rpl_done);
>>>>         err = ep->com.rpl_err;
>>>> ...
>>>>
>>>> Is that what you thought was happening?
>>> I don't know exactly what is going on!  But the code above means that
>>> the firmware never successfully sent the last streaming message (the
>>> mpa-start reply) and never transitioned the connection into rdma mode.
>>> And the async error might indicate that some WR was posted prior to
>>> doing the rdma_accept() and that WR had problems.
>> Ok.  I'm sorry for such a slow response.
>>
>>> a few questions:
>>>
>>> What firmware are you running?  ethtool -i will tell you.
>> [root at tebow1 ~]# ethtool -i eth4
>> driver: cxgb3
>> version: 1.0-ko
>> firmware-version: T 5.0.0 TP 1.1.0
>> bus-info: 0000:86:00.0
>>
>>> What ofed version exactly?
>> OFED 1.3 daily from a few weeks back now: OFED-1.3-20080107-0942
>>
>>> Does sdp post a SQ or RQ WR prior to doing the rdma_accept()?  Can you
>>> dump that work request?  Maybe in iwch_post_send and iwch_post_recv,
>>> dump the work request after it is built and before the code rings the
>>> doorbell.  You can dump it as 8B flits, and be sure to put the flits in
>>> host byte order.  See cxio_dump_wqe() in cxio_dbg.c...
>> The following is the last work request seen before rdma_accept():
>>
>> iwch_post_receive: Dumping built work request before ring_doorbell:
>> iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d
>> iwch_post_receive: WQE ffff810241d59f88: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59f90: 0000000000000001
>> iwch_post_receive: WQE ffff810241d59f98: 000002ff00000810
>> iwch_post_receive: WQE ffff810241d59fa0: 000000044eac6000
>> iwch_post_receive: WQE ffff810241d59fa8: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fb0: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fb8: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fc0: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fc8: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fd0: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fd8: 0000000000000000
>> iwch_post_receive: WQE ffff810241d59fe0: 0000000000000000
>> iwch_post_receive: returning 0
>>
>> This comes from sdp_init_qp(), via sdp_connect_handler().
>> There are a total of 64 work requests (all from
>> iwch_post_receive()) generated while the netserver is
>> trying to handle the RDMA_CM_EVENT_CONNECT_REQUEST.
>>
>> Can you help me decode the above work request?
>>
>> Thanks,
>> Craig
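
Steve's suggestion quoted above was to dump the built work request as 8-byte flits in host byte order. As a rough illustration of what that conversion involves, here is a minimal userspace sketch in C. It is not cxio_dump_wqe() and is not taken from the driver. It assumes the WQE is stored in memory most-significant byte first (big-endian), which is an assumption to verify against cxio_wr.h, and the names flit_from_be, format_flit and dump_wqe are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assemble 8 bytes stored most-significant-first into a host-order
 * 64-bit value. Works the same on little- and big-endian hosts.
 * ASSUMPTION: the WQE in memory is big-endian.
 */
static uint64_t flit_from_be(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Format one flit in the same shape as the log lines in this thread:
 *   "<tag>: WQE <16 hex digits>: <16 hex digits>"
 * Returns the snprintf() result (length of the formatted line).
 */
static int format_flit(char *out, size_t outlen, const char *tag,
                       uint64_t addr, uint64_t flit)
{
    return snprintf(out, outlen, "%s: WQE %016llx: %016llx", tag,
                    (unsigned long long)addr, (unsigned long long)flit);
}

/* Print nflits flits of a WQE, one per line; base is the address to show. */
static void dump_wqe(const char *tag, uint64_t base,
                     const unsigned char *wqe, size_t nflits)
{
    char line[96];
    size_t i;

    assert(wqe != NULL);
    for (i = 0; i < nflits; i++) {
        format_flit(line, sizeof line, tag, base + 8 * i,
                    flit_from_be(wqe + 8 * i));
        puts(line);
    }
}
```

Building each value byte by byte avoids depending on the host's endianness, so the printed flits read the same on any machine. In the kernel the equivalent step would normally be a be64_to_cpu() on each 64-bit word before printing.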



