[ofa-general] SDP and iWARP
Craig Prescott
prescott at hpc.ufl.edu
Fri Jan 11 08:40:13 PST 2008
Steve Wise wrote:
> Craig Prescott wrote:
>> Steve Wise wrote:
>>> Craig Prescott wrote:
>>>> Steve Wise wrote:
>>>>>
>>>>> Craig Prescott wrote:
>>>>>>
>>>>>> The above call also emits a couple of messages
>>>>>> into the listener's syslog now:
>>>>>>
>>>>>> Jan 9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid
>>>>>> 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>> Jan 9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode
>>>>>> 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>>
>>>>> This is an async event generated due to a failure processing a SQ
>>>>> WR, I think. opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
>>>>> type 1 means it was an egress (SQ) failure
>>>>> status 0x6 is a base/bounds violation,
>>>>> but 14 seems incorrect. That's not a valid T3 opcode. ????
>>>>>
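The decoding described above can be sketched as a small userspace parser for one of these syslog lines. This is only an illustration: `struct cqe_err`, `parse_cqe_err()`, `describe_type()` and `describe_status()` are hypothetical names, not part of iw_cxgb3. The field layout is inferred from the two log lines quoted in this thread, assuming the wrapped message is first rejoined into a single line and that opcode and type are printed in decimal. The only meanings encoded are the two stated in the thread (type 1, status 0x6); everything else has to be looked up in cxio_wr.h.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one "CQE Err" / "AE" syslog line, as they appear in the thread.
 * Hypothetical helper type, not taken from the driver. */
struct cqe_err {
    unsigned qpid;
    unsigned opcode;
    unsigned status;
    unsigned type;
    unsigned wrid_hi;
    unsigned wrid_lo;
};

/* Parse the fields that follow "qpid" in a log line.
 * Returns 0 on success, -1 if the expected fields are not all present.
 * Assumption: opcode and type are decimal, the other fields are hex. */
static int parse_cqe_err(const char *line, struct cqe_err *out)
{
    const char *p = strstr(line, "qpid ");
    if (p == NULL)
        return -1;

    int n = sscanf(p,
                   "qpid %x opcode %u status %x type %u wrid.hi %x wrid.lo %x",
                   &out->qpid, &out->opcode, &out->status, &out->type,
                   &out->wrid_hi, &out->wrid_lo);
    return n == 6 ? 0 : -1;
}

/* Only the value explained in the thread is decoded here. */
static const char *describe_type(unsigned type)
{
    return type == 1 ? "egress (SQ) failure" : "not decoded (see cxio_wr.h)";
}

/* Only the value explained in the thread is decoded here. */
static const char *describe_status(unsigned status)
{
    return status == 0x6 ? "base/bounds violation"
                         : "not decoded (see cxio_wr.h)";
}
```

Feeding it the `post_qp_event` line from the log should yield qpid 0x20, opcode 14, status 0x6 and type 1, which `describe_type()` and `describe_status()` map to the two explanations given above.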
>>>>
>>>> Ok, thanks! I guess I'm not sure what to make of that yet, though.
>>>>
>>>
>>> See where in iwch_accept_cr() the failure is happening. It doesn't
>>> look like send_mpa_reply() is being called.
>>>
>>
>> The ECONNRESET is coming from here in iwch_accept_cr():
>>
>> ...
>> /* wait for wr_ack */
>> wait_event(ep->com.waitq, ep->com.rpl_done);
>> err = ep->com.rpl_err;
>> ...
>>
>> Is that what you thought was happening?
>
> I don't know exactly what is going on! But the code above means that
> the firmware never successfully sent the last streaming message (the
> mpa-start reply) and never transitioned the connection into rdma mode.
> And the async error might indicate that some WR was posted prior to
> doing the rdma_accept() and that WR had problems.
>
> a few questions:
>
> What firmware are you running? ethtool -i will tell you.
[root@tebow2 ~]# ethtool -i eth4
driver: cxgb3
version: 1.0-ko
firmware-version: T 5.0.0 TP 1.1.0
bus-info: 0000:86:00.0
> What ofed version exactly?
I am still using the same code as at the top
of the thread, an OFED 1.3 daily from Monday
morning:
OFED-1.3-20080107-0942
> Does sdp post a SQ or RQ WR prior to doing the rdma_accept()? Can you
> dump that work request? Maybe in iwch_post_send and iwch_post_recv,
> dump the work request after it is built and before the code rings the
> doorbell. You can dump it as 8B flits, and be sure to put the flits in
> host byte order. See cxio_dump_wqe() in cxio_dbg.c...
>
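The flit dump suggested above can be sketched in userspace as follows. This is not cxio_dump_wqe() from cxio_dbg.c. `flit_to_host()` and `dump_flits()` are hypothetical names, and the sketch assumes the work request sits in memory as big-endian 8-byte words, so each flit is assembled byte by byte into a host-order value before printing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Return the i-th 8-byte flit of a work request as a host-order value.
 * Assumption: the WR is stored big-endian in memory. Building the value
 * one byte at a time gives the same result on any host endianness. */
static uint64_t flit_to_host(const void *wr, size_t i)
{
    const unsigned char *p = (const unsigned char *)wr + i * 8;
    uint64_t v = 0;

    for (int b = 0; b < 8; b++)
        v = (v << 8) | p[b];
    return v;
}

/* Print nflits flits of a work request, one per line, in host byte order. */
static void dump_flits(FILE *out, const void *wr, size_t nflits)
{
    for (size_t i = 0; i < nflits; i++)
        fprintf(out, "flit %2zu: %016llx\n", i,
                (unsigned long long)flit_to_host(wr, i));
}
```

Inside the driver the same idea would use printk and the kernel's be64_to_cpu() instead of stdio, called in iwch_post_send and iwch_post_recv after the work request is built and before the doorbell is rung.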
Ok, I'll do this. I really appreciate your help and advice!
Thanks,
Craig