[ofa-general] SDP and iWARP
Steve Wise
swise at opengridcomputing.com
Fri Jan 11 08:21:18 PST 2008
Craig Prescott wrote:
> Steve Wise wrote:
>> Craig Prescott wrote:
>>> Steve Wise wrote:
>>>>
>>>> Craig Prescott wrote:
>>>>>
>>>>> The above call also emits a couple of messages
>>>>> into the listener's syslog now:
>>>>>
>>>>> Jan 9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>> Jan 9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>
>>>> This is an async event generated by a failure while processing an SQ
>>>> WR, I think. The opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
>>>> Type 1 means it was an egress (SQ) failure and
>>>> status 0x6 is a base/bounds violation,
>>>> but opcode 14 seems incorrect. That's not a valid T3 opcode. ????
>>>>
>>>
>>> Ok, thanks! I guess I'm not sure what to make of that yet, though.
>>>
>>
>> See where in iwch_accept_cr() the failure is happening. It doesn't
>> look like send_mpa_reply() is being called.
>>
>
> The ECONNRESET is coming from here in iwch_accept_cr():
>
> ...
> /* wait for wr_ack */
> wait_event(ep->com.waitq, ep->com.rpl_done);
> err = ep->com.rpl_err;
> ...
>
> Is that what you thought was happening?
I don't know exactly what is going on, but the code above means that
the firmware never successfully sent the last streaming message (the
MPA start reply) and never transitioned the connection into RDMA mode.
The async error might also indicate that some WR was posted before
the rdma_accept() call and that this WR had problems.
A few questions:
What firmware are you running? ethtool -i will tell you.
What OFED version exactly?
Does SDP post an SQ or RQ WR before doing the rdma_accept()? Can you
dump that work request? Maybe in iwch_post_send() and iwch_post_recv(),
dump the work request after it is built and before the code rings the
doorbell. You can dump it as 8-byte flits; be sure to put the flits in
host byte order. See cxio_dump_wqe() in cxio_dbg.c...
Steve.