[ofa-general] SDP and iWARP

Steve Wise swise at opengridcomputing.com
Fri Jan 11 08:21:18 PST 2008


Craig Prescott wrote:
> Steve Wise wrote:
>> Craig Prescott wrote:
>>> Steve Wise wrote:
>>>>
>>>> Craig Prescott wrote:
>>>>>
>>>>> The above call also emits a couple of messages
>>>>> into the listener's syslog now:
>>>>>
>>>>> Jan  9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 0x20 
>>>>> opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>> Jan  9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 
>>>>> 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>
>>>> This is an async event generated by a failure while processing an SQ
>>>> WR, I think.  Opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
>>>> Type 1 means it was an egress (SQ) failure and
>>>> status 0x6 is a base/bounds violation,
>>>> but opcode 14 seems incorrect.  That's not a valid T3 opcode. ????
>>>>
>>>
>>> Ok, thanks!  I guess I'm not sure what to make of that yet, though.
>>>
>>
>> See where in iwch_accept_cr() the failure is happening.  It doesn't 
>> look like send_mpa_reply() is being called.
>>
> 
> The ECONNRESET is coming from here in iwch_accept_cr():
> 
> ...
>         /* wait for wr_ack */
>         wait_event(ep->com.waitq, ep->com.rpl_done);
>         err = ep->com.rpl_err;
> ...
> 
> Is that what you thought was happening?
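For readers unfamiliar with the quoted code: wait_event() puts the accepting thread to sleep on ep->com.waitq until ep->com.rpl_done becomes true, and something on the reply path (presumably the handler for the firmware's acknowledgement) stores a result in rpl_err before waking it. So an ECONNRESET here is whatever error that reply path recorded. Below is a user-space sketch of the pattern under stated assumptions: pthreads stand in for the kernel wait queue, and struct ep_com, complete_reply(), wait_for_reply() and simulate_accept() are hypothetical names, not iw_cxgb3 code.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for ep->com: only the fields the quoted code uses. */
struct ep_com {
    pthread_mutex_t lock;
    pthread_cond_t  waitq;    /* plays the role of ep->com.waitq */
    int             rpl_done; /* becomes 1 once the reply has been processed */
    int             rpl_err;  /* 0 on success, negative errno on failure */
};

/* Reply side: record the result first, then wake the waiter. */
void complete_reply(struct ep_com *com, int err)
{
    pthread_mutex_lock(&com->lock);
    com->rpl_err = err;
    com->rpl_done = 1;
    pthread_cond_broadcast(&com->waitq);
    pthread_mutex_unlock(&com->lock);
}

/* Accept side: sleep until rpl_done is set, then return rpl_err, like
 * wait_event(ep->com.waitq, ep->com.rpl_done); err = ep->com.rpl_err; */
int wait_for_reply(struct ep_com *com)
{
    int err;

    pthread_mutex_lock(&com->lock);
    while (!com->rpl_done)
        pthread_cond_wait(&com->waitq, &com->lock);
    err = com->rpl_err;
    pthread_mutex_unlock(&com->lock);
    return err;
}

/* Hypothetical arguments for the simulated reply path. */
struct fw_args {
    struct ep_com *com;
    int err;
};

/* Simulated firmware/reply path running in another thread. */
static void *fake_firmware(void *arg)
{
    struct fw_args *a = arg;

    complete_reply(a->com, a->err);
    return NULL;
}

/* Run one simulated accept whose reply path reports fw_err. */
int simulate_accept(int fw_err)
{
    struct ep_com com = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
    struct fw_args args = { &com, fw_err };
    pthread_t fw;
    int err;

    if (pthread_create(&fw, NULL, fake_firmware, &args) != 0)
        return -EAGAIN;
    err = wait_for_reply(&com);
    pthread_join(fw, NULL);
    pthread_cond_destroy(&com.waitq);
    pthread_mutex_destroy(&com.lock);
    return err;
}
```

The while loop around pthread_cond_wait() mirrors wait_event(), which re-checks its condition after every wakeup; the waiter only ever sees the error the reply side stored, never the reason behind it.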

I don't know exactly what is going on!  But the code above means that
the firmware never successfully sent the last streaming message (the
MPA start reply) and never transitioned the connection into RDMA mode.
The async error might also indicate that some WR was posted before
rdma_accept() was called, and that this WR had problems.
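To line the async error up with a particular WR, it helps to pull the fields out of the log lines quoted earlier.  The sketch below parses one such line and decodes only the two values explained in this thread (type 1 = egress/SQ failure, status 0x6 = base/bounds violation); anything else should be looked up in cxio_wr.h.  It assumes wrid.hi and wrid.lo are the upper and lower 32 bits of the 64-bit work request ID, and struct ae_info, parse_ae_line(), ae_type_str() and ae_status_str() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by the iwch_ev_dispatch / post_qp_event messages. */
struct ae_info {
    unsigned int qpid;
    unsigned int opcode;
    unsigned int status;
    unsigned int type;
    uint64_t wrid;          /* assumed: (wrid.hi << 32) | wrid.lo */
};

/* Parse the "qpid ... wrid.lo ..." tail of one log line.
 * %x accepts an optional 0x prefix, so "0x20" parses as 0x20.
 * Returns 0 on success, -1 if the fields are not found. */
int parse_ae_line(const char *line, struct ae_info *out)
{
    unsigned int hi, lo;
    const char *p = strstr(line, "qpid ");

    if (p == NULL)
        return -1;
    if (sscanf(p, "qpid %x opcode %u status %x type %u wrid.hi %x wrid.lo %x",
               &out->qpid, &out->opcode, &out->status, &out->type,
               &hi, &lo) != 6)
        return -1;
    out->wrid = ((uint64_t)hi << 32) | lo;
    return 0;
}

/* Only the values explained in this thread are decoded. */
const char *ae_type_str(unsigned int type)
{
    return type == 1 ? "egress (SQ) failure" : "unknown: see cxio_wr.h";
}

const char *ae_status_str(unsigned int status)
{
    return status == 0x6 ? "base/bounds violation" : "unknown: see cxio_wr.h";
}
```

Fed the iwch_ev_dispatch line above (joined back onto one line), this yields qpid 0x20, opcode 14, status 0x6, type 1 and a wrid of 0x80000000, which can then be compared against the wr_id values SDP posts.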

A few questions:

What firmware are you running?  ethtool -i will tell you.

What OFED version, exactly?

Does SDP post an SQ or RQ WR before calling rdma_accept()?  Can you
dump that work request?  For example, in iwch_post_send() and
iwch_post_recv(), dump the work request after it is built and before the
code rings the doorbell.  You can dump it as 8B flits; be sure to put
the flits in host byte order.  See cxio_dump_wqe() in cxio_dbg.c...
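The flit dump suggested above can be sketched in user space as follows.  It assumes the WR sits in memory in the adapter's big-endian layout (which is why the flits need converting to host byte order) and assembles each 8-byte flit byte by byte so the result is the same on any host.  Inside the driver, cxio_dump_wqe() in cxio_dbg.c is the real reference; be_flit_to_host() and dump_wqe_flits() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Convert one big-endian 8-byte flit to a host-order 64-bit value. */
uint64_t be_flit_to_host(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Print nflits 8-byte flits starting at wqe, in host byte order,
 * one per line.  Returns the number of flits printed. */
size_t dump_wqe_flits(const void *wqe, size_t nflits, FILE *out)
{
    const unsigned char *p = wqe;
    size_t i;

    for (i = 0; i < nflits; i++)
        fprintf(out, "flit %2zu: %016llx\n", i,
                (unsigned long long)be_flit_to_host(p + 8 * i));
    return i;
}
```

Byte-wise assembly avoids depending on a platform byte-swap header; in kernel code, be64_to_cpu() does the same conversion.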

Steve.



