[ofa-general] SDP and iWARP

Craig Prescott prescott at hpc.ufl.edu
Fri Jan 11 08:40:13 PST 2008


Steve Wise wrote:
> Craig Prescott wrote:
>> Steve Wise wrote:
>>> Craig Prescott wrote:
>>>> Steve Wise wrote:
>>>>>
>>>>> Craig Prescott wrote:
>>>>>>
>>>>>> The above call also emits a couple of messages
>>>>>> into the listener's syslog now :
>>>>>>
>>>>>> Jan  9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 
>>>>>> 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>> Jan  9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 
>>>>>> 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
>>>>>>
>>>>> This is an async event generated due to a failure processing a SQ 
>>>>> WR, I think. opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
>>>>> type 1 means it was an egress (SQ) failure
>>>>> status 0x6 is a base/bounds violation,
>>>>> but 14 seems incorrect.  That's not a valid T3 opcode. ????
>>>>>
>>>>
>>>> Ok, thanks!  I guess I'm not sure what to make of that yet, though.
>>>>
>>>
>>> See where in iwch_accept_cr() the failure is happening.  It doesn't 
>>> look like send_mpa_reply() is being called.
>>>
>>
>> The ECONNRESET is coming from here in iwch_accept_cr():
>>
>> ...
>>         /* wait for wr_ack */
>>         wait_event(ep->com.waitq, ep->com.rpl_done);
>>         err = ep->com.rpl_err;
>> ...
>>
>> Is that what you thought was happening?
> 
> I don't know exactly what is going on!  But the code above means that 
> the firmware never successfully sent the last streaming message (the 
> mpa-start reply) and never transitioned the connection into rdma mode. 
> And the async error might indicate that some WR was posted prior to 
> doing the rdma_accept() and that WR had problems.
> 
> a few questions:
> 
> What firmware are you running?  ethtool -i will tell you.

[root@tebow2 ~]# ethtool -i eth4
driver: cxgb3
version: 1.0-ko
firmware-version: T 5.0.0 TP 1.1.0
bus-info: 0000:86:00.0

> What ofed version exactly?

I am still using the same code as at the top
of the thread, an OFED 1.3 daily from Monday
morning:

OFED-1.3-20080107-0942

> Does sdp post a SQ or RQ WR prior to doing the rdma_accept()?  Can you 
> dump that work request?  Maybe in iwch_post_send and iwch_post_recv, 
> dump the work request after it is built and before the code rings the 
> doorbell.  You can dump it as 8B flits, and be sure to put the flits in 
> host byte order.  See cxio_dump_wqe() in cxio_dbg.c...
> 

Ok, I'll do this.  I really appreciate your help and advice!
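
For reference, a minimal userspace sketch of the kind of dump suggested
above: walk a work request buffer 8 bytes at a time and print each flit
in host byte order. It assumes the WR is stored big-endian in queue
memory. The names flit_to_host() and dump_wr_flits() are hypothetical
and not part of iw_cxgb3; in the driver the equivalent would use
be64_to_cpu() and printk(), along the lines of cxio_dump_wqe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: interpret 8 bytes as one big-endian flit and
 * return its value in host byte order. Assembling byte by byte avoids
 * any dependence on the host's endianness or on pointer alignment.
 */
static uint64_t flit_to_host(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Hypothetical helper: print every complete 8-byte flit of a work
 * request, one per line, and return the number of flits printed.
 * Trailing bytes that do not fill a whole flit are ignored.
 */
static size_t dump_wr_flits(FILE *out, const void *wr, size_t len_bytes)
{
    const unsigned char *p = wr;
    size_t nflits = len_bytes / 8;

    assert(out != NULL && wr != NULL);

    for (size_t i = 0; i < nflits; i++)
        fprintf(out, "flit %2zu: 0x%016llx\n", i,
                (unsigned long long)flit_to_host(p + 8 * i));
    return nflits;
}
```

Calling dump_wr_flits(stderr, wqe, len) on the built work request just
before the doorbell is rung would give one line per flit, which is the
form that is easiest to compare against the WR layouts in cxio_wr.h.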

Thanks,
Craig



