[ofa-general] Question: Verbs API Error code recover

Dotan Barak dotanb at dev.mellanox.co.il
Tue Dec 4 10:45:21 PST 2007


I'm trying to gather some data in order to reproduce this in our lab
(we didn't encounter this behavior in our regression tests).

Which Linux distribution do you use?
Are there any error messages in /var/log/messages?
Can you run perfquery and check whether there are errors on the link?


thanks
Dotan


Wei Fang wrote:
> Hi, Dotan:
>
> I found that this issue happens on kernel 2.6.9-22 and is related to 
> opensm.  When it happens, every test fails. If I pull out the 
> InfiniBand cable and reconnect it, opensm does not respond.  When I 
> stop the opensm service and restart it, the InfiniBand link recovers.  
> I have not seen this issue on kernel 2.6.20 or 2.6.22.
>
> Dotan Barak wrote:
>> Wei Fang wrote:
>>> Hi, Dotan:
>>>
>>> When I got that error, I quit my program and used the ib_rdma_bw program 
>>> to test the InfiniBand link. It still fails like this:
>>>
>>> ib_rdma_bw 10.8.6.3
>>> 19068: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | 
>>> iters=1000 | duplex=0 | cma=0 |
>>> 19068: Local address:  LID 0x01, QPN 0x2e0404, PSN 0xc39344 RKey 
>>> 0x4c003101 VAddr 0x00002a958bc000
>>> 19068: Remote address: LID 0x3d9, QPN 0x140404, PSN 0x77012a, RKey 
>>> 0x74003100 VAddr 0x00002a958bc000
>>>
>>> 19068:main: Completion with error at client:
>>> 19068:main: Failed status 12: wr_id 3
>>> 19068:main: scnt=100, ccnt=0
>> This means that the remote QP didn't respond (or didn't send the 
>> response in time).
>> Can you try to run ibv_rc_pingpong between the two sides and check 
>> the status?
>> What is the output of ibv_devinfo on both sides?
>> (maybe something bad happened to the link)
>>
>> thanks
>> Dotan
>>
>>
>
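
For reference, the "Failed status 12" in the quoted ib_rdma_bw output is a
numeric value of the verbs enum ibv_wc_status. In C, libibverbs provides
ibv_wc_status_str() to turn such a value into text. The sketch below is only
an illustration: it keeps a local Python copy of the enum values as declared
in <infiniband/verbs.h>, and the helper name describe_wc_status and the hint
wording are assumptions made for this example, not part of any library.

```python
from enum import IntEnum


class WcStatus(IntEnum):
    """Local copy of enum ibv_wc_status from <infiniband/verbs.h>."""
    IBV_WC_SUCCESS = 0
    IBV_WC_LOC_LEN_ERR = 1
    IBV_WC_LOC_QP_OP_ERR = 2
    IBV_WC_LOC_EEC_OP_ERR = 3
    IBV_WC_LOC_PROT_ERR = 4
    IBV_WC_WR_FLUSH_ERR = 5
    IBV_WC_MW_BIND_ERR = 6
    IBV_WC_BAD_RESP_ERR = 7
    IBV_WC_LOC_ACCESS_ERR = 8
    IBV_WC_REM_INV_REQ_ERR = 9
    IBV_WC_REM_ACCESS_ERR = 10
    IBV_WC_REM_OP_ERR = 11
    IBV_WC_RETRY_EXC_ERR = 12
    IBV_WC_RNR_RETRY_EXC_ERR = 13
    IBV_WC_LOC_RDD_VIOL_ERR = 14
    IBV_WC_REM_INV_RD_REQ_ERR = 15
    IBV_WC_REM_ABORT_ERR = 16
    IBV_WC_INV_EECN_ERR = 17
    IBV_WC_INV_EEC_STATE_ERR = 18
    IBV_WC_FATAL_ERR = 19
    IBV_WC_RESP_TIMEOUT_ERR = 20
    IBV_WC_GENERAL_ERR = 21


# Hypothetical hint text for the two retry-related statuses; the wording is
# this example's own summary, not output produced by libibverbs.
HINTS = {
    WcStatus.IBV_WC_RETRY_EXC_ERR:
        "transport retry counter exceeded: the remote QP did not respond in time",
    WcStatus.IBV_WC_RNR_RETRY_EXC_ERR:
        "RNR retry counter exceeded: the remote QP had no receive posted",
}


def describe_wc_status(code: int) -> str:
    """Return the enum name for a completion status, plus a hint if known."""
    try:
        status = WcStatus(code)
    except ValueError:
        return f"unknown status {code}"
    hint = HINTS.get(status)
    return f"{status.name} ({hint})" if hint else status.name


if __name__ == "__main__":
    # The status reported in the quoted ib_rdma_bw run.
    print(describe_wc_status(12))
```

Seen this way, status 12 is IBV_WC_RETRY_EXC_ERR, which matches the
explanation above that the remote QP did not respond in time.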



