[ewg] MLX4 Strangeness

Tziporet Koren tziporet at mellanox.co.il
Mon Feb 15 23:53:24 PST 2010


On 2/15/2010 10:24 PM, Tom Tucker wrote:
> Hello,
>
> I am seeing some very strange behavior on my MLX4 adapters running 2.7
> firmware and the latest OFED 1.5.1. Two systems are involved, and each
> has a dual-ported MTHCA DDR adapter and MLX4 adapters.
>
> The scenario starts with NFSRDMA stress testing between the two systems
> running bonnie++ and iozone concurrently. The test completes and there
> is no issue. Then 6 minutes pass and the server "times out" the
> connection and shuts down the RC connection to the client.
>
> From this point on, using the RDMA CM, a new RC QP can be brought up
> and moved to RTS, however, the first RDMA_SEND to the NFS SERVER system
> fails with IB_WC_RETRY_EXC_ERR. I have confirmed:
>
> - that "arp" completed successfully and the neighbor entries are
> populated on both the client and server
> - that the QPs are in the RTS state on both the client and server
> - that there are RECV WR posted to the RQ on the server and they did not
> error out
> - that no RECV WR completed successfully or in error on the server
> - that there are SEND WR posted to the QP on the client
> - the client side SEND_WR fails with error 12 as mentioned above
>
> I have also confirmed the following with a different application (i.e.
> rping):
>
> server# rping -s
> client# rping -c -a 192.168.80.129
>
> fails with the exact same error, i.e.
> client# rping -c -a 192.168.80.129
> cq completion failed status 12
> wait for RDMA_WRITE_ADV state 10
> client DISCONNECT EVENT...
>
> However, if I run rping the other way, it works fine, that is,
>
> client# rping -s
> server# rping -c -a 192.168.80.135
>
> It runs without error until I stop it.
>
> Does anyone have any ideas on how I might debug this?
>
Tom,
What is the vendor syndrome reported when you get the completion with error?

Does the issue occur only on the ConnectX cards (mlx4), or also on the
InfiniHost cards (mthca)?
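
For reference, here is a minimal sketch of how the numeric status printed by
rping ("status 12") maps to a name. The table follows the order of
enum ibv_wc_status in libibverbs (the kernel's enum ib_wc_status uses the same
order with an IB_WC_ prefix). The helper name describe_wc is hypothetical and
is not part of any library. The vendor syndrome is the separate vendor_err
field of the work completion, and its meaning is vendor specific, so the
sketch only prints it in hex.

```python
# Names in the order of enum ibv_wc_status (libibverbs).
# Index in this list == numeric status value.
WC_STATUS_NAMES = [
    "SUCCESS", "LOC_LEN_ERR", "LOC_QP_OP_ERR", "LOC_EEC_OP_ERR",
    "LOC_PROT_ERR", "WR_FLUSH_ERR", "MW_BIND_ERR", "BAD_RESP_ERR",
    "LOC_ACCESS_ERR", "REM_INV_REQ_ERR", "REM_ACCESS_ERR", "REM_OP_ERR",
    "RETRY_EXC_ERR", "RNR_RETRY_EXC_ERR", "LOC_RDD_VIOL_ERR",
    "REM_INV_RD_REQ_ERR", "REM_ABORT_ERR", "INV_EECN_ERR",
    "INV_EEC_STATE_ERR", "FATAL_ERR", "RESP_TIMEOUT_ERR", "GENERAL_ERR",
]


def describe_wc(status, vendor_err=None):
    """Hypothetical helper: describe a work-completion status code.

    vendor_err is optional and is only formatted, not interpreted.
    """
    if 0 <= status < len(WC_STATUS_NAMES):
        name = WC_STATUS_NAMES[status]
    else:
        name = "UNKNOWN"
    text = f"status {status} ({name})"
    if vendor_err is not None:
        text += f", vendor_err 0x{vendor_err:02x}"
    return text


# The status reported in the rping output above.
print(describe_wc(12))
```

Logging the vendor_err value next to the status in the failing completion
handler is what would answer the first question.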

Tziporet



