[openib-general] Re: uDAPL again
Arlin Davis
ardavis at ichips.intel.com
Wed Nov 2 12:02:57 PST 2005
Aniruddha Bohra wrote:
> cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil)
> Success
> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
> dapl_evd_dto_callback : CQE
> work_req_id 134771572
> status 12
> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
> DTO completion ERROR: 12: op 0xff
> disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0)
> destroy_cm_id: conn 0x808a008 id 134774528
> dapli_evd_post_event: Called with event # 4006
>
>
> Any ideas how to proceed to even debug this?
Are you using the uDAPL provider with the socket CM (VERBS=openib_scm) or
the default one that uses uCM and uAT? For the socket CM version, the
timeout is set to 14 (4.096 µs x 2^14, ~67ms) and the retry count is set
to 7, so the receiving node would have to be delayed beyond ~469ms
(7 x ~67ms) to get this failure. For the default uCM/uAT version, the
retry count is also 7 and the timeout is set to pktlifetime+1, so you
would have to look at the path record to find the timeout value for the
connection.
Can you successfully run the IB verbs ibv_rc_pingpong test suite?
Is there anything special about your fabric configuration that could
induce this kind of latency? Something on the fabric or in your remote
system is delaying ACKs beyond your total timeout/retry window.
If you had no buffers posted, or had attempted to send to unregistered
memory, you would get different errors.
-arlin
>
> Thanks
> Aniruddha
>