[openib-general] connection loss handling in mthca
keshetti mahesh
k_mahesh85 at yahoo.co.in
Mon Jul 24 05:47:54 PDT 2006
Dotan Barak <dotanb at mellanox.co.il> wrote:
-----Original Message-----
From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in]
Sent: Monday, July 24, 2006 3:21 PM
To: Dotan Barak
Subject: RE: [openib-general] connection loss handling in mthca
Dotan Barak <dotanb at mellanox.co.il> wrote:
-----Original Message-----
From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in]
Sent: Monday, July 24, 2006 2:40 PM
To: Dotan Barak
Subject: Re: [openib-general] connection loss handling in mthca
Dotan Barak <dotanb at mellanox.co.il> wrote: Hi.
On Monday 24 July 2006 13:50, keshetti mahesh wrote:
> I have a question about how asynchronous events are handled in the mthca driver.
> Consider this situation: the receiver has posted 10 descriptors, and 5 of them have completed successfully. After that, the connection is lost (at the NIC level) for some reason.
>
> Now:
> 1. How does the QP learn about this (there is no IB-specific event)?
If the QP was the responder of an RDMA operation which failed, there should be an async event on the QP.
> 2. What about the remaining descriptors on the receiver side? Will completions be generated for them?
In case of an error, the QP state will be changed to Error and all outstanding WRs (in both the SQ and the RQ) will be flushed with error.
Where does this happen? In the interrupt handler, or somewhere else?
I have gone through the mthca code:
1. There is no IQE or event corresponding to the connection loss.
2. In the interrupt handlers, only the event handler of that QP is called (there is no QP state change).
[Dotan Barak]
When there is an error with the QP, the HCA changes the QP state automatically.
An async event occurs only if the operation is an RDMA operation and the QP is the responder.
There should be a completion with error after the QP hits the problem (if there are WRs in the QP).
The event is an affiliated event (it applies only to this QP), so only the event handler of this QP should receive it.
Dotan
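To make the affiliated-event behaviour concrete, here is a minimal self-contained C model of the sequence described above. It is not mthca or libibverbs code. The types `struct qp_model`, the handler `record_event()` and the function `dispatch_affiliated_event()` are hypothetical names for illustration. In the model, the "HCA" moves the failing QP to the Error state on its own and then reports an event that names that QP, so only that QP's registered handler runs and other QPs are untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types for illustration only; these are not the
 * mthca driver's or libibverbs' real definitions. */
enum qp_state { QPS_RTS, QPS_ERR };

struct qp_model;
typedef void (*qp_event_handler)(struct qp_model *qp, int event);

struct qp_model {
    unsigned qp_num;
    enum qp_state state;
    int events_seen;          /* events delivered to this QP's handler */
    int last_event;           /* last event code seen, -1 if none */
    qp_event_handler handler; /* per-QP handler registered at creation */
};

/* Example handler: records what it was given. */
static void record_event(struct qp_model *qp, int event)
{
    qp->events_seen++;
    qp->last_event = event;
}

/* Models the sequence from the thread: the HCA moves the failing QP to
 * the Error state by itself, then reports an affiliated event naming
 * that QP, so only that QP's handler is invoked. */
static void dispatch_affiliated_event(struct qp_model *qps, size_t nqps,
                                      unsigned failing_qp_num, int event)
{
    assert(qps != NULL);
    for (size_t i = 0; i < nqps; i++) {
        if (qps[i].qp_num != failing_qp_num)
            continue;             /* unaffiliated QPs are not notified */
        qps[i].state = QPS_ERR;   /* automatic transition, no driver call */
        if (qps[i].handler != NULL)
            qps[i].handler(&qps[i], event);
    }
}
```

With real hardware, a user-space consumer would not see this dispatch directly; it would read such events with libibverbs' `ibv_get_async_event()` and acknowledge each one with `ibv_ack_async_event()`.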
OK, so this is what I understand:
In this case (i.e., connection loss), the HCA will automatically change the state of the QP to Error.
No async event or error will be generated (this is not an RDMA operation), and
a completion with an error code (which error code?) will be generated for the WR that is in progress, and all other outstanding WRs will be flushed.
Is this correct?
With which error status will the WR in progress be completed?
-Mahesh
[Dotan Barak] What you understood is correct.
I cannot tell you the expected status of the completion without knowing what you are doing
(which opcodes you use, whether the QP that goes to error is the responder or the requester, ...).
The first WR that fails will have a "meaningful" status, and the rest of the completions will have the status "flushed with error".
Dotan
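The completion rule above can be sketched as a small standalone C model. The enum `wc_status`, `struct wr_model` and `complete_after_error()` are hypothetical; the status names only loosely mirror libibverbs' `enum ibv_wc_status`. The error cause passed in is an arbitrary example, because, as noted above, the real status depends on the opcodes and on whether the failing QP is the requester or the responder. WRs that completed before the error keep their successful completions, the first WR that fails gets the meaningful status, and every later outstanding WR is flushed with error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; names loosely follow libibverbs'
 * enum ibv_wc_status, but this is a standalone model, not that API. */
enum wc_status { WC_SUCCESS, WC_REM_ACCESS_ERR, WC_WR_FLUSH_ERR };

struct wr_model {
    unsigned long wr_id;
    int completed;          /* 1 once a completion has been generated */
    enum wc_status status;
};

/* Models the rule from the thread: the first WR that fails carries the
 * "meaningful" error status; every later outstanding WR is flushed with
 * error. WRs before first_failed are assumed to have completed already.
 * Returns the number of completions generated by this call. */
static size_t complete_after_error(struct wr_model *wrs, size_t n,
                                   size_t first_failed,
                                   enum wc_status cause)
{
    size_t generated = 0;

    assert(wrs != NULL && first_failed < n);
    assert(cause != WC_SUCCESS && cause != WC_WR_FLUSH_ERR);

    for (size_t i = first_failed; i < n; i++) {
        if (wrs[i].completed)
            continue;
        wrs[i].status = (i == first_failed) ? cause : WC_WR_FLUSH_ERR;
        wrs[i].completed = 1;
        generated++;
    }
    return generated;
}
```

Applied to the original scenario (10 receive WRs, 5 already completed), the call generates 5 more completions: one with the cause and four flushed. In a real consumer these statuses arrive in the `status` field of `struct ibv_wc` returned by `ibv_poll_cq()`, and flushed WRs report `IBV_WC_WR_FLUSH_ERR`.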
Let me restate the whole scenario.
A is the sender (which has posted 5 descriptors) and B is the receiver (which has posted 5 matching receive descriptors).
Now the sender's (A's) HCA detects the connection loss due to a "TPT error for data buffer" on the receiver (B) side. Then:
- Will the receiver (B) be notified about this through an interrupt (affiliated asynchronous error)?
- Upon receiving the interrupt, the receiver's (B's) HCA will transition the QP to the Error state.
- What happens to the WRs in progress at both ends, and with which status code will their completions be generated?
-Mahesh