[openib-general] connection loss handling in mthca

keshetti mahesh k_mahesh85 at yahoo.co.in
Mon Jul 24 05:47:54 PDT 2006



Dotan Barak <dotanb at mellanox.co.il> wrote:

-----Original Message-----
From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in]
Sent: Monday, July 24, 2006 3:21 PM
To: Dotan Barak
Subject: RE: [openib-general] connection loss handling in mthca




Dotan Barak <dotanb at mellanox.co.il> wrote:

-----Original Message-----
From: keshetti mahesh [mailto:k_mahesh85 at yahoo.co.in]
Sent: Monday, July 24, 2006 2:40 PM
To: Dotan Barak
Subject: Re: [openib-general] connection loss handling in mthca




Dotan Barak <dotanb at mellanox.co.il> wrote:

Hi.

On Monday 24 July 2006 13:50, keshetti mahesh wrote:
> I have a query regarding the handling of asynchronous events in the mthca driver.
> Consider this situation: the receiver has posted some 10 descriptors, and 5 of them have completed successfully. After that, the connection is lost (at the NIC level) for some reason.
>
> Now,
> 1. How does the QP know about this (there is no IB-specific event)?
If the QP was the responder of an RDMA operation which failed, there should be an async event on the QP.
> 2. What about the remaining descriptors on the receiver side?
> Will completions be generated for them?
In case of an error, the QP state will be changed to error and all the WRs (in the SQ and RQ) will be flushed (with error).

Where does it happen? In the interrupt handler, or somewhere else?
I have gone through the mthca code:
    1. There is no IQE or event corresponding to the connection loss.
    2. In the interrupt handlers, only the event handler corresponding to that QP is called (no QP state change).
[Dotan Barak]
When there is an error with the QP, the QP state is changed by the HCA (automatically).

The async event occurs only if the operation is an RDMA operation and the QP is the responder.
There should be a completion with error after the QP had the problem (if there are WRs in the QP).

The event is an affiliated event (only for this QP), so only the event handler of this QP should get this event.

Dotan
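
To illustrate what "affiliated" means in the reply above, here is a minimal sketch in C. It is not mthca code: `struct qp_slot` and `deliver_affiliated_event` are hypothetical names made up for this example, and the counter merely stands in for calling a per-QP event handler. The only point it shows is that the event reaches the one QP it belongs to and no other.

```c
#include <stddef.h>

/* Hypothetical per-QP record for this sketch: a QP number and a count of
 * affiliated events delivered to that QP's handler. */
struct qp_slot {
    unsigned int qpn;
    unsigned int events_seen;
};

/*
 * Deliver an affiliated event to the single QP whose number matches qpn.
 * Returns 1 if a QP was notified, 0 if no QP in the table has that number.
 * Other QPs in the table are left untouched.
 */
int deliver_affiliated_event(struct qp_slot *qps, size_t n, unsigned int qpn)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (qps[i].qpn == qpn) {
            qps[i].events_seen++;   /* stands in for calling the QP's handler */
            return 1;
        }
    }
    return 0;
}
```

With three QPs in the table, delivering an event for the second one bumps only its counter; the other two see nothing.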


OK, now what I understand is:
if we consider this case (i.e. connection loss), the HCA will automatically change the state of the QP to error.
No async event or error will be generated (this is not an RDMA operation), and
a completion with an error code (which error code?) will be generated for the WR that is in progress, and all other outstanding WRs will be flushed.
Is this OK?

With which error status will the WR in progress be completed?

-Mahesh


[Dotan Barak] What you understood is correct.

I cannot tell you the expected status of the completion if I don't know what you are doing
(which opcodes you use, whether the QP which goes to error is the responder or the requestor ...).

The first WR which fails will have a "meaningful" status and the rest of the completions' status will be "flushed with error".

Dotan
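
The rule described above (first failing WR gets a "meaningful" status, everything behind it is flushed) can be sketched as a small C model. This is only an illustration under assumptions: the `wr_status` enum and `model_qp_error` are hypothetical names for this example, not the verbs API, and `WR_FAILED` is a placeholder because the real status depends on the opcode and on whether the QP is responder or requestor.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch only; not the verbs enum. */
enum wr_status {
    WR_PENDING = 0,
    WR_SUCCESS,
    WR_FAILED,   /* placeholder for the "meaningful" error status */
    WR_FLUSHED   /* placeholder for "flushed with error" */
};

/*
 * Fill in the completion status of n posted work requests, assuming the
 * first `completed` of them finished successfully before the QP entered
 * the error state. The next WR gets the meaningful error, and every WR
 * after it is flushed. Returns the number of completions reported in error.
 */
size_t model_qp_error(enum wr_status *wr, size_t n, size_t completed)
{
    size_t i, errors = 0;

    if (completed > n)
        completed = n;

    for (i = 0; i < n; i++) {
        if (i < completed) {
            wr[i] = WR_SUCCESS;
        } else if (i == completed) {
            wr[i] = WR_FAILED;
            errors++;
        } else {
            wr[i] = WR_FLUSHED;
            errors++;
        }
    }
    return errors;
}
```

For the case from the start of the thread (10 receive descriptors posted, 5 completed), the model gives 5 successes, 1 completion with the meaningful error, and 4 flushed completions.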

Let me put the whole thing again.
A is the sender (who has posted some 5 descriptors) and B is the receiver (who has posted the same 5 receive descriptors).

Now the sender (A) HCA has detected the connection loss due to a "TPT error for data buffer" on the receiver (B) side. Then:
- Will the receiver (B) be notified about this through an interrupt (affiliated asynchronous error)?
- Upon receiving the interrupt, the receiver (B) HCA will transition the state of the QP to error.
- What happens to the WRs in progress at both ends? With which code will the completion be generated?

-Mahesh



 				