<html>

<body>

<font size=3>At 01:02 PM 11/11/2005, Ranjit Pandit wrote:<br>

<blockquote type=cite class=cite cite="">On 11/11/05, Michael Krause

<krause@cup.hp.com> wrote:<br>

> Please clarify the following which was in the document provided by

Oracle.<br>

><br>

> On page 3 of the RDS document, under the section "RDP

Interface", the 2nd<br>

> and 3rd paragraphs state:<br>

><br>

>    * RDP does not guarantee that a datagram is

delivered to the remote<br>

> application.<br>

>    * It is up to the RDP client to deal with

datagrams lost due to transport<br>

> failure or remote application failure.<br>

><br>

> The HCA is still a fault domain with RDS - it does not address

flushing data<br>

> out of the HCA fault domain, nor does it sound like it ensures that

CQE loss<br>

> is recoverable.<br>

><br>

> I do believe RDS will replay all of the sendmsg's that it believes

are<br>

> pending, but it has no way to determine if already sent sendmsgs

were<br>

> actually successfully delivered to the remote application unless it

provides<br>

> some level of resync of the outstanding sends not completed from

an<br>

> application's perspective as well as any state updated via RDMA

operations<br>

> which may occur without an explicit send operation to flush to a

known<br>

> state.  I'm still trying to ascertain whether RDS completely

recovers from<br>

> HCA failure (assuming there is another HCA / path available) between

the two<br>

> endnodes.<br><br>

RDS will replay the sends that are completed in error by the HCA,<br>

which typically would happen if the current path fails or the remote<br>

node/HCA dies.</font></blockquote><br>

Does this mean that the receiving RDS entity is responsible for dealing

with duplicates?  A Send completion error does not mean that the

receiving endnode did not receive the data, for either IB or iWARP; it

only indicates that the Send operation failed, which could be nothing

more than the loss of the receiver's ACK after the Send completed on the

receiver.  In that scenario RDS would have to determine which buffers

have actually been consumed before retransmitting, i.e. perform a resync;

otherwise the application layer could receive duplicate data, which can

cause corruption or other problems depending on the application.

Tolerance will vary by application, so the ULP must present consistent

semantics if it is to support a broader set of applications than the one

initially targeted.<br><br>
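To illustrate the duplicate scenario, here is a minimal sketch. It is not RDS code: the Receiver class, the per-datagram sequence number, and send_with_lost_ack are all hypothetical names introduced only for this example. It simulates a Send whose data reaches the receiver but whose ACK is lost, so the sender sees a completion error and replays the datagram.

```python
class Receiver:
    """Hypothetical receiving endpoint (illustration only, not RDS)."""

    def __init__(self, dedup):
        self.dedup = dedup      # whether to filter on sequence number
        self.seen = set()       # sequence numbers already delivered
        self.delivered = []     # payloads handed to the application

    def on_datagram(self, seq, payload):
        # With dedup enabled, a replayed sequence number is dropped.
        if self.dedup and seq in self.seen:
            return False
        self.seen.add(seq)
        self.delivered.append(payload)
        return True


def send_with_lost_ack(receiver, seq, payload):
    """Simulate a Send that arrives, loses its ACK, and is replayed."""
    receiver.on_datagram(seq, payload)  # first attempt: delivered, ACK lost
    receiver.on_datagram(seq, payload)  # replay after the completion error


naive = Receiver(dedup=False)
send_with_lost_ack(naive, 0, "debit 10")
print(naive.delivered)    # ['debit 10', 'debit 10']

careful = Receiver(dedup=True)
send_with_lost_ack(careful, 0, "debit 10")
print(careful.delivered)  # ['debit 10']
```

Without some form of resync or receiver-side filtering, the application sees the same message twice; whether that is harmless depends entirely on the application.<br><br>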

<blockquote type=cite class=cite cite=""><font size=3>In case of a

catastrophic error on the local HCA, subsequent sends will fail (for a

certain time (session_time_wait)) as if there were no alternate path

available at that time. On getting an error the application should

discard any sends unacknowledged by its peer and take corrective

action.</font></blockquote><br>

Does "unacknowledged by the peer" mean at the interconnect level or at

the application level?  Again, how is receive buffer management

handled?<br><br>
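The distinction matters because the two readings can yield different sets of sends to discard. The sketch below is a hypothetical bookkeeping structure, not anything taken from the RDS document: SendTracker, its state names, and the "interconnect" and "application" level labels are assumptions made for illustration.

```python
class SendTracker:
    """Hypothetical per-sender bookkeeping (illustration only, not RDS)."""

    def __init__(self):
        # seq -> "posted", "transport_acked", or "app_acked"
        self.state = {}

    def post(self, seq):
        self.state[seq] = "posted"

    def transport_ack(self, seq):
        # The interconnect reported a successful Send completion.
        self.state[seq] = "transport_acked"

    def app_ack(self, seq):
        # The remote application confirmed it consumed the message.
        self.state[seq] = "app_acked"

    def unacked(self, level):
        """Return sequence numbers not yet acknowledged at the given level."""
        if level == "interconnect":
            done = {"transport_acked", "app_acked"}
        elif level == "application":
            done = {"app_acked"}
        else:
            raise ValueError("unknown level: " + level)
        return sorted(s for s, st in self.state.items() if st not in done)


tracker = SendTracker()
for seq in range(4):
    tracker.post(seq)
for seq in (0, 1, 2):
    tracker.transport_ack(seq)
tracker.app_ack(0)

print(tracker.unacked("interconnect"))  # [3]
print(tracker.unacked("application"))   # [1, 2, 3]
```

In this example only send 3 is unacknowledged at the interconnect level, while sends 1, 2, and 3 are all unacknowledged at the application level, so the answer changes what the application must discard or redo.<br><br>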

<blockquote type=cite class=cite cite=""><font size=3>After the time_wait

is over, subsequent sends will initiate a brand new connection which

could use the alternate HCA (if the path is

available).</font></blockquote><br>

This is understood.<br><br>

Mike</body>

</html>