[openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

Ranjit Pandit rpandit at silverstorm.com
Fri Nov 11 13:02:14 PST 2005


On 11/11/05, Michael Krause <krause at cup.hp.com> wrote:
> Please clarify the following which was in the document provided by Oracle.
>
> On page 3 of the RDS document, under the section "RDP Interface", the 2nd
> and 3rd paragraphs state:
>
>    * RDP does not guarantee that a datagram is delivered to the remote
> application.
>    * It is up to the RDP client to deal with datagrams lost due to transport
> failure or remote application failure.
>
> The HCA is still a fault domain with RDS - it does not address flushing data
> out of the HCA fault domain, nor does it sound like it ensures that CQE loss
> is recoverable.
>
> I do believe RDS will replay all of the sendmsg's that it believes are
> pending, but it has no way to determine if already sent sendmsgs were
> actually successfully delivered to the remote application unless it provides
> some level of resync of the outstanding sends not completed from an
> application's perspective as well as any state updated via RDMA operations
> which may occur without an explicit send operation to flush to a known
> state.  I'm still trying to ascertain whether RDS completely recovers from
> HCA failure (assuming there is another HCA / path available) between the two
> endnodes.

RDS will replay the sends that are completed in error by the HCA,
which typically would happen if the current path fails or the remote
node/HCA dies.

In case of a catastrophic error on the local HCA, subsequent sends
will fail for a certain time (session_time_wait), as if no alternate
path were available at that time.
On getting an error, the application should discard any sends
unacknowledged by its peer and take corrective action.
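
To illustrate the application side of this, here is a rough sketch of
how a client might track sends its peer has not yet acknowledged and
drop them when a send fails. This is not RDS code: the class, the
method names, and the sequence-number scheme are all made up for the
example, and the acknowledgement is assumed to be an application-level
message from the peer.

```python
from collections import OrderedDict


class UnackedSends:
    """Hypothetical helper: tracks datagrams the peer application has not acknowledged."""

    def __init__(self):
        self._pending = OrderedDict()  # sequence number -> payload, in send order
        self._next_seq = 0

    def record_send(self, payload):
        """Remember a payload that was handed to the transport; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        return seq

    def ack(self, seq):
        """Forget a send once the peer application has acknowledged it."""
        self._pending.pop(seq, None)

    def on_send_error(self):
        """Discard every unacknowledged send and return them for corrective action."""
        lost = list(self._pending.items())
        self._pending.clear()
        return lost
```

What "corrective action" means (resend on a new connection, resync with
the peer, or report failure upward) is left to the application; the
sketch only hands back the list of sends whose fate is unknown.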

After the time_wait is over, subsequent sends will initiate a brand
new connection, which could use the alternate HCA (if the path is
available).
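
The behaviour described above can be modelled roughly as follows. This
is only a toy model of the sequence of events as seen by a sender, not
the actual RDS implementation: the class, its fields, the HCA names,
and the explicit `now` timestamps are assumptions made for the
example.

```python
class SessionModel:
    """Toy model: sends fail during time_wait, then reconnect on an alternate HCA."""

    def __init__(self, hcas, time_wait):
        self.hcas = list(hcas)      # candidate local HCAs, in preference order
        self.time_wait = time_wait  # stands in for session_time_wait (seconds)
        self.failed = set()         # HCAs that reported a catastrophic error
        self.active = self.hcas[0]  # HCA carrying the current connection
        self.wait_until = None      # time at which time_wait expires

    def local_hca_failed(self, now):
        """Record a catastrophic error on the active HCA and enter time_wait."""
        self.failed.add(self.active)
        self.active = None
        self.wait_until = now + self.time_wait

    def send(self, now):
        """Return the HCA used for this send, or raise ConnectionError."""
        if self.active is not None:
            return self.active
        if now < self.wait_until:
            # Behaves as if no alternate path were available yet.
            raise ConnectionError("session in time_wait")
        # time_wait is over: start a brand new connection on a surviving HCA.
        for hca in self.hcas:
            if hca not in self.failed:
                self.active = hca
                self.wait_until = None
                return hca
        raise ConnectionError("no path available")
```

For example, with `SessionModel(["hca0", "hca1"], time_wait=5.0)`, a
failure at t=1.0 makes a send at t=2.0 raise an error, while a send at
t=6.0 goes out over "hca1".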

>
> Mike
>


