[ofa-general] XRC cleanup order issue

Jack Morgenstein jackm at dev.mellanox.co.il
Sat Dec 15 22:27:04 PST 2007


On Wednesday 12 December 2007 17:24, Tang, Changqing wrote:
> 
> Hi,
>         This question is mainly for Mellanox engineers.
> 
>         With XRC, the rank that creates the QP used for transport to all ranks on that node can NOT exit first if other ranks are still using
> the transport. This restriction is a problem for our dynamic process definition, where any rank could die for any reason without tearing down the
> whole application.
> 
>         I am thinking about shared memory usage, where the creator does not have to stay alive while other processes are still using it; when the
> last process exits, the system cleans up the shared memory.
> 
>         Can't XRC mimic the shared memory behavior?
> 
The difficulty is that a QP must be associated with a protection domain (i.e., UAR area),
which is unique per user process, so the QP's lifetime is tied to the process that created it.

One possibility is to have a separate process per host per job (XRC domain) create the XRC QPs on the receiving side.
There still would be the issue of what happens if that process somehow dies prematurely.
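
For comparison, the shared memory lifetime behavior referred to in the question can be sketched roughly as below. This is only an illustration of named POSIX shared memory using Python's standard multiprocessing.shared_memory module, not anything from the XRC or verbs API. The function names are made up for the example, and the "creator leaving" is simulated by closing its handle within one process rather than by a real process exit.

```python
from multiprocessing import shared_memory


def creator_writes_then_leaves(payload: bytes) -> str:
    """Hypothetical helper: create a named segment, fill it, then detach."""
    seg = shared_memory.SharedMemory(create=True, size=len(payload))
    seg.buf[:len(payload)] = payload
    name = seg.name
    # The creator drops its mapping. On POSIX systems (assumed here) the
    # named segment stays around until it is explicitly unlinked.
    seg.close()
    return name


def last_user_reads_and_cleans_up(name: str, length: int) -> bytes:
    """Hypothetical helper: attach by name after the creator has detached."""
    seg = shared_memory.SharedMemory(name=name)
    data = bytes(seg.buf[:length])
    seg.close()
    # Whoever is last removes the name; the memory itself is released
    # once no mapping of the segment remains.
    seg.unlink()
    return data


if __name__ == "__main__":
    seg_name = creator_writes_then_leaves(b"xrc")
    print(last_user_reads_and_cleans_up(seg_name, 3))
```

Note that the segment is not tied to any one process's private resources, which is exactly what differs from a QP bound to a per-process protection domain.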

We'll examine the issue and see if there is some other solution.

- Jack


