[ofa-general] XRC cleanup order issue

Tang, Changqing changquing.tang at hp.com
Mon Dec 17 07:40:23 PST 2007


Thanks. We would like to have a new solution that removes this restriction.

--CQ Tang


> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> Sent: Sunday, December 16, 2007 12:27 AM
> To: general at lists.openfabrics.org
> Cc: Tang, Changqing
> Subject: Re: [ofa-general] XRC cleanup order issue
>
> On Wednesday 12 December 2007 17:24, Tang, Changqing wrote:
> >
> > Hi,
> >         This question is mainly for Mellanox engineers.
> >
> >         With XRC, the rank that creates the QP used for transport
> > to all ranks on that node can NOT exit first if other ranks are
> > still using the transport. This restriction is a problem for our
> > dynamic process definition, where any rank could die for any reason
> > without tearing down the whole application.
> >
> >         I am thinking about shared memory usage, where the creator
> > does not have to stay alive while other processes are still using
> > it; once the last process exits, the system cleans up the shared
> > memory.
> >
> >         Can't XRC mimic the shared memory behavior?
> >
> The issue is that the QP needs to be associated with a
> protection domain (i.e., UAR area), which is unique per user process.
>
> One possibility is to have a separate process per host per
> job (XRC domain) create the XRC QPs on the receiving side.
> There still would be the issue of what happens if that
> process somehow dies prematurely.
>
> We'll examine the issue and see if there is some other solution.
>
> - Jack
>
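
For reference, the shared memory behavior described in the quoted message can be illustrated with a small sketch. It is not XRC code and uses no verbs calls. It assumes a POSIX system, and a file-backed mmap stands in for a shared memory segment. The function name, the backing file name "segment", and SIZE are hypothetical, chosen only for this illustration. One process creates the region and exits; a second process then attaches, drops the name, and can still read the data until it closes its own mapping.

```python
import mmap
import os
import tempfile

SIZE = 64  # bytes in the shared region (arbitrary value for this sketch)


def survives_creator_exit(message: bytes = b"hello from creator") -> bytes:
    """Return what a second process reads after the creating process has exited."""
    assert len(message) <= SIZE
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "segment")  # hypothetical backing file

    pid = os.fork()
    if pid == 0:
        # Creator process: create the region, write to it, then exit at once.
        status = 1
        try:
            with open(path, "wb") as f:
                f.truncate(SIZE)  # extend the new file to SIZE zero bytes
            with open(path, "r+b") as f:
                mem = mmap.mmap(f.fileno(), SIZE)  # shared mapping by default
                mem[:len(message)] = message
                mem.flush()
            status = 0
        finally:
            os._exit(status)

    os.waitpid(pid, 0)  # the creator is gone from here on

    # A later user attaches to the region the creator left behind.
    with open(path, "rb") as f:
        mem = mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ)

    # Drop the name. The existing mapping stays valid until it is closed.
    os.unlink(path)
    os.rmdir(workdir)

    data = bytes(mem[:len(message)])
    mem.close()  # last user detaches; the kernel can now reclaim the storage
    return data
```

Calling survives_creator_exit(b"xrc") returns the bytes written by the process that has already exited. The point of the comparison is that the lifetime of the region follows its last user, not its creator, which is the property being asked of XRC in this thread.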


