[ofa-general] XRC cleanup order issue
Tang, Changqing
changquing.tang at hp.com
Wed Dec 19 12:01:51 PST 2007
Hi, Jack:
If I need to use an XRC domain for communication between two ranks on the same node, how do I do it?
The reason I ask is that HP-MPI has a NIC mode in which no shared memory is used.
Thanks.
--CQ
> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> Sent: Sunday, December 16, 2007 12:27 AM
> To: general at lists.openfabrics.org
> Cc: Tang, Changqing
> Subject: Re: [ofa-general] XRC cleanup order issue
>
> On Wednesday 12 December 2007 17:24, Tang, Changqing wrote:
> >
> > Hi,
> > This question is mainly for Mellanox engineers.
> >
> > With XRC, the rank that creates the QP used for transport to all
> > ranks on that node can NOT exit first if other ranks are still
> > using the transport. This restriction is a problem for our dynamic
> > process definition, where any rank could die for any reason
> > without tearing down the whole application.
> >
> > I am thinking about shared memory usage, where the creator does
> > not have to stay alive while other processes are still using it;
> > when the last process exits, the system cleans up the shared
> > memory.
> >
> > Can't XRC mimic the shared memory behavior?
> >
> The issue is that the QP needs to be associated with a
> protection domain (i.e., UAR area), which is unique per user process.
>
> One possibility is to have a separate process per host per
> job (XRC domain) create the XRC QPs on the receiving side.
> There still would be the issue of what happens if that
> process somehow dies prematurely.
>
> We'll examine the issue and see if there is some other solution.
>
> - Jack
>
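For reference, the shared-memory lifetime described in the quoted question can be sketched roughly as follows. This is only an illustration of that behavior, not XRC or verbs code. It assumes a POSIX system and uses a file-backed shared mapping as a stand-in for a shared memory segment; the function name creator_exits_first and the file name "segment" are made up for the example.

```python
import mmap
import os
import tempfile


def creator_exits_first(payload: bytes = b"hello") -> bytes:
    """Return what a surviving process reads after the creator has exited."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "segment")  # hypothetical segment name

    pid = os.fork()
    if pid == 0:
        # Child plays the "creator" rank: create, map, write, then exit.
        status = 1
        try:
            with open(path, "w+b") as f:
                f.truncate(mmap.PAGESIZE)
                seg = mmap.mmap(f.fileno(), mmap.PAGESIZE)
                seg[:len(payload)] = payload
                seg.flush()
            status = 0
        finally:
            os._exit(status)

    # Parent plays a surviving rank: wait until the creator is gone.
    os.waitpid(pid, 0)
    with open(path, "r+b") as f:
        seg = mmap.mmap(f.fileno(), mmap.PAGESIZE)

    # Remove the name; the kernel keeps the pages until the last
    # mapping is closed, then reclaims them.
    os.unlink(path)
    os.rmdir(workdir)
    data = bytes(seg[:len(payload)])
    seg.close()
    return data
```

The surviving process still reads the creator's data after the creator has exited and after the name has been unlinked, which is the property the question asks XRC to mimic. A QP differs because, as noted in the reply, it is tied to a protection domain that belongs to a single user process.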