[ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
Tang, Changqing
changquing.tang at hp.com
Fri Jul 6 11:08:26 PDT 2007
Sean:
Thanks, I think this solves our problem. Currently the two cards are
on different subnets. Code on either subnet works reliably. I have
not tried putting all cards on the same subnet.
Do you recommend configuring a single subnet or two subnets?
--CQ
> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com]
> Sent: Friday, July 06, 2007 11:48 AM
> To: Tang, Changqing; Arlin Davis
> Cc: Vladimir Sokolovsky; OpenFabrics General
> Subject: RE: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
>
> >Even though I force all ranks to use only the first card (ib0), it
> >works for a while and then fails with NON_PEER_REJECTED when one rank
> >tries to connect to another (dat_connect() and dat_evd_wait()). (I run
> >a simple MPI job in an infinite loop; it fails after hundreds of runs.)
>
> This sounds like it could be a race condition as a result of
> running the test in a loop. If the client starts before the
> server is listening, it will receive this sort of reject event.
>
> >It works on the first card (ib0) but fails on the second card (ib1)
>
> Please take a look at the following thread:
>
> http://lists.openfabrics.org/pipermail/general/2007-May/036559.html
>
> In particular, see Steve's message about this:
>
> http://lists.openfabrics.org/pipermail/general/2007-May/036571.html
>
> and let me know if his suggestion fixes your problem.
>
> I will update the librdmacm documentation with this
> information as well.
>
> - Sean
>
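The race Sean describes (the client calling dat_connect() before the server is listening, producing a NON_PEER_REJECTED event) is usually handled by retrying the connect rather than treating the reject as fatal. The uDAPL calls themselves are not reproduced here; this is a minimal Python sketch of that retry pattern, using plain TCP sockets as a stand-in for dat_connect()/dat_evd_wait(), with hypothetical names and timing values.

```python
# Sketch only: retrying a connect that races with the peer's listen,
# analogous to retrying dat_connect() after NON_PEER_REJECTED.
import socket
import threading
import time

def connect_with_retry(addr, retries=50, delay=0.05):
    """Retry until the peer is listening, instead of failing on the
    first refusal (the socket analogue of a non-peer reject)."""
    for _ in range(retries):
        try:
            return socket.create_connection(addr, timeout=1.0)
        except OSError:
            time.sleep(delay)  # peer not listening yet; back off and retry
    raise RuntimeError("peer never started listening")

# Demo: start the client *before* the server listens, mimicking the
# window in which the looped MPI runs were failing.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))   # bound but not yet listening: connects refused
addr = srv.getsockname()

result = {}
def client():
    with connect_with_retry(addr):
        result["connected"] = True

t = threading.Thread(target=client)
t.start()
time.sleep(0.2)              # reject window while the server is not listening
srv.listen(1)
conn, _ = srv.accept()
t.join()
conn.close()
srv.close()
```

With the retry in place the client survives the startup window; without it, the first refused connect would abort the run, which matches the intermittent failures seen when the test is driven in a tight loop.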