[ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes

Tang, Changqing changquing.tang at hp.com
Fri Jul 6 11:08:26 PDT 2007


Sean:
	Thanks, I think this solves our problem. Currently the two cards
are on different subnets. Code on either subnet works reliably. I have
not tried putting all cards on the same subnet.

	Do you recommend configuring a single subnet or two subnets?


--CQ 

> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com] 
> Sent: Friday, July 06, 2007 11:48 AM
> To: Tang, Changqing; Arlin Davis
> Cc: Vladimir Sokolovsky; OpenFabrics General
> Subject: RE: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
> 
> >Even though I force all ranks to use only the first card (ib0), it
> >works for a while and then fails with NON_PEER_REJECTED when one
> >rank tries to connect to another rank (dat_connect() and
> >dat_evd_wait()). (I run a simple MPI job in an infinite loop; it
> >fails after hundreds of runs.)
> 
> This sounds like it could be a race condition as a result of 
> running the test in a loop.  If the client starts before the 
> server is listening, it will receive this sort of reject event.
> 
> >It works on the first card (ib0) but fails on the second card (ib1)
> 
> Please take a look at the following thread:
> 
> http://lists.openfabrics.org/pipermail/general/2007-May/036559.html
> 
> In particular, see Steve's message about this:
> 
> http://lists.openfabrics.org/pipermail/general/2007-May/036571.html
> 
> and let me know if his suggestion fixes your problem.
> 
> I will update the librdmacm documentation with this 
> information as well.
> 
> - Sean
> 
