[ofa-general] multicast group join limits -- test code

Jeff Squyres jsquyres at cisco.com
Tue Aug 14 11:31:45 PDT 2007


On Aug 14, 2007, at 2:12 PM, Andrew Friedley wrote:

> An MPI is needed to compile/run the test.  No arguments are needed;  
> the test repeatedly joins groups (without leaving them) until an  
> error occurs, then intentionally hangs.

Just to clarify for those not familiar with MPI -- MPI is not used in
the multicast portion of the test.  It is used only to bootstrap /
launch the test and as an "out of band" messaging mechanism, so
that you can know when the group has been joined, etc.

So you can even use a TCP-only MPI to run this test, to ensure that
the MPI itself is not touching the IB stack and skewing the results.

> Here's some of the different behaviors I see with this test (OFED  
> v1.2 is always used):

FWIW, I've been trying to help Andrew run this test, and I always run  
into one of two errors:

- Running 1 proc each on 2 nodes joins 2 groups, then:

0 ERROR rdma_join_multicast(): 99 Cannot assign requested address

- Running 4 procs on 1 node joins a few tens of groups (the count is
different every time) and then:

ERROR event 13, status -110 Operation now in progress, forcing job to  
hang
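
For anyone decoding the numbers in the two messages above, here is a
minimal sketch that maps an errno-style code to its symbolic name and
message text. It assumes the codes are Linux errno values, with the
negative status being a negated errno; describe() is a hypothetical
helper for illustration and is not part of the test code.

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for an errno-style code.

    Assumption: a negative value is a negated errno, so the sign is dropped.
    """
    n = abs(code)
    return errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)


# The two codes from the error messages; names and text depend on the platform.
for code in (99, -110):
    name, msg = describe(code)
    print(f"{code}: {name} ({msg})")
```

On Linux, 99 is normally EADDRNOTAVAIL and 110 is normally ETIMEDOUT,
while "Operation now in progress" is the text for EINPROGRESS (115).
So the text printed with the second error may be coming from errno
rather than from the event status, but that is only a guess.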

The nodes are all RHEL4U4 running OFED 1.2; each node has 4 cores.

-- 
Jeff Squyres
Cisco Systems



