[ofa-general] multicast group join limits -- test code
Jeff Squyres
jsquyres at cisco.com
Tue Aug 14 11:31:45 PDT 2007
On Aug 14, 2007, at 2:12 PM, Andrew Friedley wrote:
> An MPI is needed to compile/run the test. No arguments are needed;
> the test repeatedly joins groups (without leaving them) until an
> error occurs, then intentionally hangs.
Just to clarify for those not familiar with MPI -- MPI is not used in
the multicast portion of the test. It is used only to bootstrap /
launch the test and as an "out of band" messaging mechanism, so that
you can tell when a group has been joined, etc.
So you can even use a TCP-only MPI to run this test, to ensure that
the MPI itself is not skewing any IB stack issues.
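For readers who have not seen the test, its control flow as described
above (join groups without leaving them until a join fails) can be
sketched roughly as follows. This is not Andrew's test code: `join_group`
is a hypothetical stand-in for the real join call (rdma_join_multicast()
in the actual test), and `max_groups` is an assumed safety cap so that the
sketch always terminates. No MPI or IB is involved here.

```python
from typing import Callable, Optional, Tuple


def join_until_failure(
    join_group: Callable[[int], None],
    max_groups: int = 100_000,
) -> Tuple[int, Optional[OSError]]:
    """Join groups 0, 1, 2, ... without ever leaving them.

    join_group is a hypothetical callable that joins group number i and
    raises OSError on failure. Returns (groups_joined, error); error is
    None if max_groups was reached without a failure.
    """
    joined = 0
    for i in range(max_groups):
        try:
            join_group(i)
        except OSError as exc:
            # First failure: report how far we got. The real test hangs
            # here on purpose; this sketch just returns.
            return joined, exc
        joined += 1
    return joined, None


if __name__ == "__main__":
    # Fake join that fails on the third group, loosely imitating the
    # "joins 2 groups, then error 99" behavior reported below.
    def fake_join(i: int) -> None:
        if i >= 2:
            raise OSError(99, "Cannot assign requested address")

    count, err = join_until_failure(fake_join)
    print(f"joined {count} groups, then: {err}")
```

Passing the join operation in as a callable keeps the loop independent of
any particular stack, which is the same separation the test relies on.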
> Here's some of the different behaviors I see with this test (OFED
> v1.2 is always used):
FWIW, I've been trying to help Andrew run this test, and I always run
into one of two errors:
- Running 1 proc each on 2 nodes joins 2 groups, then:
0 ERROR rdma_join_multicast(): 99 Cannot assign requested address
- Running 4 procs on 1 node joins a few tens of groups (a different
number every time) and then:
ERROR event 13, status -110 Operation now in progress, forcing job to hang
The nodes are all RHEL4U4 running OFED 1.2; each node has 4 cores.
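As an aid to reading the two error lines above, here is a small sketch
that maps the numbers to their Linux errno names. The table is hard-coded
from the Linux values (99, 110, 115) rather than queried from the OS, and
`describe_status` is a hypothetical helper, not part of the test. On Linux,
110 is ETIMEDOUT, while "Operation now in progress" is the text for
EINPROGRESS (115), so the text in the second error may not have been
derived from the printed status; that is only a guess from the numbers.
Assuming the event number is printed straight from librdmacm's event enum,
13 would correspond to RDMA_CM_EVENT_MULTICAST_ERROR.

```python
# Linux errno values relevant to the two errors above (hard-coded on
# purpose so the result does not depend on the machine running this).
LINUX_ERRNO = {
    99: ("EADDRNOTAVAIL", "Cannot assign requested address"),
    110: ("ETIMEDOUT", "Connection timed out"),
    115: ("EINPROGRESS", "Operation now in progress"),
}


def describe_status(status: int) -> str:
    """Hypothetical helper: name a positive errno or a negative status."""
    code = abs(status)
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name} ({text})"


if __name__ == "__main__":
    print(describe_status(99))    # first error: rdma_join_multicast()
    print(describe_status(-110))  # second error: status from event 13
```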
--
Jeff Squyres
Cisco Systems