[openib-general] Problem with mca_mpool_openib_register - Cannot allocate memory

Bill Wichser bill at Princeton.EDU
Mon Jun 19 08:11:12 PDT 2006


We are running the openib stack from Red Hat on a 2.6.9-34.ELsmp kernel 
on dual-Xeon nodes, with Open MPI v1.0.2 compiled with gcc.

We still have the problem with btl_openib_endpoint.c returning 0 byte(s) 
for max inline data, and we realize that another IB stack addresses it. 
Now a second problem has appeared: running across more than a single 
host generates huge amounts of error messages.

The errors go something like this:

mca_mpool_openib_register: ibv_reg_mr(0x2ac2622000,1052672) failed with error: Cannot allocate memory
[0,1,1][btl_openib.c:496:mca_btl_openib_prepare_dst] mpool_register(0x2ac2622040,1048576) failed: base 0x2ac2222040 lb 0 offset 4194304
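
For readers trying to decode the numbers in these two messages, the sketch below shows how they appear to relate. It assumes 4096-byte pages and that the registration is widened to page boundaries; both are inferences from the values in the log, not something confirmed from the Open MPI source. The helper name page_aligned_span is made up for illustration.

```python
# Assumption: 4 KiB pages (typical for this platform, not stated in the log).
PAGE_SIZE = 4096


def page_aligned_span(addr, length, page_size=PAGE_SIZE):
    """Return (start, size) of the smallest page-aligned range covering
    [addr, addr + length)."""
    start = addr & ~(page_size - 1)                              # round down
    end = (addr + length + page_size - 1) & ~(page_size - 1)     # round up
    return start, end - start


# Values copied from the error messages above.
base = 0x2ac2222040
offset = 4194304
requested_len = 1048576

addr = base + offset                       # address passed to mpool_register
start, size = page_aligned_span(addr, requested_len)

print(hex(addr))    # address in the mpool_register message
print(hex(start))   # address in the ibv_reg_mr message
print(size)         # length in the ibv_reg_mr message
```

Under those assumptions, the 1 MiB request starting 64 bytes into a page spans one extra page, which would account for ibv_reg_mr being asked for 1052672 bytes rather than 1048576.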

We fixed the /etc/security/limits.conf problem, but I don't know what to 
do about this one.  The job seems to complete without error on 2 nodes 
(4 processors), but scaling any larger just generates megabytes of 
these error messages.

Any insights into this problem?  All my searches lead back to 
limits.conf, where we have the value set to 8192.  These machines have 
8 GB of memory each, if that makes any difference.
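
One way to check what limit the MPI processes actually see is to query it from inside a process on each node. The sketch below is only an illustration and rests on assumptions: that the 8192 in limits.conf is the memlock value (which limits.conf expresses in KB), and that each registration is about the 1052672 bytes seen in the error above. The function names are hypothetical. The limit a process inherits can differ from the one in limits.conf depending on how it was launched, so it is worth running this the same way the job is started.

```python
import resource


def memlock_limit_bytes():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def registrations_that_fit(limit_bytes, region_bytes):
    """How many regions of region_bytes fit under limit_bytes.

    Returns None when the limit is unlimited.
    """
    if limit_bytes == resource.RLIM_INFINITY:
        return None
    return limit_bytes // region_bytes


soft, hard = memlock_limit_bytes()
print("memlock soft/hard limit seen by this process:", soft, hard)

# Assumption: 8192 is a memlock setting in KB, as in limits.conf.
assumed_limit = 8192 * 1024
region = 1052672  # size from the ibv_reg_mr error message
print("regions of that size under an 8192 KB limit:",
      registrations_that_fit(assumed_limit, region))
```

If those assumptions hold, only a handful of 1 MiB registrations fit under an 8192 KB limit, which would be consistent with failures appearing once more buffers are in flight.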

Thanks,
Bill



