[openib-general] Problem with mca_mpool_openib_register - Cannot allocate memory
Bill Wichser
bill at Princeton.EDU
Mon Jun 19 08:11:12 PDT 2006
We are running the openib stack from Red Hat on a 2.6.9-34.ELsmp kernel
on dual-Xeon nodes, with Open MPI v1.0.2 compiled with gcc.
We still have the problem with btl_openib_endpoint.c returning 0
byte(s) for max inline data, and we realize that another IB stack
addresses this. A separate problem shows up when running across more
than a single host, and it generates huge amounts of error messages.
The errors go something like this:
mca_mpool_openib_register: ibv_reg_mr(0x2ac2622000,1052672) failed with error: Cannot allocate memory
[0,1,1][btl_openib.c:496:mca_btl_openib_prepare_dst] mpool_register(0x2ac2622040,1048576) failed: base 0x2ac2222040 lb 0 offset 4194304
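For reference, the numbers in these two messages are consistent with each other if the registration is rounded out to page boundaries. Here is a minimal sketch of that arithmetic, assuming a 4096-byte page size (not stated above); page_align is a hypothetical helper for illustration, not an Open MPI function:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the post


def page_align(addr, length, page_size=PAGE_SIZE):
    """Round a buffer out to page boundaries; return (start, aligned_length)."""
    start = addr - (addr % page_size)            # round start down
    end = addr + length
    end_aligned = -(-end // page_size) * page_size  # round end up
    return start, end_aligned - start


# Values copied from the error messages above.
base = 0x2ac2222040
offset = 4194304
req_addr = 0x2ac2622040
req_len = 1048576

# The buffer passed to mpool_register is base + offset.
assert base + offset == req_addr

start, length = page_align(req_addr, req_len)
print(hex(start), length)  # 0x2ac2622000 1052672
```

So the 1 MB buffer at base + offset becomes a 1052672-byte request to ibv_reg_mr once aligned, which matches the first message.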
We fixed the /etc/security/limits.conf problem, but I don't know what to
do about this one. The job seems to complete without error on 2 nodes
(4 processors), but scaling any larger just generates multi-megabyte
files of these error messages.
Any insights into this problem? All searches lead me to limits.conf,
where we have the limit set to 8192. These are 8 GB machines, if that
makes any difference.
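One way to see which limit a process actually gets is to query it from inside the process rather than reading limits.conf, since a remotely launched process may not inherit the same limit as a login shell. The sketch below assumes the 8192 above is a memlock value in KB (the post does not say which limits.conf item it is) and that each registration is the 1052672 bytes shown in the error; registrations_that_fit is a hypothetical helper:

```python
import resource


def memlock_limit_bytes():
    """Return (soft, hard) locked-memory limits in bytes for this process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def registrations_that_fit(limit_kb, reg_bytes):
    """How many registrations of reg_bytes fit under a limit given in KB."""
    return (limit_kb * 1024) // reg_bytes


soft, hard = memlock_limit_bytes()
# resource.RLIM_INFINITY means the limit is unlimited.
print("memlock soft/hard (bytes):", soft, hard)

# Assumption: 8192 is a memlock limit in KB; 1052672 is taken from the error.
print(registrations_that_fit(8192, 1052672))  # 7
```

If that assumption holds, only about seven 1 MB registrations fit under the limit per process, which might explain why the failures start once the job grows.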
Thanks,
Bill