[Users] Registrable memory calculation

Kenja, Krishna (kenjakt) kenjakt at mail.uc.edu
Mon Aug 22 10:12:32 PDT 2016


Hello All,


I get the following error when I try to run a benchmark test on a node with 256GB of RAM.



WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              compute-0-3.local
  Registerable memory:     32768 MiB
  Total memory:            262109 MiB

Your MPI job will continue, but may be behave poorly and/or hang.

This happens only on nodes with more than 32GB of memory. Some searching led me to the following bug report, which seems to describe my case as well: on this node log_num_mtt is 0 and log_mtts_per_seg is 3.

https://bugzilla.redhat.com/show_bug.cgi?id=1171626
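
For reference, the figures in the warning can be checked against the formula given in the Open MPI FAQ item linked above: max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. The sketch below is only a sanity check, not anything taken from the driver or from Open MPI. It assumes a 4 KiB page size (not stated here), and the helper names registrable_bytes and min_log_num_mtt are made up for illustration.

```python
# Sanity check of the registrable-memory formula from the Open MPI FAQ:
#   max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
# ASSUMPTION: 4 KiB pages (typical on x86_64); the post does not state this.
PAGE_SIZE = 4096
MiB = 1 << 20


def registrable_bytes(log_num_mtt, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Registrable memory in bytes for the given mlx4_core parameters."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def min_log_num_mtt(target_bytes, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Smallest log_num_mtt whose registrable memory covers target_bytes."""
    bytes_per_segment = (1 << log_mtts_per_seg) * page_size
    segments = -(-target_bytes // bytes_per_segment)  # ceiling division
    return max(0, (segments - 1).bit_length())        # ceil(log2(segments))


total = 262109 * MiB  # "Total memory" from the warning

# Taking log_num_mtt = 0 literally gives only 32 KiB, not 32768 MiB.
print(registrable_bytes(0, 3))            # 32768 (bytes)

# An effective log_num_mtt of 20 reproduces the reported 32768 MiB.
print(registrable_bytes(20, 3) // MiB)    # 32768 (MiB)

# log_num_mtt needed to cover all memory, and twice the memory.
print(min_log_num_mtt(total, 3))          # 23
print(min_log_num_mtt(2 * total, 3))      # 24
```

Read this way, the reported 32768 MiB is consistent with an effective log_num_mtt of 20 being used when the parameter reads 0. That is an inference from the arithmetic, not something confirmed here. Under the same assumptions, covering the 262109 MiB on this node would need log_num_mtt of at least 23, or 24 to allow registering twice the physical memory, as I understand the FAQ recommends.
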

The bug report suggests loading mlx5_core as a possible workaround.

Are there any suggestions or other workarounds to make this work with mlx4_core?

Appreciate any help I can get!

Regards
Krishna