[openib-general] InfiniPath + OpenIB question

Sayantan Sur surs at cse.ohio-state.edu
Tue Nov 22 09:39:21 PST 2005


Hi,

I sent a message about an hour back, but it doesn't seem to have made it
to the list. Anyway, here is my reply (again). Apologies for multiple
copies.

=====
>
>
>mpirun_rsh -rsh -np 4 -hostfile nodes ./cpip
>
>I get the following error:
>
>[0] Abort: Error getting HCA context

Thanks for trying out MVAPICH on Pathscale machines with Gen2.

This error, not the version mismatch, seems to be the major problem. The
version "0" usually pops up when the remote processes have died and
could not communicate their version number to the master "mpirun_rsh"
process.

There are several things you need to check to make sure that MVAPICH
works smoothly.

1) Load all IB related modules (ib_ipath, ib_uverbs, ib_mad, ib_umad),
as Hal has pointed out.

2) Make sure your /etc/udev/rules.d/90-ib.rules is present and that its
contents look more or less like those in the cheat sheet posted on the
OpenIB Wiki by Michael Tsirkin
(https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet)

3) If you don't have the udev rules and are making devices by hand, make
sure that the device nodes have 666 perms, so that the "user" you are
running as can read and write them.

4) If you are going to run MPI programs as a user (not as root), then
make sure that "ulimit -l" shows "unlimited" or some high number. This is
the amount of lockable memory (in kB). In order to change this value,
you need to edit /etc/security/limits.conf and add a line like:

* soft memlock unlimited

Then edit /etc/init.d/sshd and put a line:

ulimit -l unlimited

Once sshd has been restarted, all subsequent SSH sessions will have
"unlimited" lockable memory.
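
The checks in 1), 3) and 4) above can be sketched as a few small shell
helpers. This is only a rough sketch, not something shipped with MVAPICH
or OpenIB: the function names are made up for illustration, "stat -c" is
assumed to be the GNU coreutils version, the minimum memlock value is an
arbitrary threshold you pick yourself, and the device path in the usage
comments is an assumption about where your device nodes live.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of MVAPICH or OpenIB.

# Read lsmod-style text on stdin and print each required IB module
# (the four named in step 1) that is not listed.
missing_modules() {
    loaded=$(awk 'NR > 1 { print $1 }')   # skip the lsmod header line
    for m in ib_ipath ib_uverbs ib_mad ib_umad; do
        echo "$loaded" | grep -qx "$m" || echo "$m"
    done
}

# Succeed only if every device node given as an argument has mode 666.
# Assumes GNU stat ("stat -c %a" prints the octal permission bits).
devices_world_rw() {
    for dev in "$@"; do
        mode=$(stat -c %a "$dev") || return 1
        [ "$mode" = "666" ] || { echo "$dev has mode $mode" >&2; return 1; }
    done
}

# Succeed if a "ulimit -l" value (a number in kB, or "unlimited") is at
# least min_kb. The threshold is the caller's choice, not an MVAPICH value.
memlock_ok() {
    value=$1
    min_kb=$2
    [ "$value" = "unlimited" ] && return 0
    [ "$value" -ge "$min_kb" ] 2>/dev/null
}

# Example use on a node (the device path is an assumption; adjust it):
#   lsmod | missing_modules
#   devices_world_rw /dev/infiniband/uverbs0
#   memlock_ok "$(ulimit -l)" 1048576 || echo "raise the memlock limit"
```

Run the last check from inside an SSH session to the node, since that is
the environment the MPI processes started by mpirun_rsh will inherit.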

Please let us know if you were able to get past this error.

Thanks,
Sayantan.

-- 
http://www.cse.ohio-state.edu/~surs


