Shaun / Steve,<br><br>To get past the "librdmacm.so: cannot open shared object file: No such file or directory" error message, LD_RUN_PATH also needs to be set.<br><br>Anyway, now that I am able to run mvapich2 0.9.8-Release, I am trying to figure out how to run the various benchmark tests using this MPI tool.<br><br>Has anyone run the Pallas tool with the OSC MPI or OpenMPI? I also want to run the OSC benchmark tests. Are any guidelines available for these, please?<br>Thanks,<br>David<br><br><br><b><i>Shaun Rowland <rowland@cse.ohio-state.edu></i></b> wrote:<blockquote class="replbq" style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px;"> Steve Wise wrote:<br>> I haven't tested mvapich2 with ammasso. But OSU has. I'm CCing their<br>> dev team so maybe they can help.<br>> <br>> Steve.<br>> <br>> <br>> <br>> On Fri, 2006-12-01 at 14:58 -0800, david elsen wrote:<br>>> Steve,<br>>><br>>> I can
run rping, rdma_lat etc on the Ammasso card but when I try to<br>>> run the mvapich2 (0.9.8-Release), I get librdmacm.so missing error. <br>>><br>>> ./mpdboot -n 1<br>>> debug: starting<br>>> /root/0.9.8-RELEASE/bin/mpdroot: error while loading shared libraries:<br>>> librdmacm.so: cannot open shared object file: No such file or<br>>> directory<br>>> running mpdallexit on ammasso1<br>>> LAUNCHED mpd on ammasso1 via <br>>> debug: launch cmd= /root/0.9.8-RELEASE/bin/mpd.py --ncpus=1 -e -d<br>>> debug: mpd on ammasso1 on port 35352<br>>> RUNNING: mpd on ammasso1<br>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 35352,<br>>> 'entry_port': '', 'host': 'ammasso1', 'entry_host': '', 'ifhn': ''}<br><br>Hello David and Steve. We discussed this problem in detail on the<br>mvapich-discuss list recently. David, you indicated the following in<br>your last email about this to
mvapich-discuss on 11/26/2006:<br><br>"For some reason, it is working in SuSE, and not working in Fedora."<br><br>Is this still the case? Were the libraries built specifically on the<br>Fedora Core 6 system, or are you using libraries that were built on<br>SuSE? I assume they were built on Fedora Core 6. Were you trying to run<br>this as root or as a regular user? I am not sure exactly how this might<br>affect shared library loading, but it is possible there is a difference.<br><br>In our previous discussion, your library path did indeed have a<br>librdmacm.so file, though it could not be loaded for an unknown reason.<br>It is unclear to me if this email thread indicates that you have tried<br>to rebuild that and are experiencing the same issue. Were you able to<br>try running that test shared library example I gave, and did it work? Did<br>it work as the same user you are trying to run MVAPICH as?<br><br>It seems clear this is a runtime loader problem on Fedora Core 6, or
on<br>your particular configuration there. That is what cannot find the<br>library. It is similar to the libtest code I provided as an example:<br><br>[rowland@e14-oib libtest]$ ls<br>Makefile test.c test.h test-program.c<br><br>[rowland@e14-oib libtest]$ make normal<br>gcc -c -fPIC test.c<br>gcc -shared -Wl,-soname,libtest.so.1 -o libtest.so.1.0 test.o<br>ln -s libtest.so.1.0 libtest.so.1<br>ln -s libtest.so.1 libtest.so<br>gcc -c -o test-program.o test-program.c<br>gcc -o test-program test-program.o -L/home/7/rowland/libtest -ltest<br><br>[rowland@e14-oib libtest]$ ldd test-program<br> libtest.so.1 => not found<br> libc.so.6 => /lib64/tls/libc.so.6 (0x0000003bf1900000)<br> /lib64/ld-linux-x86-64.so.2 (0x0000003bf1700000)<br><br>[rowland@e14-oib libtest]$ ./test-program<br>./test-program: error while loading shared libraries: libtest.so.1: <br>cannot open shared object file: No such file or directory<br><br>[rowland@e14-oib libtest]$
export LD_LIBRARY_PATH=$PWD<br><br>[rowland@e14-oib libtest]$ ldd test-program<br> libtest.so.1 => /home/7/rowland/libtest/libtest.so.1 <br>(0x00002abbf9aee000)<br> libc.so.6 => /lib64/tls/libc.so.6 (0x0000003bf1900000)<br> /lib64/ld-linux-x86-64.so.2 (0x0000003bf1700000)<br><br>[rowland@e14-oib libtest]$ ./test-program<br>In shared library function...<br><br>In a previous email, your ldd output showed the library was being found:<br><br>Please see the output of ldd /usr/local/mvapich2/bin/mpdroot :<br>[root@ammasso1 ~]# ldd /usr/local/mvapich2/bin/mpdroot<br> linux-gate.so.1 => (0xffffe000)<br> librdmacm.so => /usr/local/lib/librdmacm.so (0xb7fec000)<br> libibverbs.so.2 => /usr/local/lib/libibverbs.so.2 (0xb7fe5000)<br> libibumad.so.1 => /usr/local/lib/libibumad.so.1 (0xb7fdc000)<br> libpthread.so.0 => /lib/libpthread.so.0 (0x0012a000)<br> libc.so.6 => /lib/libc.so.6
(0x00ca7000)<br> libsysfs.so.2 => /usr/lib/libsysfs.so.2 (0x00369000)<br> libdl.so.2 => /lib/libdl.so.2 (0x00de6000)<br> libibcommon.so.1 => /usr/local/lib/libibcommon.so.1 (0xb7fcb000)<br> /lib/ld-linux.so.2 (0x002d8000)<br><br>But that path is different from the one you are quoting above. Does an<br>ldd on /root/0.9.8-RELEASE/bin/mpdroot find librdmacm.so too, as the<br>same user you are trying to run it as?<br><br>I have one more idea for you to try here. You can do the following:<br><br>export LD_DEBUG=all<br>/root/0.9.8-RELEASE/bin/mpdroot >&output<br>unset LD_DEBUG<br><br>Then take a look at the output file to see if there are any relevant<br>error messages. Don't forget to unset LD_DEBUG before doing anything else.<br><br>Also, just to be sure, if you run "file &lt;path to librdmacm.so&gt;" what<br>does it say? It should indicate that it is a shared library, similar to:<br><br>[rowland@e14-oib libtest]$ file
/usr/local/ofed/lib64/librdmacm.so*<br>/usr/local/ofed/lib64/librdmacm.so: symbolic link to <br>`librdmacm.so.0.9.0'<br>/usr/local/ofed/lib64/librdmacm.so.0.9.0: ELF 64-bit LSB shared object, <br>AMD x86-64, version 1 (SYSV), not stripped<br><br>Unfortunately, we do not have any Fedora Core 6 systems to investigate<br>this problem on at this time, and I don't know anything about what might<br>be there that would cause a problem. As far as I know, there shouldn't<br>be. However, it seems there is some runtime issue on your Fedora Core 6<br>machine or with how this is being run there. If it is in fact working on<br>another distribution as you indicated in your previous response to us,<br>then that also strongly points in this direction.<br>-- <br>Shaun Rowland rowland@cse.ohio-state.edu<br>http://www.cse.ohio-state.edu/~rowland/<br></blockquote>