<div>Shaun,</div> <div> </div> <div>It was working on one of my Fedora systems. I tried the same installation on my other system, which has SuSE 9.3, and it is not working there.</div> <div> </div> <div>So I am not sure what is going on with this.</div> <div> </div> <div>Thanks,</div> <div>David</div> <div><BR><BR><B><I>Shaun Rowland &lt;rowland@cse.ohio-state.edu&gt;</I></B> wrote:</div> <BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid">Steve Wise wrote:<BR>> I haven't tested mvapich2 with ammasso. But OSU has. I'm CCing their<BR>> dev team so maybe they can help.<BR>> <BR>> Steve.<BR>> <BR>> <BR>> <BR>> On Fri, 2006-12-01 at 14:58 -0800, david elsen wrote:<BR>>> Steve,<BR>>><BR>>> I can run rping, rdma_lat etc on the Ammasso card but when I try to<BR>>> run the mvapich2 (0.9.8-Release), I get librdmacm.so missing error. <BR>>><BR>>>
./mpdboot -n 1<BR>>> debug: starting<BR>>> /root/0.9.8-RELEASE/bin/mpdroot: error while loading shared libraries:<BR>>> librdmacm.so: cannot open shared object file: No such file or<BR>>> directory<BR>>> running mpdallexit on ammasso1<BR>>> LAUNCHED mpd on ammasso1 via <BR>>> debug: launch cmd= /root/0.9.8-RELEASE/bin/mpd.py --ncpus=1 -e -d<BR>>> debug: mpd on ammasso1 on port 35352<BR>>> RUNNING: mpd on ammasso1<BR>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 35352,<BR>>> 'entry_port': '', 'host': 'ammasso1', 'entry_host': '', 'ifhn': ''}<BR><BR>Hello David and Steve. We discussed this problem in detail on the<BR>mvapich-discuss list recently. David, you indicated the following in<BR>your last email about this to mvapich-discuss on 11/26/2006:<BR><BR>"For some reason, it is working in SuSE, and not working in Fedora."<BR><BR>Is this still the case? Were the libraries built specifically on
the<BR>Fedora Core 6 system, or are you using libraries that were built on<BR>SuSE? I assume they were built on Fedora Core 6. Were you trying to run<BR>this as root or as a regular user? I am not sure exactly how this might<BR>affect shared library loading, but it is possible there is a difference.<BR><BR>In our previous discussion, your library path did indeed have a<BR>librdmacm.so file, though it could not be loaded for an unknown reason.<BR>It is unclear to me if this email thread indicates that you have tried<BR>to rebuild that and are experiencing the same issue. Were you able to<BR>try running that test shared library example I gave and did it work? Did<BR>it work as the same user you are trying to run MVAPICH as?<BR><BR>It seems clear this is a runtime loader problem on Fedora Core 6, or on<BR>your particular configuration there. That is what cannot find the<BR>library. It is similar to the libtest code I provided as an example:<BR><BR>[rowland@e14-oib libtest]$
ls<BR>Makefile test.c test.h test-program.c<BR><BR>[rowland@e14-oib libtest]$ make normal<BR>gcc -c -fPIC test.c<BR>gcc -shared -Wl,-soname,libtest.so.1 -o libtest.so.1.0 test.o<BR>ln -s libtest.so.1.0 libtest.so.1<BR>ln -s libtest.so.1 libtest.so<BR>gcc -c -o test-program.o test-program.c<BR>gcc -o test-program test-program.o -L/home/7/rowland/libtest -ltest<BR><BR>[rowland@e14-oib libtest]$ ldd test-program<BR>libtest.so.1 => not found<BR>libc.so.6 => /lib64/tls/libc.so.6 (0x0000003bf1900000)<BR>/lib64/ld-linux-x86-64.so.2 (0x0000003bf1700000)<BR><BR>[rowland@e14-oib libtest]$ ./test-program<BR>./test-program: error while loading shared libraries: libtest.so.1: <BR>cannot open shared object file: No such file or directory<BR><BR>[rowland@e14-oib libtest]$ export LD_LIBRARY_PATH=$PWD<BR><BR>[rowland@e14-oib libtest]$ ldd test-program<BR>libtest.so.1 => /home/7/rowland/libtest/libtest.so.1 <BR>(0x00002abbf9aee000)<BR>libc.so.6 => /lib64/tls/libc.so.6
(0x0000003bf1900000)<BR>/lib64/ld-linux-x86-64.so.2 (0x0000003bf1700000)<BR><BR>[rowland@e14-oib libtest]$ ./test-program<BR>In shared library function...<BR><BR>In a previous email, your ldd output showed the library was being found:<BR><BR>Please see the output of ldd /usr/local/mvapich2/bin/mpdroot :<BR>[root@ammasso1 ~]# ldd /usr/local/mvapich2/bin/mpdroot<BR>linux-gate.so.1 => (0xffffe000)<BR>librdmacm.so => /usr/local/lib/librdmacm.so (0xb7fec000)<BR>libibverbs.so.2 => /usr/local/lib/libibverbs.so.2 (0xb7fe5000)<BR>libibumad.so.1 => /usr/local/lib/libibumad.so.1 (0xb7fdc000)<BR>libpthread.so.0 => /lib/libpthread.so.0 (0x0012a000)<BR>libc.so.6 => /lib/libc.so.6 (0x00ca7000)<BR>libsysfs.so.2 => /usr/lib/libsysfs.so.2 (0x00369000)<BR>libdl.so.2 => /lib/libdl.so.2 (0x00de6000)<BR>libibcommon.so.1 => /usr/local/lib/libibcommon.so.1 (0xb7fcb000)<BR>/lib/ld-linux.so.2 (0x002d8000)<BR><BR>But that path is different from the one you are quoting above.
Does an<BR>ldd on /root/0.9.8-RELEASE/bin/mpdroot find librdmacm.so too, as the<BR>same user you are trying to run it as?<BR><BR>I have one more idea for you to try here. You can do the following:<BR><BR>export LD_DEBUG=all<BR>/root/0.9.8-RELEASE/bin/mpdroot >&output<BR>unset LD_DEBUG<BR><BR>Then take a look at the output file to see if there are any relevant<BR>error messages. Don't forget to unset LD_DEBUG before doing anything else.<BR><BR>Also, just to be sure, if you run "file &lt;PATH to librdmacm.so&gt;" what<BR>does it say? It should indicate that it is a shared library, similar to:<BR><BR>[rowland@e14-oib libtest]$ file /usr/local/ofed/lib64/librdmacm.so*<BR>/usr/local/ofed/lib64/librdmacm.so: symbolic link to <BR>`librdmacm.so.0.9.0'<BR>/usr/local/ofed/lib64/librdmacm.so.0.9.0: ELF 64-bit LSB shared object, <BR>AMD x86-64, version 1 (SYSV), not stripped<BR><BR>Unfortunately, we do not have any Fedora Core 6 systems to investigate<BR>this problem on at this
time, and I don't know anything about what might<BR>be there that would cause a problem. As far as I know, there shouldn't<BR>be. However, it seems there is some runtime issue on your Fedora Core 6<BR>machine or with how this is being run there. If it is in fact working on<BR>another distribution as you indicated in your previous response to us,<BR>then that also strongly points in this direction.<BR>-- <BR>Shaun Rowland rowland@cse.ohio-state.edu<BR>http://www.cse.ohio-state.edu/~rowland/<BR></BLOCKQUOTE><BR><p>