[openib-general] mvapich2 ofed 1.2 problem

Steve Wise swise at opengridcomputing.com
Thu Feb 15 08:59:45 PST 2007


Shaun,

Lemme know if you have an mvapich2 kit that I can test with iwarp...

Thanks,

Steve.


On Wed, 2007-02-14 at 23:31 -0500, Shaun Rowland wrote:
> Roland Dreier wrote:
> >  > When I build using the OFED-1.2-20070208-1508, libibverbs 1.0 is what is
> >  > built, at least by looking at the .so file result:
> >  > 
> >  > [rowland at z0 ~]$ ls /usr/local/ofed/lib64/ |grep ibverbs
> >  > libibverbs.a
> >  > libibverbs.so
> >  > libibverbs.so.1
> >  > libibverbs.so.1.0.0
> > 
> > The soname hasn't changed because the library is still compatible.
> > But (I hope at least) OFED has libibverbs 1.1.
> 
> The soname is libibverbs.so.1, so I guess the longer name would not
> matter anyway. Clearly, what I posted shows the IBVERBS 1.1 ABI is
> there. I think I have figured out why our code has this problem. The
> problem below is similar to the one originally posted.
> 
> I did some experimentation with the srq_pingpong libibverbs example
> code. First I built it directly with:
> 
> 
> gcc -g -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -c -D_GNU_SOURCE srq_pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -o srq_pingpong srq_pingpong.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
> 
> 
> This works.  Next I copied srq_pingpong.c to two files:
> 
> srq_pingpong_rowland.c
>       - just has a main function that calls lib_start().
> 
> srq_pingpong_lib_rowland.c
>       - main() changed to lib_start().
> 
> This moves all of the SRQ pingpong code into a shared library. If I
> build this shared library in this way, it works:
> 
> 
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> 
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o -L/usr/local/ofed/lib64 -libverbs
> 
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -lsrqtest
> 
> 
> Above I am linking libibverbs directly into my libsrqtest.so
> library. This works and the IBVERBS 1.1 ABI is clearly in the
> libsrqtest.so file:
> 
> [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                   U ibv_ack_cq_events@@IBVERBS_1.1
>                   U ibv_alloc_pd@@IBVERBS_1.1
>                   U ibv_close_device@@IBVERBS_1.1
>                   U ibv_create_comp_channel@@IBVERBS_1.0
>                   U ibv_create_cq@@IBVERBS_1.1
>                   U ibv_create_qp@@IBVERBS_1.1
>                   U ibv_create_srq@@IBVERBS_1.1
>                   U ibv_dealloc_pd@@IBVERBS_1.1
>                   U ibv_dereg_mr@@IBVERBS_1.1
>                   U ibv_destroy_comp_channel@@IBVERBS_1.0
> 
> However, if I build in a similar way to MVAPICH2, the resulting program
> fails:
> 
> 
> gcc -g -fpic -c pingpong.c -I/usr/local/ofed/include
> 
> gcc -g -fpic -c -D_GNU_SOURCE srq_pingpong_lib_rowland.c -I/usr/local/ofed/include
> 
> gcc -g -shared -Wl,-soname,libsrqtest.so -o libsrqtest.so srq_pingpong_lib_rowland.o pingpong.o
> 
> gcc -g -o srq_pingpong_rowland srq_pingpong_rowland.c -L$PWD -L/usr/local/ofed/lib64 -lsrqtest -libverbs
> 
> 
> Above I am not linking libibverbs into libsrqtest.so, thus it is
> required on the last gcc line. This is how MVAPICH2's libmpich.so file
> works, and from past experience, I've seen this before. Running shows:
> 
> [rowland at z1 ibverbs-examples]$ gdb ./srq_pingpong_rowland
> GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
> 
> (gdb) r
> Starting program: /home/7/rowland/z1-test/ibverbs-examples/srq_pingpong_rowland
> [Thread debugging using libthread_db enabled]
> [New Thread 182896403968 (LWP 29858)]
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182896403968 (LWP 29858)]
> post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0, bad_wr=0x7fbfff88c8)
>      at src/compat-1_0.c:312
> 312     src/compat-1_0.c: No such file or directory.
>          in src/compat-1_0.c
> (gdb) bt
> #0  post_srq_recv_wrapper_1_0 (srq=0x5075b0, wr=0x7fbfff88d0,
>      bad_wr=0x7fbfff88c8) at src/compat-1_0.c:312
> #1  0x0000002a95559e12 in ibv_post_srq_recv (srq=0x5075b0,
>      recv_wr=0x7fbfff88d0, bad_recv_wr=0x7fbfff88c8)
>      at /usr/local/ofed/include/infiniband/verbs.h:915
> #2  0x0000002a95559dcf in pp_post_recv (ctx=0x5023d0, n=500)
>      at srq_pingpong_lib_rowland.c:496
> #3  0x0000002a9555a614 in lib_start (argc=1, argv=0x7fbffff7f8)
>      at srq_pingpong_lib_rowland.c:696
> #4  0x0000000000400608 in main (argc=1, argv=0x7fbffff7f8)
>      at srq_pingpong_rowland.c:36
> (gdb) quit
> 
> It is not clear to me why linking libibverbs into libsrqtest.so, or
> not doing so, determines whether the IBVERBS 1.1 ABI is used. I looked
> at the libibverbs code, and the 1.1 ABI is the default. The
> libsrqtest.so file in the above case seems to have lost this
> information:
> 
> [rowland at z1 ibverbs-examples]$ nm libsrqtest.so |grep ibv |head
>                   U ibv_ack_cq_events
>                   U ibv_alloc_pd
>                   U ibv_close_device
>                   U ibv_create_comp_channel
>                   U ibv_create_cq
>                   U ibv_create_qp
>                   U ibv_create_srq
>                   U ibv_dealloc_pd
>                   U ibv_dereg_mr
>                   U ibv_destroy_comp_channel
> 
> I've never had to deal with an ABI issue like this in shared library
> linking/usage. Does it make sense for this to be the case? I think
> perhaps it does, but I wanted to ask.
> 
> I've placed my test code here if it helps:
> 
> http://www.cse.ohio-state.edu/~rowland/ibverbs-examples.tar.gz
> 
> I have a fix for our code that I am testing now. It seems to work and
> solve the observed problems, but more testing will be required to be
> sure there are no issues. If the fix is needed, as it seems to be at
> this point, it will require a new SRPM.
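
A note on the question in the quoted message, offered as a likely
explanation rather than something confirmed in this thread: when
libsrqtest.so is linked without -libverbs, the linker never sees the
version definitions in libibverbs, so it records the ibv_* references
with no version attached. That matches the second nm listing. At run
time, glibc's dynamic linker typically binds such unversioned references
to the oldest version of a symbol, which would be consistent with the
crash inside the 1.0 compatibility wrapper (src/compat-1_0.c). The shell
sketch below is one way to spot the condition. The function name
unversioned_ibv is made up for illustration, and the sketch assumes the
default nm output format shown in the listings.

```shell
#!/bin/sh
# Read nm output on stdin and print every undefined ibv_* symbol that
# carries no "@VERSION" suffix, i.e. a reference left unversioned.
# (unversioned_ibv is a hypothetical helper, not part of OFED.)
unversioned_ibv() {
    awk '$1 == "U" && $2 ~ /^ibv_/ && $2 !~ /@/ { print $2 }'
}

# Typical use, with the test library from the quoted message:
#   nm libsrqtest.so | unversioned_ibv
```

If the function prints anything, the library was most likely built
without libibverbs on its own link line.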




