[libfabric-users] libnl error

Jeff Squyres (jsquyres) jsquyres at cisco.com
Mon Dec 5 10:51:07 PST 2016


On Dec 5, 2016, at 1:19 PM, Martin Cuma <martin.cuma at utah.edu> wrote:
> 
> we're running a fairly stock CentOS7.2 and after building libfabric I am getting
> 
> make test
> ./util/fi_info
> lt-fi_info: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.

Short version
-------------

This almost certainly means that you have inadvertently linked both the libnl v1 and libnl v3 libraries into your executable.  The easiest solution in CentOS 7 (albeit not always the quickest) is usually to uninstall all libnl v1 library RPMs and then rebuild your entire stack (e.g., Libfabric, MPI, etc.).

More detail
-----------

Libnl is the netlink userspace library.  It started life as "libnl", and later evolved into "libnl3" (I don't know what happened to v2).  v3 both added to and changed the existing API significantly, such that applications are generally written specifically to v1, specifically to v3, or use configury to switch between v1 and v3.

An unfortunate side effect that you have run into is that v1 and v3 were never designed to work together in a single process.  If v1 and v3 are linked into the same process, there are conflicts in global symbols and data that cause the assertion you're seeing (I believe it usually happens in a process/load-time constructor).

Hence, you have to ensure that your application+stack entirely uses libnl v1 or libnl v3 -- never both. This means that *all* libraries that your application links to and/or dlopens must exclusively use v1 or exclusively use v3.
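
As a rough sketch of how to check for this mix, the snippet below scans `ldd` output for both libnl generations.  The function names (`libnl_versions`, `mixes_libnl`) are made up for illustration, and the matching assumes the usual sonames: `libnl.so.1` for v1 and `libnl-3.so.*` / `libnl-<component>-3.so.*` for v3.  The sample text is the `ldd` output quoted from the original report.  Note that `ldd` only shows link-time dependencies, so anything pulled in later via dlopen would not appear here.

```python
import re

# Assumed sonames: libnl v1 is libnl.so.1; libnl v3 is libnl-3.so.N plus
# component libraries such as libnl-route-3.so.N.
_V1 = re.compile(r"\blibnl\.so\.1\b")
_V3 = re.compile(r"\blibnl(?:-[a-z]+)?-3\.so\.\d+")


def libnl_versions(ldd_output):
    """Return the set of libnl major versions (1 and/or 3) seen in ldd output."""
    versions = set()
    for line in ldd_output.splitlines():
        if _V1.search(line):
            versions.add(1)
        if _V3.search(line):
            versions.add(3)
    return versions


def mixes_libnl(ldd_output):
    """True if both libnl v1 and libnl v3 appear, i.e. the problematic case."""
    return libnl_versions(ldd_output) == {1, 3}


# ldd output for libfabric.so as quoted in the original report.
SAMPLE = (
    "\tlibnl.so.1 => /lib64/libnl.so.1 (0x00002b798ae35000)\n"
    "\tlibnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002b798bfcb000)\n"
    "\tlibnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002b798c21a000)\n"
)

if mixes_libnl(SAMPLE):
    print("both libnl v1 and v3 are linked in")
```

To use it on a real build, feed it the text printed by `ldd` for the library or executable in question.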

Some data points for you (in no particular order):

1. AFAIK, all RPMs that ship with RHEL 7 (and CentOS 7?) link against libnl v3.

2. In some RHEL 7 (and also CentOS 7?) installs, I see the following installed by default:

libnl
libnl-devel
libnl3

but *not* libnl3-devel.  Hence, if you compile new libraries/apps from scratch, they'll see the libnl v1 headers, and therefore choose to build against libnl v1.

3. If you then use RHEL-supplied (or CentOS-supplied) IB RPMs, they use libnl v3, but your application will be using libnl v1.  Kaboom.
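
To see which libnl headers a from-scratch build would find, something like the following sketch may help.  It rests on an assumption: that the v1 headers live at `<include root>/netlink/netlink.h` and the v3 headers at `<include root>/libnl3/netlink/netlink.h`, which is the usual layout of the -devel packages but worth confirming with `rpm -ql` on your system.  The function name `libnl_headers_present` is hypothetical.

```python
import os


def libnl_headers_present(include_root="/usr/include"):
    """Return the set of libnl versions (1, 3) whose headers are installed.

    The header paths below are assumptions about the usual -devel package
    layout; adjust them if your distribution installs headers elsewhere.
    """
    candidates = {
        1: os.path.join(include_root, "netlink", "netlink.h"),
        3: os.path.join(include_root, "libnl3", "netlink", "netlink.h"),
    }
    return {ver for ver, path in candidates.items() if os.path.isfile(path)}


if __name__ == "__main__":
    found = libnl_headers_present()
    if found == {1}:
        print("only libnl v1 headers found; new builds will likely pick v1")
    else:
        print("libnl header versions found:", sorted(found))
```

If this reports only v1 headers on a machine whose system RPMs link against v3, new builds are likely to end up in the mixed situation described above.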

What I typically do is uninstall libnl-devel.  This prevents any new application from choosing to compile/link against libnl v1.  You may still need libnl for other legacy apps (...maybe?).  But if you have no other dependencies, you might want to uninstall it, too, just to completely, 100%, absolutely, positively avoid this issue.

Hope that helps.



> lt-fi_info:6096 terminated with signal 6 at PC=2b3b13d7d5f7 SP=7fff736e6e18.  Backtrace:
> /lib64/libc.so.6(gsignal+0x37)[0x2b3b13d7d5f7]
> /lib64/libc.so.6(abort+0x148)[0x2b3b13d7ece8]
> /lib64/libc.so.6(+0x2e566)[0x2b3b13d76566]
> /lib64/libc.so.6(+0x2e612)[0x2b3b13d76612]
> /lib64/libnl-route-3.so.200(+0x21649)[0x2b3b1412b649]
> /lib64/ld-linux-x86-64.so.2(+0xf3a3)[0x2b3b126833a3]
> /lib64/ld-linux-x86-64.so.2(+0x146a)[0x2b3b12
> 
> From a web search this looks like a known problem, but, we do have both libnl 1 and 3 installed and ldd seems to pick both of them:
> ldd ../../../installdir/libfabric/1.4.0/lib/libfabric.so |grep nl
> 	libnl.so.1 => /lib64/libnl.so.1 (0x00002b798ae35000)
> 	libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00002b798bfcb000)
> 	libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00002b798c21a000)
> 
> Any thoughts on how this can be worked around? As this must be a fairly common issue for people on CentOS/RedHat 7, is there a resource on this that I can't find?
> 
> Thanks,
> MC
> 
> -- 
> Martin Cuma
> Center for High Performance Computing
> Department of Geology and Geophysics
> University of Utah
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users


-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
