[openib-general] umad abi 2 v 3 and multicast join failed

Hal Rosenstock halr at voltaire.com
Thu May 26 10:44:53 PDT 2005


On Wed, 2005-05-25 at 19:06, Troy Benjegerdes wrote:
> I was running a crufty version of opensm (compiled from the
> roland-uverbs branch), and I started getting these kinds of errors for
> no apparent reason:
> 
> ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22
> ib0: multicast join failed for ff12:401b:ffff:0:0:0:ffff:ffff, status -22
> 
> I'm running 2.6.11 kernels and 'stock' modules. I just tried
> rebuilding opensm from the latest SVN, but it apparently needs a new
> umad driver:
> 
> warn: [24878] umad_init: wrong ABI version:
> /sys/class/infiniband_mad/abi_version is 2 but library ABI is 3
> 
> 
> I suppose I need to rebuild the kernel ib_umad (and maybe everything
> else for good measure).. And if I do that, should I expect OpenSM to
> work better regarding the multicast issue?

Yes, one quick workaround would be to pull the latest user_mad.c and
user_mad.h files from OpenIB svn and rebuild the umad module. There
might be some danger in this if RMPP is used, since what has been pushed
upstream does not include the MAD layer changes for RMPP. Is this
worth a try? How quickly do you need a solution?
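
To confirm whether a rebuilt umad module fixed the mismatch, you can
compare the kernel's reported ABI with the one the library expects. The
following is only a rough sketch, not part of OpenIB: the sysfs path and
the library ABI value of 3 are taken from the umad_init warning above,
and the function names are made up for illustration.

```python
from pathlib import Path

# Path taken from the umad_init warning quoted above.
ABI_PATH = "/sys/class/infiniband_mad/abi_version"
# ABI reported for the library in that warning; adjust if yours differs.
LIBRARY_ABI = 3


def read_kernel_abi(path=ABI_PATH):
    """Return the kernel umad ABI version as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (module not loaded?) or contents not a number.
        return None


def check_abi(library_abi=LIBRARY_ABI, path=ABI_PATH):
    """Return a one-line diagnosis comparing kernel and library ABI."""
    kernel_abi = read_kernel_abi(path)
    if kernel_abi is None:
        return f"cannot read {path} (is ib_umad loaded?)"
    if kernel_abi == library_abi:
        return f"ok: kernel and library both use umad ABI {kernel_abi}"
    return f"mismatch: kernel ABI is {kernel_abi} but library ABI is {library_abi}"


if __name__ == "__main__":
    print(check_abi())
```

Run it once before and once after reloading the rebuilt module; opensm
from the latest SVN should only get past umad_init when the two numbers
agree.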

> Also, what will happen if I run opensm on two different nodes? Will they
> fight, or will one of them figure out how to be a backup slave SM if the
> first goes down?

Is this question about the umad ABI version? That is just a local issue
between the umad library and the kernel module on each node. There is
no issue with OpenSMs on different nodes using different umad ABI
versions, as any communication between them is via standard MADs.

-- Hal



