[openib-general] IPoIB still not working [was IPoIB FAQ Update]

Hal Rosenstock halr at voltaire.com
Tue Dec 7 11:59:17 PST 2004


On Tue, 2004-12-07 at 14:33, Woodruff, Robert J wrote:
>  
> >I'm following the FAQ pretty closely and IPoIB is still not working for
> >me.  I'm using the latest from SVN (as of tonight).  These are PCIe HCAs
> >with 4.5.3 firmware (x86_64 MST is horribly broken BTW --can't wait for
> >tvflash).  Hardware/cabling is fine since things work under VAPI 3.2.
> >All modules are loaded, and syslog doesn't show anything.  It seems now
> >(new with 4.5.3 FW) that packets are flowing one direction but not the
> >other.  Here's as much debug info as I could dig up.
> 
> Ok here is what I found. I installed the openib.org code on my 
> EM64T (x86_64) systems that have PCI-E HCAs. We have an 8-node
> Mellanox switch. On one node (a 32-bit Xeon node), 
> we are running opensm from the SF project.
> We have one 32-bit system (Sean's) and 2 EM64T systems connected to 
> the switch that are running the openib.org code.
> When Sean's 32-bit system loads ipoib and configures the 
> interface, everything seems to work fine. However, when I load ipoib
> on the 64-bit x86_64 node, the opensm starts complaining that 
> 
> osm_mcr_rcv_join_mgrp ERR IB10 Provided Join State != FullMember
> 
> so it looks like the x86_64 system is not able to join the multicast group.
> I suspect some issue with structure definitions, but have not debugged
> the issue any further.

I run on x86_64 (Opteron) too with a different SM and I do not see that
issue.

While there is code in OpenIB IPoIB to perform send-only joins, it
currently joins as full member. (The send-only join state is commented
out in favor of full member right now because some SMs lack this
support, which BTW is required if the SM supports multicast.) So I can't
explain what OpenSM indicates. Maybe OpenSM is putting out the wrong message.
There are two types of "joins" done by OpenIB: 1. with components
sufficient to create the multicast group, and 2. with components
sufficient to join an already created group. The second style has been
done for a long time and is similar to those being used by other stacks.
The first is relatively new, and OpenSM may not like it, or it may
already have this group preconfigured and return an error because the
requested characteristics differ, or something like that.
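
To make the two join styles and the join state concrete, here is a rough
sketch in Python. It is not taken from the OpenIB or OpenSM source. The
JoinState bits and ComponentMask bit positions follow my reading of the
MCMemberRecord layout in the IBA spec, and the exact set of components
treated as "sufficient to create" (and the helper names) are assumptions
made for illustration only.

```python
# JoinState bits of an SA MCMemberRecord (assumed from the IBA spec layout).
FULL_MEMBER = 0x1
NON_MEMBER = 0x2
SEND_ONLY_NON_MEMBER = 0x4

# ComponentMask bit positions for the MCMemberRecord fields used here
# (assumed; only the fields needed for this sketch are listed).
COMP_BIT = {
    "MGID": 0, "PortGID": 1, "Q_Key": 2, "TClass": 6, "P_Key": 7,
    "SL": 12, "FlowLabel": 13, "JoinState": 16,
}


def comp_mask(*fields):
    """Build a component mask from MCMemberRecord field names."""
    mask = 0
    for name in fields:
        mask |= 1 << COMP_BIT[name]
    return mask


# Style 2: components sufficient to join an already created group.
JOIN_EXISTING = comp_mask("MGID", "PortGID", "JoinState")

# Style 1: additionally carries the group characteristics, so the SM
# could create the group (illustrative set, an assumption).
CREATE_GROUP = JOIN_EXISTING | comp_mask(
    "Q_Key", "TClass", "P_Key", "SL", "FlowLabel")


def describe_join_state(join_state):
    """Return the names of the JoinState bits that are set."""
    names = []
    if join_state & FULL_MEMBER:
        names.append("FullMember")
    if join_state & NON_MEMBER:
        names.append("NonMember")
    if join_state & SEND_ONLY_NON_MEMBER:
        names.append("SendOnlyNonMember")
    return names


def can_create(mask, join_state):
    """True if the request carries every create component and asks for
    full membership (hypothetical check, mirroring what an SM might do)."""
    has_components = (mask & CREATE_GROUP) == CREATE_GROUP
    return has_components and bool(join_state & FULL_MEMBER)


if __name__ == "__main__":
    print(hex(JOIN_EXISTING), hex(CREATE_GROUP))
    print(describe_join_state(FULL_MEMBER))
    print(can_create(CREATE_GROUP, SEND_ONLY_NON_MEMBER))
```

If an SM applies a check along these lines, a create-style request whose
join state lacks the FullMember bit would be rejected, which is one way a
"Join State != FullMember" message could arise. Whether that is what
happens here would have to be confirmed from the actual SA packets.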

Is it correct to presume you do not have an IB analyzer to capture the
SA packets on the link to/from the x86_64 system?

-- Hal



