[openib-general] IPoIB still not working

Eitan Zahavi eitan at mellanox.co.il
Tue Dec 7 22:39:06 PST 2004


Forgive me for not following the entire thread. 
But I did take a look at the log files:

The 64-bit version shows the following multicast activity:
1. Port 0x0002c9010ad258f1 joining MLID 0xC000 -> success.
   Note that MLID 0xC000 is predefined (IPoIB).
            MGID....................0xff12401bffff0000 : 0x00000000ffffffff
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x0
            Mlid....................0x0
            ScopeState..............0x1
            Rate....................0x0
            Mtu.....................0x0

2. Port 0x0002c9010ad258f1 joining MLID 0xC000 (again).
            MGID....................0xff12401bffff0000 : 0x00000000ffffffff
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x1B0B0000
            Mlid....................0xC000
            ScopeState..............0x11
            Rate....................0x3
            Mtu.....................0x4
    -> considered as an update to the scope state.

3. Request to join:
            MGID....................0xff12601bffff0000 : 0x0000000000000016
            PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
            qkey....................0x0
            Mlid....................0x0
            ScopeState..............0x1
            Rate....................0x0
            Mtu.....................0x0
This results in: ERR 1B10: Provided Join State != FullMember - required for
create.
You cannot create a group if you are not a full member.

4. A sequence of requests arrives to create MGRPs with several MGIDs:
MGID 0xff12601bffff0000:0x0000000000000002
MGID 0xff12601bffff0000:0x0000000000000016
MGID 0xff12601bffff0000:0x00000001ffd258f1
All fail due to the same join state issue.
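
To make the MGIDs above easier to compare, here is a minimal sketch that
splits the two 64-bit halves printed in the log into fields. The layout
(0xff prefix, flags and scope nibbles, a 16-bit signature of 0x401B for
IPv4 or 0x601B for IPv6, then the P_Key) is my assumption from the IPoIB
MGID convention, and decode_ipoib_mgid is a made-up helper name, not
anything from opensm or ipoib.

```python
# Sketch only: the field layout is assumed from the IPoIB MGID convention
# and is not taken from the opensm or ipoib sources.
IPV4_SIGNATURE = 0x401B
IPV6_SIGNATURE = 0x601B


def decode_ipoib_mgid(high, low):
    """Split an MGID given as two 64-bit halves, as printed in the log."""
    signature = (high >> 32) & 0xFFFF
    proto = {IPV4_SIGNATURE: "IPv4", IPV6_SIGNATURE: "IPv6"}.get(
        signature, "unknown")
    return {
        "prefix": (high >> 56) & 0xFF,   # 0xff marks a multicast GID
        "flags": (high >> 52) & 0xF,
        "scope": (high >> 48) & 0xF,
        "signature": signature,
        "proto": proto,
        "pkey": (high >> 16) & 0xFFFF,
        "group_low": low,                # lower 64 bits, left undecoded
    }


if __name__ == "__main__":
    # The MGIDs quoted in items 1 through 4 above.
    for high, low in [
        (0xFF12401BFFFF0000, 0x00000000FFFFFFFF),
        (0xFF12601BFFFF0000, 0x0000000000000002),
        (0xFF12601BFFFF0000, 0x0000000000000016),
        (0xFF12601BFFFF0000, 0x00000001FFD258F1),
    ]:
        f = decode_ipoib_mgid(high, low)
        print("%016x:%016x %s pkey=0x%04x scope=%d" % (
            high, low, f["proto"], f["pkey"], f["scope"]))
```

If that layout is right, the group that joins successfully carries the
IPv4 signature, while every group that fails carries the IPv6 signature.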

Inspecting the 32-bit version, I see only one request to join:
Port 0x0002c90107fc5be1 joining MLID 0xC000
and it succeeds.

Hope this helps.     

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


-----Original Message-----
From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com] 
Sent: Wednesday, December 08, 2004 3:12 AM
To: Roland Dreier
Cc: openib-general at openib.org
Subject: RE: [openib-general] IPoIB still not working

 
Here are some log files.

First file, mcast-64.log is the /var/log/messages output 
from the patch you sent on the 64-bit system.

The next log file is the opensm log file,
osm-64bit.log

The last log file is the opensm log file from running the 32-bit node,
osm-32-bit.log


In the passing case, ipoib sends 2 MCM messages and opensm has no
complaints. Search for MCMember Record in osm-32-bit.log.

In the failing case, ipoib sends 2 MCM messages that look similar, with
no errors reported. However, in the failing case ipoib continues to send
MCM messages that opensm rejects. There are a couple of differences in
the failing case. First, the lower 32 bits of the MGID appear to be
0xffffffff in the passing case and something else when it fails.
Second, it appears that opensm may be rejecting the messages because of
a bug where the scope and join fields are reversed when extracted from
the MAD. In the passing case, since the lower 32 bits of the MGID are
0xffffffff, you never get to the code that checks the join member.
Someone who understands opensm should look at this, but Sean
I think it may be wrong.
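
To illustrate the suspected reversal, here is a small sketch that decodes
a ScopeState byte both ways. It assumes the scope sits in the high nibble
and the join state in the low nibble, with bit 0 of the join state meaning
FullMember. That is my recollection of the MCMemberRecord layout, not
something read out of the opensm source, and the helper names are made up.

```python
# Sketch only: nibble layout and the FullMember bit are assumptions,
# and the helper names are hypothetical.
FULL_MEMBER = 0x1  # assumed: join state bit 0 means FullMember


def split_scope_state(scope_state, swapped=False):
    """Return (scope, join_state) decoded from one ScopeState byte.

    swapped=True models the suspected bug, with the nibbles reversed.
    """
    high = (scope_state >> 4) & 0xF
    low = scope_state & 0xF
    return (low, high) if swapped else (high, low)


def is_full_member(join_state):
    return bool(join_state & FULL_MEMBER)


if __name__ == "__main__":
    # The two ScopeState values that appear in the opensm log.
    for value in (0x1, 0x11):
        for swapped in (False, True):
            scope, join = split_scope_state(value, swapped)
            print("ScopeState=0x%02x swapped=%s scope=%d join=%d full=%s" % (
                value, swapped, scope, join, is_full_member(join)))
```

Under these assumptions, 0x1 decodes as FullMember only with the
unreversed layout, while 0x11 decodes as FullMember either way, which
would fit a create request with ScopeState 0x1 being rejected.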

This, however, does not explain why, in the failing case, ipoib
continues to try to join the mcast group, unless it is having
difficulties after trying to join the group and decides to retry, with
the subsequent retries being failed by opensm.
