[openib-general] IPoIB still not working
Eitan Zahavi
eitan at mellanox.co.il
Tue Dec 7 22:39:06 PST 2004
Forgive me for not following the entire thread.
But I did take a look at the log files:
The 64-bit version shows the following multicast activity:
1. Port 0x0002c9010ad258f1 joining MLID 0xC000 -> success.
Note that MLID 0xC000 is predefined (IPoIB).
MGID....................0xff12401bffff0000 : 0x00000000ffffffff
PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
qkey....................0x0
Mlid....................0x0
ScopeState..............0x1
Rate....................0x0
Mtu.....................0x0
2. Port 0x0002c9010ad258f1 joining MLID 0xC000 again.
MGID....................0xff12401bffff0000 : 0x00000000ffffffff
PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
qkey....................0x1B0B0000
Mlid....................0xC000
ScopeState..............0x11
Rate....................0x3
Mtu.....................0x4
-> treated as an update to the scope state.
3. Request to join:
MGID....................0xff12601bffff0000 : 0x0000000000000016
PortGid.................0xfe80000000000000 : 0x0002c9010ad258f1
qkey....................0x0
Mlid....................0x0
ScopeState..............0x1
Rate....................0x0
Mtu.....................0x0
Result: ERR 1B10: Provided Join State != FullMember - required for create.
You cannot create a group if you are not a full member.
4. A sequence of requests arrives to create multicast groups (MGRPs) with several MGIDs:
MGID 0xff12601bffff0000:0x0000000000000002
MGID 0xff12601bffff0000:0x0000000000000016
MGID 0xff12601bffff0000:0x00000001ffd258f1
All fail due to the same join state issue.
Inspecting the 32-bit version, I see only one join request:
Port 0x0002c90107fc5be1 joining MLID 0xC000
and it succeeds.
Hope this helps.
Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL
-----Original Message-----
From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com]
Sent: Wednesday, December 08, 2004 3:12 AM
To: Roland Dreier
Cc: openib-general at openib.org
Subject: RE: [openib-general] IPoIB still not working
Here are some log files.
The first file, mcast-64.log, is the /var/log/messages output from the patch you sent, on the 64-bit system.
The second file, osm-64bit.log, is the opensm log file from the 64-bit run.
The third file, osm-32-bit.log, is the opensm log file from the 32-bit node.
In the passing case, ipoib sends 2 MCM messages and opensm has no complaints.
Search for "MCMember Record" in osm-32-bit.log.
In the failing case, ipoib sends 2 similar-looking MCM messages and no errors are reported. However, ipoib then continues to send MCM messages that opensm rejects. There are a couple of differences in the failing case. First, the lower 32 bits of the MGID appear to be 0xffffffff in the passing case and something else when it fails.
Second, it appears that opensm may be rejecting the messages because of a bug where the scope and join state fields are reversed when they are extracted from the MAD. In the passing case, since the lower 32 bits of the MGID are 0xffffffff, you never reach the code that checks the join state.
Someone who understands opensm should look at this, but, Sean, I think it may be wrong.
This, however, does not explain why ipoib keeps trying to join the multicast group in the failing case, unless it runs into difficulties after the initial join and decides to retry, with opensm then rejecting the subsequent retries.