[openib-general] Re: osm log

Tom Duffy tduffy at sun.com
Fri Jan 21 16:14:22 PST 2005


On Fri, 2005-01-21 at 17:16 -0500, Hal Rosenstock wrote:
> Not sure whether you want me to post this to the list or not so I will
> start privately. Let me know if I should take it to the list.

Fine by me.  I got nothing to hide.

> Wow. That was a lot to go through...

Yeah, the other day my filesystem filled and I found out that osm.log
had grown to 34G.  Ouch!

> What is the topology of the subnet? There are some extreme limitations
> of OpenSM right now when RMPP is used, as the responses are limited to a
> single packet, which means only a few records of any type will fit into
> the response (depending on the record type).
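
To put rough numbers on that single-packet limit, here is a small sketch.
The sizes are my reading of the IBA spec (256-byte MAD, 56 bytes of MAD,
RMPP and SA headers, records padded to 8-byte units) and should be checked
against the spec; the function name is made up for illustration, and the
52-byte MCMemberRecord size in the example is likewise an assumption.

```python
# Assumed sizes, from my reading of the IBA spec (verify before relying on them).
MAD_SIZE = 256            # every MAD is a fixed 256 bytes
SA_HEADER_SIZE = 56       # 24 common MAD header + 12 RMPP header + 20 SA header
SA_DATA_SIZE = MAD_SIZE - SA_HEADER_SIZE  # bytes left for attribute records


def records_per_mad(record_size):
    """Return how many SA records of record_size bytes fit in one MAD.

    Records are assumed to be padded up to a multiple of 8 bytes, since
    the SA AttributeOffset field is expressed in 8-byte units.
    """
    padded = (record_size + 7) // 8 * 8
    return SA_DATA_SIZE // padded


if __name__ == "__main__":
    # Hypothetical example: an MCMemberRecord assumed to be 52 bytes.
    print(records_per_mad(52))
```

So without multi-packet responses, a GetTable can only ever hand back a
handful of records, whatever the subnet actually contains.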

I should simplify my topology.  I will plug my Solaris box in back to
back with a Linux box and retest.  Can I run opensm on port 2?

> It looks to me like there are 2 end nodes (and 1 switch port 0) from the
> unicast forwarding table (LFT) as there are 3 LIDs assigned.
> 
> Is the topology that simple?

I have an IB switch with 3 Linux x86_64 systems and a Solaris sparc64
system attached (each with only one port).  Also, an IB traffic
generator.  Oh, and one of the Linux systems has an analyzer in the
middle.  Also, one of the Linux systems might be down.

> I see SA deletes of the MC groups being answered properly as there is no
> match found for the deletion. It looks like there are 2 precreated
> groups (0xC000 and 0xC001) which are IPv4 groups. These are joined with
> "join" component masks (0x10083). There are 3 other groups (2 IPv6 ones
> and 1 other IPv4 one) which are created (component mask 0x130c7). 
> 
> I can see 5 MLIDs (0xC000-0xC004) but the MFT of the switch only appears
> to have one port (port 2) on for each of these LIDs. That means that not
> all the MC joins "worked" if there was more than one. Also, when the SA
> GetTable of MCMemberRecord is responded to, no records are returned.
> These GetTable requests have a component mask with MGID, PKey, and Scope
> set. Would the PKey be 0x8001 ? If so, I do see a previous dump of a
> PKey table which only has the full default partition (0xffff). Not sure
> if this is the problem yet.
> 
> Are all the ports active that should be? That would explain why the
> other port's multicast join/creates are not seen by OpenSM.
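
For anyone following along, the component masks and the PKey question
above can be decoded with a few lines of Python. This is only a sketch:
the bit positions are my reading of the MCMemberRecord component mask
layout in the IBA spec (verify against the spec or OpenSM's headers), and
the helper names are made up for illustration.

```python
# Assumed MCMemberRecord component mask bit positions (check against the spec).
MCR_COMPMASK_BITS = {
    0: "MGID", 1: "PortGID", 2: "Q_Key", 3: "MLID",
    4: "MTUSelector", 5: "MTU", 6: "TClass", 7: "P_Key",
    8: "RateSelector", 9: "Rate", 10: "PacketLifeTimeSelector",
    11: "PacketLifeTime", 12: "SL", 13: "FlowLabel", 14: "HopLimit",
    15: "Scope", 16: "JoinState", 17: "ProxyJoin",
}


def decode_comp_mask(mask):
    """List the field names whose bits are set in mask, lowest bit first."""
    return [name for bit, name in sorted(MCR_COMPMASK_BITS.items())
            if mask & (1 << bit)]


def pkey_base(pkey):
    """Low 15 bits of a P_Key: the partition itself."""
    return pkey & 0x7FFF


def pkey_is_full_member(pkey):
    """Top bit of a P_Key: set for full membership, clear for limited."""
    return bool(pkey & 0x8000)


def same_partition(a, b):
    """True if two P_Keys name the same partition, ignoring membership."""
    return pkey_base(a) == pkey_base(b)


if __name__ == "__main__":
    for mask in (0x10083, 0x130C7):
        print(hex(mask), decode_comp_mask(mask))
    print(same_partition(0x8001, 0xFFFF))
```

If that bit layout is right, 0x10083 is the usual join mask and 0x130c7
adds the fields a create needs, and a request carrying PKey 0x8001 would
not match a table holding only 0xffff, since the low 15 bits differ.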

That is great analysis.  I am still trying to figure out the best way to
determine what state a Solaris HCA is in (there is no convenient
ibstatus tool à la openib, AFAIK).  I can tell you one thing though: both
the green and amber LEDs are lit up on Port1 on the Solaris box.

-tduffy