[openib-general] Re: osm log

Hal Rosenstock halr at voltaire.com
Fri Jan 21 18:30:26 PST 2005


On Fri, 2005-01-21 at 19:14, Tom Duffy wrote:
> On Fri, 2005-01-21 at 17:16 -0500, Hal Rosenstock wrote:
> > Wow. That was a lot to go through...
> 
> Yeah, the other day my filesystem filled and I found out that osm.log
> had grown to 34G.  Ouch!

You might want to use logrotate.

> > What is the topology of the subnet ? There are some extreme limitations
> > of OpenSM right now when RMPP is used as the responses are limited to a
> > single packet which means only a few records of any type will fit into
> > the response (depending on the record type).
> 
> I should simplify my topology.  I will plug in my solaris box back to
> back with a linux box and retest.  Can I run opensm on port 2?

Yes. You need to use -g and port 2's GUID. (You can even run on a second
HCA).

> > It looks to me like there are 2 end nodes (and 1 switch port 0) from the
> > unicast forwarding table (LFT) as there are 3 LIDs assigned.
> > 
> > Is the topology that simple ?
> 
> I have an IB switch with 3 linux x86_64 systems and a solaris sparc64
> system attached (each with only one port).  Also, an IB traffic
> generator.  Oh, and one of the Linux systems has an analyzer in the
> middle.  Also, one of the Linux systems might be down.

OK. That's pretty simple, but can you start with just 1 Linux machine and
1 Solaris machine plugged together back to back, see if things work,
and then move on?

> > I see SA deletes of the MC groups being answered properly as there is no
> > match found for the deletion. It looks like there are 2 precreated
> > groups (0xC000 and 0xC001) which are IPv4 groups. These are joined with
> > "join" component masks (0x10083). There are 3 other groups (2 IPv6 ones
> > and 1 other IPv4 one) which are created (component mask 0x130c7). 
> > 
> > I can see 5 MLIDs (0xC000-0xC004) but the MFT of the switch only appears
> > to have one port (port 2) on for each of these LIDs. That means that not
> > all the MC joins "worked" if there was more than one. Also, when the SA
> > GetTable of MCMemberRecord is responded to, no records are returned.
> > These GetTable requests have a component mask with MGID, PKey, and Scope
> > set. Would the PKey be 0x8001 ? If so, I do see a previous dump of a
> > PKey table which only has the full default partition (0xffff). Not sure
> > if this is the problem yet.
> > 
> > Are all the ports active that should be ? That would explain why the
> > other port's multicast join/creates are not seen by OpenSM.
> 
> That is great analysis.  I am still trying to figure out the best way to
> determine what state a Solaris HCA is in (there is no convenient
> ibstatus tool ala openib, AFAIK).  I can tell you one thing though: both
> the green and amber LEDs are lit up on Port1 on the Solaris box.
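
For reference, the component masks in the quoted analysis can be unpacked
with a few lines of Python. This is only a sketch: the bit order below is
an assumption about the SA MCMemberRecord component mask layout (MGID at
bit 0 through ProxyJoin at bit 17), and the function names are made up for
illustration, so check it against the spec before relying on it.

```python
# Assumed MCMemberRecord component mask bit order (bit 0 first).
MCMEMBER_FIELDS = [
    "MGID", "PortGID", "Q_Key", "MLID", "MTUSelector", "MTU",
    "TClass", "P_Key", "RateSelector", "Rate",
    "PacketLifeTimeSelector", "PacketLifeTime", "SL", "FlowLabel",
    "HopLimit", "Scope", "JoinState", "ProxyJoin",
]


def decode_mask(mask):
    """Return the names of the fields whose bits are set in mask."""
    return [name for bit, name in enumerate(MCMEMBER_FIELDS)
            if mask & (1 << bit)]


def encode_mask(names):
    """Build a component mask from a list of field names."""
    mask = 0
    for name in names:
        mask |= 1 << MCMEMBER_FIELDS.index(name)
    return mask


# Masks mentioned in the analysis: the "join" mask and the "create" mask.
for mask in (0x10083, 0x130C7):
    print(hex(mask), decode_mask(mask))

# The GetTable requests set MGID, PKey, and Scope.
print(hex(encode_mask(["MGID", "P_Key", "Scope"])))
```

Under that assumed layout, the join mask selects MGID, PortGID, P_Key and
JoinState, and the create mask adds Q_Key, TClass, SL and FlowLabel.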

You can use smpdump from the Linux machine to query attribute 0x15
(PortInfo) at the LID of the Solaris machine:
/usr/local/ib/bin/smpdump 3 0x15
if Solaris is LID 3.
Send me the output as there is no pretty decode right now. The output is
the raw MAD data.

 /usr/local/ib/bin/smpdump -D 0 0x15
0000 0000 0000 0000 fe80 0000 0000 0000
0002 0001 0050 0a68 0000 0000 0103 0301
1122 0011 4040 0008 0804 f410 0000 0000
0000 2012 1088 0000 0000 0000 0000 0000

PortState starts at bit 260 (byte 32.5) for 4 bits and PortPhysicalState
starts at bit 264 (byte 33) for 4 bits.

PortState is 1 (Down) and PortPhysicalState is 2 (Polling) in the above.
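
Until there is a pretty decode, the two fields can be pulled out of the
raw dump by hand or with a small script. The following is a minimal
sketch, not part of smpdump: the helper names are made up, the dump is the
one pasted above, and the state name tables are my reading of the PortInfo
definition (only "Down" and "Polling" are confirmed by this dump).

```python
# Raw PortInfo payload exactly as printed by smpdump above.
DUMP = """
0000 0000 0000 0000 fe80 0000 0000 0000
0002 0001 0050 0a68 0000 0000 0103 0301
1122 0011 4040 0008 0804 f410 0000 0000
0000 2012 1088 0000 0000 0000 0000 0000
"""

# State names (assumed from the PortInfo attribute definition).
PORT_STATE = {1: "Down", 2: "Initialize", 3: "Armed", 4: "Active"}
PHYS_STATE = {1: "Sleep", 2: "Polling", 3: "Disabled",
              4: "PortConfigurationTraining", 5: "LinkUp",
              6: "LinkErrorRecovery"}


def parse_dump(text):
    """Turn the whitespace-separated hex words into bytes."""
    return bytes.fromhex("".join(text.split()))


def get_bits(data, bit_offset, width):
    """Extract `width` bits starting `bit_offset` bits in, MSB first."""
    value = int.from_bytes(data, "big")
    shift = len(data) * 8 - bit_offset - width
    return (value >> shift) & ((1 << width) - 1)


data = parse_dump(DUMP)
port_state = get_bits(data, 260, 4)   # low nibble of byte 32
phys_state = get_bits(data, 264, 4)   # high nibble of byte 33

print("PortState:", port_state, PORT_STATE.get(port_state, "unknown"))
print("PortPhysicalState:", phys_state, PHYS_STATE.get(phys_state, "unknown"))
```

The same get_bits helper works for any other field once its bit offset
and width are known.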

-- Hal





