[ofa-general] SubnAdmGet (6777)

Yossi Etigin yosefe at voltaire.com
Mon Jun 1 10:50:06 PDT 2009


Bob Ciotti wrote:
> Sorry to bounce this off the list if it is too remedial. I promise
> that I've been consuming a lot of the spec and OFA code. Maybe you can
> consider that a promise, or a warning, that we will be more active :|
> 
> Our configuration is >6000 CAs in a mix of InfiniHost III/ConnectX and
> Longbow extenders and >800 24-port switches on a single subnet (SGI ICE
> with lots of other stuff plugged in). It's DDR everywhere except across the
> Longbows. Hosts range across a few different generations of x86 Xeon, x86
> Opteron and Itanium. We use Lustre but have the SRP traffic on a separate
> subnet.
> 
> A few weeks ago connection setup times were mentioned on this list, along
> with ARP and path record lookups not being scalable. We experience these
> problems as well and need to address these scalability issues. I have quite
> a bit of test data and a few different ideas to bounce off the list re: path
> records, once I am a little more versed in the spec. There has already been
> some work done to limit ARP traffic.
> 
> 
> Today's question has to do with SM errors.
> We have been seeing lots of these, sometimes more than others. Digging
> around a bit, it appears that the 6777 represents the number of duplicates?
> This value fluctuates somewhat, but not a lot. Comments in the code
> indicate that any value >1 is a problem. The question is: is it OK for
> this to be happening, and how does it occur?
> 
> We will probably update to the 1.4 or 1.4.1 SM in the next few days.
> We are currently running a pre-1.4 top-of-tree pull from back in Dec.
>
> bob
> 
> 

This may be the path record query triggered by IPoIB (ib_sa_path_rec_get).
It uses METHOD_GET, and if there is more than one matching path record the
query will fail. Using METHOD_GET_TABLE should solve it (the fix is in the
ib_sa module).
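
To illustrate the difference, here is a minimal toy model in Python. It is
only a sketch, not the ib_sa or SM code: the PathRecord class, the exception
names, the GID strings and the LID values are all made up for the example.
It assumes that a Get-style query is rejected when more than one record
matches, while a GetTable-style query returns every match. Two records for
the same GID pair can exist, for example, when a port has several LIDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathRecord:
    """Hypothetical, heavily reduced path record (not the real wire format)."""
    sgid: str
    dgid: str
    slid: int
    dlid: int


class NoRecords(Exception):
    """Toy stand-in for a reply saying that nothing matched."""


class TooManyRecords(Exception):
    """Toy stand-in for a reply rejecting a Get that matched several records."""


def _matching(db, sgid, dgid):
    # Select every record for this source/destination GID pair.
    return [r for r in db if r.sgid == sgid and r.dgid == dgid]


def subn_adm_get(db, sgid, dgid):
    """Get-style query: succeeds only if exactly one record matches."""
    found = _matching(db, sgid, dgid)
    if not found:
        raise NoRecords(f"{sgid} -> {dgid}")
    if len(found) > 1:
        raise TooManyRecords(f"{len(found)} records for {sgid} -> {dgid}")
    return found[0]


def subn_adm_get_table(db, sgid, dgid):
    """GetTable-style query: returns all matches, possibly an empty list."""
    return _matching(db, sgid, dgid)


# Made-up database: two paths from gid-a to gid-b, one from gid-a to gid-c.
db = [
    PathRecord("gid-a", "gid-b", slid=0x10, dlid=0x20),
    PathRecord("gid-a", "gid-b", slid=0x10, dlid=0x21),
    PathRecord("gid-a", "gid-c", slid=0x10, dlid=0x30),
]

try:
    subn_adm_get(db, "gid-a", "gid-b")
except TooManyRecords as exc:
    print("Get failed:", exc)

# The caller of a GetTable-style query picks one of the returned paths.
paths = subn_adm_get_table(db, "gid-a", "gid-b")
print("GetTable returned", len(paths), "paths; using dlid", hex(paths[0].dlid))
```

With a table-style query the requester has to choose one path from the
result itself, which is why the change belongs on the querying side.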

If you enable debugging in ipoib and see path record query failures, this
is probably it.

--Yossi.





