[ofa-general] IB performance stats (revisited)

Hal Rosenstock halr at voltaire.com
Wed Jul 11 07:22:30 PDT 2007


On Wed, 2007-07-11 at 10:15, Mark Seger wrote:
> My basic philosophy, and I suspect there are those who might disagree, 
> is that you can't use the network to monitor the network, at least not 
> in times of trouble.

Right, in times of certain troubles.

> That's why I insist on having to query the HCAs 
> directly since I can't always be sure the network is there and/or 
> reliable.  If you are willing to concede that this can indeed happen 
> then the question becomes one of how do you reliably get data from an 
> HCA and that's the basis for my (re)starting this discussion.

The reliability comes from timeout/retry mechanisms. If performance data
cannot be obtained on an IB network, it needs to be troubleshot at a
lower level (by SMPs).
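
A minimal sketch of the timeout/retry idea, in Python. The names
(query_with_retry, QueryTimeout, query_fn) and the retry count, timeout,
and backoff values are hypothetical, chosen for illustration only; they
are not taken from any OFED tool or from the MAD layer itself.

```python
class QueryTimeout(Exception):
    """Raised by a query function when no response arrives in time."""


def query_with_retry(query_fn, retries=3, timeout_s=0.2, backoff=2.0):
    """Call query_fn(timeout) until it succeeds or retries run out.

    query_fn is a caller-supplied (hypothetical) function that sends one
    performance query and raises QueryTimeout if nothing comes back.
    Returns (result, attempts); raises QueryTimeout if every attempt fails.
    """
    attempts = 0
    current_timeout = timeout_s
    while attempts < retries:
        attempts += 1
        try:
            return query_fn(current_timeout), attempts
        except QueryTimeout:
            # Give the next attempt more time before declaring failure.
            current_timeout *= backoff
    raise QueryTimeout(f"no response after {attempts} attempts")
```

If every attempt times out, the caller gets an exception rather than
stale data, which is the point at which lower-level troubleshooting
would take over.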

In any case, a rearchitecture of the PMA was proposed and seems
reasonable to me in that it can accommodate either approach. All that is
needed now is for someone to step up and champion an implementation of
this. Unfortunately, I do not have time to do so.

> As for querying the switch for counters, what do you do on a very large 
> network, say 10s of thousands of nodes if you want to get performance 
> data every second?  I also realize this is an extreme situation today 
> (the node count not the frequency of monitoring) but I'm sure everyone 
> would agree systems of these sizes are not that far off.

You have a distributed performance manager to handle this. A hierarchy
of performance managers has been discussed on the list before.
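
As a rough illustration of the hierarchical idea only (this is not the
design discussed on the list, whose details are not restated here), the
Python sketch below splits the ports among several leaf collectors and
merges their reports at the top. The function names and the round-robin
split are assumptions made for the example.

```python
def partition(ports, n_collectors):
    """Split the port list into n_collectors round-robin shards."""
    return [ports[i::n_collectors] for i in range(n_collectors)]


def collect(shard, read_counters):
    """Leaf performance manager: poll every port in its shard.

    read_counters is a caller-supplied (hypothetical) function that
    returns the counters for one port.
    """
    return {port: read_counters(port) for port in shard}


def aggregate(leaf_reports):
    """Top-level manager: merge the per-shard reports into one view."""
    merged = {}
    for report in leaf_reports:
        merged.update(report)
    return merged
```

Each leaf then polls only its own share of the ports per sweep, so the
per-collector load drops roughly in proportion to the number of leaves.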

-- Hal

> -mark
> 
> Hal Rosenstock wrote:
> 
> >Hi Eitan,
> >
> >On Wed, 2007-07-11 at 06:51, Eitan Zahavi wrote:
> >  
> >
> >>Hi Ira,
> >>
> >>    
> >>
> >>>Second, I have run some tests querying the fabric of our 
> >>>large clusters here (~500 nodes) and the results were 
> >>>promising for a single node implementation.
> >>>I don't recall the numbers as this was a while ago but it was 
> >>>on the order of <2 sec and I think <1 but I don't want to be misquoted.
> >>>      
> >>>
> >>Does PerfMgr query switch ports ?
> >>    
> >>
> >
> >Yes (of course it does).
> >
> >  
> >
> >>If it does I am surprised by the short sweep time you got.
> >>
> >>Does it have >1 query on the wire at a given time?
> >>    
> >>
> >
> >Yes. The default appears to be 500 currently (maybe that needs dialing
> >back a bit) but it is settable via perfmgr_max_outstanding_queries in
> >the options file.
> >
> >  
> >
> >>If not then I am even more surprised.
> >>
> >>Was the cluster running a job at the time of the query ?
> >>    
> >>
> >
> >Is this question related to VL0 contention ?
> >
> >-- Hal
> >
> >  
> >
> >>Thanks
> >>
> >>Eitan Zahavi
> >>Senior Engineering Director, Software Architect
> >>Mellanox Technologies LTD
> >>Tel:+972-4-9097208
> >>Fax:+972-4-9593245
> >>P.O. Box 586 Yokneam 20692 ISRAEL
> >>
> >> 
> >>
> >>    
> >>
> >>>-----Original Message-----
> >>>From: Ira Weiny [mailto:weiny2 at llnl.gov] 
> >>>Sent: Tuesday, July 10, 2007 7:47 PM
> >>>To: Eitan Zahavi
> >>>Cc: halr at voltaire.com; Mark.Seger at hp.com; 
> >>>general at lists.openfabrics.org; Ed.Finn at FMR.COM
> >>>Subject: Re: [ofa-general] IB performance stats (revisited)
> >>>
> >>>On Thu, 28 Jun 2007 10:24:59 +0300
> >>>"Eitan Zahavi" <eitan at mellanox.co.il> wrote:
> >>>
> >>>      
> >>>
> >>>>>On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> >>>>>          
> >>>>>
> >>>>>>In the last months it is the second time I hear people complaining the
> >>>>>>current monitoring solution in OFA is integrated with OpenSM.
> >>>>>>            
> >>>>>>
> >>>>>I must have missed this both times (didn't see this in Mark's
> >>>>>post) and the statement itself is somewhat inaccurate as well.
> >>>>>          
> >>>>>
> >>>>Private talks - I hope they will speak up for themselves now...
> >>>>        
> >>>>
> >>>>>>These people do not use OpenSM but do use OFED.
> >>>>>>            
> >>>>>>
> >>>>>I'm not sure I'm following what you mean here.
> >>>>>
> >>>>>If you mean that some people want to run PerfMgr without the SM/SA
> >>>>>aspects (so that they can run a vendor based SM), that is the next
> >>>>>thing we are adding to the implementation.
> >>>>>          
> >>>>>
> >>>>Exactly. OK when is that coming?
> >>>>        
> >>>>
> >>>There is very little which ties the current PerfMgr to 
> >>>OpenSM.  Basically it just gets the current fabric topology.  
> >>>As Hal has said changes are coming.
> >>>
> >>>      
> >>>
> >>>>>> Another drawback is that
> >>>>>>no naming is provided and the reporting uses GUIDs.
> >>>>>>            
> >>>>>>
> >>>>>Naming is provided via NodeDescription.
> >>>>>          
> >>>>>
> >>>>This might be good for hosts but is not covering  switches ...
> >>>>        
> >>>>
> >>>It does include switches.  However, since most systems have 
> >>>the same name for multiple switches this becomes ineffective. 
> >>> I have queried Voltaire for a way to change the 
> >>>NodeDescription for switches, but at the time I asked, there 
> >>>was no way to do it.  Perhaps there is now?  What about other 
> >>>vendors?  This is why ibnetdiscover and other diags have 
> >>>"switch map" support.  (A GUID->name mapping to override the 
> >>>default NodeDescription.) Nothing would please me more than 
> >>>to be able to remove that for a more "automatic" solution.
> >>>
> >>>      
> >>>
> >>>>>>I also can't hold myself from saying again I think you are going
> >>>>>>to hit the wall with the concept of doing the PMA from a single node.
> >>>      
> >>>
> >>>>>If you are referring to the fact the PerfMgr is currently not 
> >>>>>distributed, that will be done as has been stated before.
> >>>>>          
> >>>>>
> >>>>Good. When is it expected? Will it be OFED 1.3?
> >>>>        
> >>>>
> >>>When Hal first sent out the PerfMgr design I thought we 
> >>>should jump right to the distributed model as well.  But now 
> >>>I am glad we have gone the way we did.
> >>>First off, we have something which "works" and from which we 
> >>>can expand.
> >>>Second, I have run some tests querying the fabric of our 
> >>>large clusters here (~500 nodes) and the results were 
> >>>promising for a single node implementation.
> >>>I don't recall the numbers as this was a while ago but it was 
> >>>on the order of <2 sec and I think <1 but I don't want to be misquoted.
> >>>
> >>>For sure, a distributed model offers many advantages and we 
> >>>will get there.  But for many the current single node 
> >>>approach should work just fine.
> >>>
> >>>Thanks,
> >>>Ira
> >>>
> >>>      
> >>>
> >>>>Thanks
> >>>>        
> >>>>
> >>>>>-- Hal
> >>>>>
> >>>>>          
> >>>>>
> >>>>>>Eitan Zahavi
> >>>>>>Senior Engineering Director, Software Architect
> >>>>>>Mellanox Technologies LTD
> >>>>>>Tel:+972-4-9097208
> >>>>>>Fax:+972-4-9593245
> >>>>>>P.O. Box 586 Yokneam 20692 ISRAEL
> >>>>>>
> >>>>>> 
> >>>>>>
> >>>>>>            
> >>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: general-bounces at lists.openfabrics.org
> >>>>>>>[mailto:general-bounces at lists.openfabrics.org] On Behalf Of Hal Rosenstock
> >>>>>>>Sent: Wednesday, June 27, 2007 8:12 PM
> >>>>>>>To: Mark Seger
> >>>>>>>Cc: Finn, Ed; general at lists.openfabrics.org
> >>>>>>>Subject: Re: [ofa-general] IB performance stats (revisited)
> >>>>>>>
> >>>>>>>On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> >>>>>>>              
> >>>>>>>
> >>>>>>>>>The performance managers deal with the counter stickiness (by
> >>>>>>>>>resetting them when they think they need to). They typically export
> >>>>>>>>>their data although this is not specified by IBA so it is in a vendor
> >>>>>>>>>proprietary manner.
> >>>>>>>>> 
> >>>>>>>>>
> >>>>>>>>>                  
> >>>>>>>>>
> >>>>>>>>so I guess these guys are poor citizens as well...
> >>>>>>>>                
> >>>>>>>>
> >>>>>>>Not sure what you mean.
> >>>>>>>
> >>>>>>>              
> >>>>>>>
> >>>>>>>>the real issue as I see it then means nobody can trust the data if
> >>>>>>>>random tools randomly reset the counters.  a real shame...
> >>>>>>>>                
> >>>>>>>>
> >>>>>>>I consider this to be a real rather than random app for this. 
> >>>>>>>Guess it depends on what one considers random.
> >>>>>>>
> >>>>>>>-- Hal
> >>>>>>>
> >>>>>>>              
> >>>>>>>
> >>>>>>>>-mark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>                
> >>>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>general mailing list
> >>>>>>>general at lists.openfabrics.org
> >>>>>>>http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >>>>>>>
> >>>>>>>To unsubscribe, please visit
> >>>>>>>http://openib.org/mailman/listinfo/openib-general
> >>>>>>>
> >>>>>>>              
> >>>>>>>
> >>>>>          
> >>>>>
> 



