[ofa-general] IB performance stats (revisited)

Eitan Zahavi eitan at mellanox.co.il
Wed Jul 11 07:03:35 PDT 2007


Hi Hal,
> > 
> > > Second, I have run some tests querying the fabric of our large 
> > > clusters here (~500 nodes) and the results were promising for a 
> > > single node implementation.
> > > I don't recall the numbers as this was a while ago but it was on the
> > > order of <2 sec and I think <1 but I don't want to be misquoted.
> > 
> > Does PerfMgr query switch ports ?
> 
> Yes (of course it does).
> 
> > If it does I am surprised by the short sweep time you got.
> > 
> > Does it have >1 query on the wire at a given time?
> 
> Yes. The default appears to be 500 currently (maybe that needs
> dialing back a bit) but it is settable via
> perfmgr_max_outstanding_queries in the options file.
This explains some of it.
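
A back-of-envelope sketch of why the number of outstanding queries
matters so much for sweep time. This is not PerfMgr code. It assumes
one query per port, a fixed round-trip time per query, and that
queries are issued in full batches; the port count and the 1 ms
round trip below are hypothetical numbers chosen only for
illustration.

```python
import math


def estimate_sweep_seconds(num_ports, max_outstanding, round_trip_s):
    """Rough lower bound on the time to query every port once.

    Assumes one query per port and that up to max_outstanding queries
    are in flight at a time, each taking round_trip_s to complete.
    """
    if num_ports < 0 or max_outstanding < 1 or round_trip_s < 0:
        raise ValueError("invalid sweep parameters")
    # Number of back-to-back batches needed to cover all ports.
    batches = math.ceil(num_ports / max_outstanding)
    return batches * round_trip_s


# Hypothetical fabric: 2000 ports (hosts plus switch ports), 1 ms per query.
serial = estimate_sweep_seconds(2000, 1, 0.001)
pipelined = estimate_sweep_seconds(2000, 500, 0.001)
print(f"1 outstanding:   {serial:.3f} s")
print(f"500 outstanding: {pipelined:.3f} s")
```

Under these assumptions the serial sweep is bounded by the round-trip
time multiplied by the port count, while 500 outstanding queries
reduce it to a handful of round trips. The model ignores VL0
contention and retries, which is exactly what a loaded fabric would
add.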
> 
> > If not then I am even more surprised.
> > 
> > Was the cluster running a job at the time of the query ?
> 
> Is this question related to VL0 contention ?
Yes
> 
> -- Hal
> 
> > Thanks
> > 
> > Eitan Zahavi
> > Senior Engineering Director, Software Architect Mellanox
> > Technologies LTD
> > Tel:+972-4-9097208
> > Fax:+972-4-9593245
> > P.O. Box 586 Yokneam 20692 ISRAEL
> > 
> >  
> > 
> > > -----Original Message-----
> > > From: Ira Weiny [mailto:weiny2 at llnl.gov]
> > > Sent: Tuesday, July 10, 2007 7:47 PM
> > > To: Eitan Zahavi
> > > Cc: halr at voltaire.com; Mark.Seger at hp.com; 
> > > general at lists.openfabrics.org; Ed.Finn at FMR.COM
> > > Subject: Re: [ofa-general] IB performance stats (revisited)
> > > 
> > > On Thu, 28 Jun 2007 10:24:59 +0300
> > > "Eitan Zahavi" <eitan at mellanox.co.il> wrote:
> > > 
> > > > > On Wed, 2007-06-27 at 14:23, Eitan Zahavi wrote:
> > > > > > > In the last months it is the second time I hear people
> > > > > > > complaining that the current monitoring solution in OFA is
> > > > > > > integrated with OpenSM.
> > > > > 
> > > > > I must have missed this both times (didn't see this in Mark's
> > > > > post) and the statement itself is somewhat inaccurate as well.
> > > > Private talks - I hope they will speak up for themselves now...
> > > > > 
> > > > > > These people do not use OpenSM but do use OFED.
> > > > > 
> > > > > I'm not sure I'm following what you mean here.
> > > > > 
> > > > > > If you mean that some people want to run PerfMgr without the
> > > > > > SM/SA aspects (so that they can run a vendor based SM), that is
> > > > > > the next thing we are adding to the implementation.
> > > > Exactly. OK when is that coming?
> > > 
> > > There is very little which ties the current PerfMgr to OpenSM.  
> > > Basically it just gets the current fabric topology.
> > > As Hal has said changes are coming.
> > > 
> > > >
> > > > > 
> > > > > > >  Another drawback is that
> > > > > > > no naming is provided and the reporting uses GUIDs.
> > > > > 
> > > > > Naming is provided via NodeDescription.
> > > > This might be good for hosts but does not cover switches ...
> > > 
> > > It does include switches.  However, since most systems have the same
> > > name for multiple switches this becomes ineffective.
> > > I have queried Voltaire for a way to change the NodeDescription for
> > > switches, but at the time I asked, there was no way to do it.
> > > Perhaps there is now?  What about other vendors?  This is why
> > > ibnetdiscover and other diags have "switch map" support.  (A
> > > GUID->name mapping to override the default NodeDescription.) Nothing
> > > would please me more than to be able to remove that for a more
> > > "automatic" solution.
> > > 
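
The GUID->name override described above can be sketched as follows.
This is an illustration only, not the ibnetdiscover implementation.
The file layout (one GUID in hex followed by a quoted name per line,
with '#' starting a comment line), the function names, and the GUIDs
and switch names are all assumptions made up for the example.

```python
def parse_name_map(text):
    """Parse lines of the assumed form: <hex guid> "<name>".

    Blank lines and lines starting with '#' are skipped.
    Returns a dict mapping integer GUID -> name.
    """
    names = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # malformed line: no name after the GUID
        guid = int(parts[0], 16)  # accepts an optional 0x prefix
        names[guid] = parts[1].strip().strip('"')
    return names


def display_name(guid, node_description, names):
    """Prefer the mapped name; fall back to the NodeDescription."""
    return names.get(guid, node_description)


# Hypothetical map: two switches that report the same NodeDescription.
example_map = """
# guid               name
0x0008f10400411f56 "rack1-leaf1"
0x0008f10400411f57 "rack1-leaf2"
"""
names = parse_name_map(example_map)
print(display_name(0x0008F10400411F56, "Generic switch", names))
print(display_name(0x0008F10400411F99, "Generic switch", names))
```

The fallback keeps hosts readable through their own NodeDescription
while switches sharing a default description get distinct names from
the map.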
> > > > > 
> > > > > > > I also can't hold myself from saying again I think you are
> > > > > > > going to hit the wall with the concept of doing the PMA from
> > > > > > > a single node.
> > > > > 
> > > > > > If you are referring to the fact that the PerfMgr is currently
> > > > > > not distributed, that will be done as has been stated before.
> > > > Good. When is it expected? Will it be OFED 1.3?
> > > 
> > > When Hal first sent out the PerfMgr design I thought we should jump
> > > right to the distributed model as well.  But now I am glad we have
> > > gone the way we did.
> > > First off, we have something which "works" and from which we can 
> > > expand.
> > > Second, I have run some tests querying the fabric of our large 
> > > clusters here (~500 nodes) and the results were promising for a 
> > > single node implementation.
> > > I don't recall the numbers as this was a while ago but it was on the
> > > order of <2 sec and I think <1 but I don't want to be misquoted.
> > > 
> > > For sure, a distributed model offers many advantages and we will get
> > > there.  But for many the current single node approach should work
> > > just fine.
> > > 
> > > Thanks,
> > > Ira
> > > 
> > > > 
> > > > Thanks
> > > > > 
> > > > > -- Hal
> > > > > 
> > > > > > Eitan Zahavi
> > > > > > > Senior Engineering Director, Software Architect Mellanox
> > > > > > > Technologies LTD
> > > > > > Tel:+972-4-9097208
> > > > > > Fax:+972-4-9593245
> > > > > > P.O. Box 586 Yokneam 20692 ISRAEL
> > > > > > 
> > > > > >  
> > > > > > 
> > > > > > > -----Original Message-----
> > > > > > > From: general-bounces at lists.openfabrics.org
> > > > > > > > [mailto:general-bounces at lists.openfabrics.org]
> > > > > > > > On Behalf Of Hal Rosenstock
> > > > > > > Sent: Wednesday, June 27, 2007 8:12 PM
> > > > > > > To: Mark Seger
> > > > > > > Cc: Finn, Ed; general at lists.openfabrics.org
> > > > > > > > Subject: Re: [ofa-general] IB performance stats (revisited)
> > > > > > > 
> > > > > > > On Wed, 2007-06-27 at 13:07, Mark Seger wrote:
> > > > > > > > > >The performance managers deal with the counter stickiness
> > > > > > > > > >(by resetting them when they think they need to). They
> > > > > > > > > >typically export their data although this is not specified
> > > > > > > > > >by IBA so it is in a vendor proprietary manner.
> > > > > > > > >  
> > > > > > > > >
> > > > > > > > so I guess these guys are poor citizens as well...
> > > > > > > 
> > > > > > > Not sure what you mean.
> > > > > > > 
> > > > > > > > > the real issue as I see it then means nobody can trust the
> > > > > > > > > data if random tools randomly reset the counters.  a real
> > > > > > > > > shame...
> > > > > > > 
> > > > > > > > I consider this to be a real rather than random app for this.
> > > > > > > > Guess it depends on what one considers random.
> > > > > > > 
> > > > > > > -- Hal
> > > > > > > 
> > > > > > > > -mark
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > general mailing list
> > > > > > > > general at lists.openfabrics.org
> > > > > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > > > > > > 
> > > > > > > To unsubscribe, please visit 
> > > > > > > http://openib.org/mailman/listinfo/openib-general
> > > > > > > 
> > > > > 
> > > > > 
> > > 
> 
> 


