[openib-general] Performance Degradation with OFED v. Voltaire (lustre)

Matt Leininger mlleinin at hpcn.ca.sandia.gov
Fri Dec 15 02:19:28 PST 2006


On Fri, 2006-12-15 at 09:44 +0100, Bernadat, Philippe wrote:
> I also looked at the HCA counters, and I indeed think 
> there is something wrong about the MTU:
> 
> For the same test
> 
> With VIB
> 
> PortXmitData:                  2684490382
> PortRcvData:                      1750145
> PortXmitPkts:                    10280007
> PortRcvPkts:                        49962
> 
> With OFED
> 
> XmtBytes:........................2653730483
> RcvBytes:........................1710541
> XmtPkts:.........................5160009
> RcvPkts:.........................50012
> 
> Which means we sent half as many packets with OFED,
> and if you do the math it is 2K packets with OFED (counters are in 32-bit
> units)
> and 1K packets with VIB.
> 
> So for some reason the tavor_quirk param is ignored/overwritten.
> Is there an interface to control this?
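
The "do the math" step quoted above can be reproduced with a short
sketch. It assumes, as the quoted message states, that the data counters
are in 32-bit (4-byte) units for both stacks. The helper names are made
up for illustration and the list of MTU sizes is the standard InfiniBand
set. The averages come out slightly above the MTU, presumably because
the counters also include per-packet header bytes.

```python
# Counter values copied from the quoted message (transmit side only).
WORD_BYTES = 4  # assumption from the message: data counters are in 32-bit units

def avg_packet_bytes(data_words: int, packets: int) -> float:
    """Average bytes per packet, given a data counter in 32-bit words."""
    return data_words * WORD_BYTES / packets

def nearest_mtu(avg_bytes: float) -> int:
    """Return the standard InfiniBand MTU closest to the observed average."""
    return min((256, 512, 1024, 2048, 4096), key=lambda m: abs(m - avg_bytes))

vib = avg_packet_bytes(2684490382, 10280007)   # PortXmitData / PortXmitPkts
ofed = avg_packet_bytes(2653730483, 5160009)   # XmtBytes / XmtPkts

print(f"VIB:  {vib:.0f} bytes/packet, nearest MTU {nearest_mtu(vib)}")
print(f"OFED: {ofed:.0f} bytes/packet, nearest MTU {nearest_mtu(ofed)}")
```

With these counters the nearest MTU is 1024 for VIB and 2048 for OFED,
which matches the 1K versus 2K observation.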

  Michael said you have to turn on this feature in OpenSM.  From the
release notes I'm not sure how you turn it on in OpenSM.  You did turn
on the tavor mtu work around in the rdma_cm, but did you turn it on in
OpenSM?  Also what version of OpenSM are you running?

  Thanks,

	- Matt

> 
> Philippe
> 
> > -----Original Message-----
> > From: Bernadat, Philippe 
> > Sent: Friday, December 15, 2006 8:59 AM
> > To: Michael S. Tsirkin; Roland Dreier
> > Cc: Eitan Zahavi; Hal Rosenstock; openib-general at openib.org
> > Subject: RE: Performance Degradation with OFED v. Voltaire (lustre)
> > 
> > I have set tavor_quirk to 1 with no effect.
> > Another thing I have tried is the same lustre 
> > LNET echo test with a single thread (vs 8)
> > 
> > VIB:      400 MB/s
> > OFED-1.1: 333 MB/s
> > 
> > I am posting the live param values for all infiniband 
> > modules in case someone could identify some wrong setting:
> > 
> > infiniband/core/ib_cm
> > 
> > mra_timeout_limit              30000
> > 
> > infiniband/core/rdma_cm
> > 
> > max_cm_retries                    15
> > tavor_quirk                        1
> > 
> > infiniband/hw/ipath/ib_ipath
> > 
> > cfgports                           0
> > debug                              1
> > disable_sma                        0
> > kpiobufs                           0
> > lkey_table_size                   12
> > max_ahs                        65535
> > max_cqes                      196607
> > max_cqs                       131071
> > max_mcast_grps                 16384
> > max_mcast_qp_attached             16
> > max_pds                        65535
> > max_qps                        16384
> > max_qp_wrs                     16383
> > max_sges                          96
> > max_srqs                        1024
> > max_srq_sges                     128
> > max_srq_wrs                   131071
> > qp_table_size                    251
> > 
> > infiniband/hw/mthca/ib_mthca
> > 
> > catas_reset_disable                0
> > debug_level                        0
> > fmr_reserved_mtts             262144
> > fw_cmd_doorbell                    0
> > msi                                0
> > msi_x                              1
> > num_cq                         65536
> > num_mcg                         8192
> > num_mpt                       131072
> > num_mtt                      1048576
> > num_qp                         65536
> > num_udav                       32768
> > rdb_per_qp                         4
> > tune_pci                           1
> > 
> > infiniband/ulp/ipoib/ib_ipoib
> > 
> > debug_level                        0
> > mcast_debug_level                  0
> > recv_queue_size                  128
> > send_queue_size                   64
> > 
> > Philippe
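
A listing like the one above can be collected from sysfs. The following
is a minimal sketch, assuming the modules expose their parameters under
/sys/module/<module>/parameters/ (the usual location on 2.6 kernels).
How the original listing was actually gathered is not stated, and the
function name and output layout here are illustrative only.

```python
from pathlib import Path

def read_module_params(module: str, sys_root: str = "/sys/module") -> dict:
    """Return {parameter: value} for a loaded module, or {} if none are exposed."""
    param_dir = Path(sys_root) / module / "parameters"
    params = {}
    if not param_dir.is_dir():
        return params
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by the current user.
            params[entry.name] = "<unreadable>"
    return params

if __name__ == "__main__":
    # Module names taken from the listing in the quoted message.
    for mod in ("ib_cm", "rdma_cm", "ib_ipath", "ib_mthca", "ib_ipoib"):
        print(mod)
        for name, value in read_module_params(mod).items():
            print(f"  {name:<28}{value:>10}")
```

Checking read_module_params("rdma_cm").get("tavor_quirk") on both ends
of a connection is a quick way to confirm the setting is live.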
> > 
> > > -----Original Message-----
> > > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> > > Sent: Thursday, December 14, 2006 6:32 PM
> > > To: Roland Dreier
> > > Cc: Bernadat, Philippe; Eitan Zahavi; Hal Rosenstock; 
> > > openib-general at openib.org
> > > Subject: Re: Performance Degradation with OFED v. Voltaire
> > > 
> > > >  > I think Eric described the major differences earlier on, here it is, see
> > > >  > second half:
> > > > 
> > > > OK, I forgot about that.
> > > > 
> > > > I guess one last thing to check would be the MTU being used for the RC
> > > > connections.  Since this is PCI-X HW then the MTU should be 1024 for
> > > > best throughput (instead of the max MTU of 2048).
> > > 
> > > The MTU issue is described in the OFED release notes.
> > > You must turn the Tavor work-around for it on in opensm.
> > > This was introduced late in the release cycle, so it was deemed safer
> > > to make it off by default.
> > > 
> > > By the way, Eitan, Hal, can we turn this on by default now?
> > > This way we'll get more feedback from people, and we'll still have
> > > time to turn it off before release if this unexpectedly creates issues.
> > > 
> > > -- 
> > > MST
> > > 
> 