[openib-general] Performance Degradation with OFED v. Voltaire (lustre)

Eitan Zahavi eitan at mellanox.co.il
Fri Dec 15 09:20:03 PST 2006


Matt Leininger wrote:
> On Fri, 2006-12-15 at 09:44 +0100, Bernadat, Philippe wrote:
>   
>> I also looked at the HCA counters, and I indeed think 
>> there is something wrong about the MTU:
>>
>> For the same test
>>
>> With VIB
>>
>> PortXmitData:                  2684490382
>> PortRcvData:                      1750145
>> PortXmitPkts:                    10280007
>> PortRcvPkts:                        49962
>>
>> With OFED
>>
>> XmtBytes:........................2653730483
>> RcvBytes:........................1710541
>> XmtPkts:.........................5160009
>> RcvPkts:.........................50012
>>
>> This means we sent half as many packets with OFED. If you do the
>> math (the counters are in 32-bit units), that is 2K packets with OFED
>> and 1K packets with VIB.
>>
>> So for some reason the tavor_quirk param is ignored/overwritten.
>> Is there an interface to control this ?
>>     
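As a quick sanity check of the arithmetic above, here is a minimal Python sketch. It assumes, as Philippe notes, that the data counters from both stacks are in 32-bit (4-byte) units; the function name is only illustrative.

```python
# Average bytes per transmitted packet, from the port counters quoted above.
# Assumption: the data counters count 32-bit words, so multiply by 4.
WORD_BYTES = 4

def avg_packet_bytes(xmit_data_words, xmit_pkts):
    return xmit_data_words * WORD_BYTES / xmit_pkts

vib = avg_packet_bytes(2684490382, 10280007)   # VIB: PortXmitData / PortXmitPkts
ofed = avg_packet_bytes(2653730483, 5160009)   # OFED: XmtBytes / XmtPkts

print(f"VIB:  {vib:.0f} bytes/packet")
print(f"OFED: {ofed:.0f} bytes/packet")
```

The averages land a little above 1024 and 2048 bytes respectively, which is what one would expect from a 1K and a 2K MTU if the counters include some per-packet overhead.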
>
>   Michael said you have to turn on this feature in OpenSM.  From the
> release notes I'm not sure how you turn it on in OpenSM.  You did turn
> on the tavor mtu work around in the rdma_cm, but did you turn it on in
> OpenSM?  Also what version of OpenSM are you running?
>   
To turn this option on in opensm you need to:
1. Run: opensm -c -o
2. Modify the file /var/cache/osm/opensm.opts by changing the line below
enable_quirks FALSE
to
enable_quirks TRUE

3. Run: opensm
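
Step 2 can also be scripted. The following is a minimal Python sketch, not something shipped with opensm: it assumes the options file contains the line exactly as shown above (keyword, whitespace, value), and the helper names are made up for illustration.

```python
import re

OPTS_PATH = "/var/cache/osm/opensm.opts"  # path from step 2 above

def enable_quirks(opts_text):
    """Return opts_text with 'enable_quirks FALSE' changed to TRUE."""
    # Assumption: the option sits on its own line as "enable_quirks <value>".
    return re.sub(r"^(enable_quirks[ \t]+)FALSE[ \t]*$", r"\g<1>TRUE",
                  opts_text, flags=re.MULTILINE)

def patch_file(path=OPTS_PATH):
    # Read, rewrite, and save the options file in place (needs write access).
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_quirks(text))
```

Run opensm -c -o first so that the file exists, call patch_file(), and then start opensm as in step 3.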
>   Thanks,
>
> 	- Matt
>
>   
>> Philippe
>>
>>     
>>> -----Original Message-----
>>> From: Bernadat, Philippe 
>>> Sent: Friday, December 15, 2006 8:59 AM
>>> To: Michael S. Tsirkin; Roland Dreier
>>> Cc: Eitan Zahavi; Hal Rosenstock; openib-general at openib.org
>>> Subject: RE: Performance Degradation with OFED v. Voltaire (lustre)
>>>
>>> I have set tavor_quirk to 1 with no effect.
>>> Another thing I have tried is the same lustre 
>>> LNET echo test with a single thread (vs 8)
>>>
>>> VIB:      400 MB/s
>>> OFED-1.1: 333 MB/s
>>>
>>> I am posting the live param values for all infiniband 
>>> modules in case someone could identify some wrong setting:
>>>
>>> infiniband/core/ib_cm
>>>
>>> mra_timeout_limit              30000
>>>
>>> infiniband/core/rdma_cm
>>>
>>> max_cm_retries                    15
>>> tavor_quirk                        1
>>>
>>> infiniband/hw/ipath/ib_ipath
>>>
>>> cfgports                           0
>>> debug                              1
>>> disable_sma                        0
>>> kpiobufs                           0
>>> lkey_table_size                   12
>>> max_ahs                        65535
>>> max_cqes                      196607
>>> max_cqs                       131071
>>> max_mcast_grps                 16384
>>> max_mcast_qp_attached             16
>>> max_pds                        65535
>>> max_qps                        16384
>>> max_qp_wrs                     16383
>>> max_sges                          96
>>> max_srqs                        1024
>>> max_srq_sges                     128
>>> max_srq_wrs                   131071
>>> qp_table_size                    251
>>>
>>> infiniband/hw/mthca/ib_mthca
>>>
>>> catas_reset_disable                0
>>> debug_level                        0
>>> fmr_reserved_mtts             262144
>>> fw_cmd_doorbell                    0
>>> msi                                0
>>> msi_x                              1
>>> num_cq                         65536
>>> num_mcg                         8192
>>> num_mpt                       131072
>>> num_mtt                      1048576
>>> num_qp                         65536
>>> num_udav                       32768
>>> rdb_per_qp                         4
>>> tune_pci                           1
>>>
>>> infiniband/ulp/ipoib/ib_ipoib
>>>
>>> debug_level                        0
>>> mcast_debug_level                  0
>>> recv_queue_size                  128
>>> send_queue_size                   64
>>>
>>> Philippe
>>>
>>>       
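A listing like the one above can be collected from sysfs. This is a rough sketch, assuming the modules export their parameters under /sys/module/<module>/parameters; the function name and the module list in the loop are illustrative only.

```python
import os

def read_module_params(module, sys_root="/sys/module"):
    """Return {param_name: value} for one loaded kernel module."""
    param_dir = os.path.join(sys_root, module, "parameters")
    params = {}
    if not os.path.isdir(param_dir):
        return params  # module not loaded, or it exports no parameters
    for name in sorted(os.listdir(param_dir)):
        try:
            with open(os.path.join(param_dir, name)) as f:
                params[name] = f.read().strip()
        except OSError:
            # some parameters may not be readable by unprivileged users
            continue
    return params

if __name__ == "__main__":
    for mod in ("ib_cm", "rdma_cm", "ib_mthca", "ib_ipoib"):
        for name, value in read_module_params(mod).items():
            print(f"{mod:10s} {name:28s} {value:>8s}")
```

Checking read_module_params("rdma_cm").get("tavor_quirk") after loading the module is a quick way to confirm that the setting actually took effect.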
>>>> -----Original Message-----
>>>> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
>>>> Sent: Thursday, December 14, 2006 6:32 PM
>>>> To: Roland Dreier
>>>> Cc: Bernadat, Philippe; Eitan Zahavi; Hal Rosenstock; 
>>>> openib-general at openib.org
>>>> Subject: Re: Performance Degradation with OFED v. Voltaire
>>>>
>>>>         
>>>>>  > I think Eric described the major differences earlier on, here it is, see
>>>>>  > second half:
>>>>>
>>>>> OK, I forgot about that.
>>>>>
>>>>> I guess one last thing to check would be the MTU being used for the RC
>>>>> connections.  Since this is PCI-X HW then the MTU should be 1024 for
>>>>> best throughput (instead of the max MTU of 2048).
>>>>>           
>>>> The MTU issue is described in the OFED release notes.
>>>> You must turn the Tavor work-around for it on in opensm.
>>>> This was introduced late in the release cycle, so it was deemed safer
>>>> to make it off by default.
>>>>
>>>> By the way, Eitan, Hal, can we turn this on by default now?
>>>> This way we'll get more feedback from people, and we'll still have
>>>> time to turn it off before release if this unexpectedly 
>>>> creates issues.
>>>>
>>>> -- 
>>>> MST
>>>>
>>>>         
>> _______________________________________________
>> openib-general mailing list
>> openib-general at openib.org
>> http://openib.org/mailman/listinfo/openib-general
>>
>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>>
>>     
>




