[ewg] Need help for Infiniband optimisation for our cluster (MTU...)

Mike Heinz michael.heinz at qlogic.com
Tue Dec 7 08:10:29 PST 2010


That's the IPoIB value. It tells Linux that the interface supports 64K packets, but they get broken up to the wire MTU when they hit the InfiniBand fabric. I spent a few minutes looking through stock OFED for a tool that displays the active MTU size, but the only tool I know of is part of QLogic's OFED+ stack.
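
If you want to query the IB-level MTU yourself, a minimal libibverbs sketch along these lines should work (untested; it assumes the libibverbs development headers from your OFED install, and that you care about port 1 of the first HCA; the file name is arbitrary, build with "gcc show_mtu.c -o show_mtu -libverbs"):

    #include <stdio.h>
    #include <infiniband/verbs.h>

    /* Convert the enum ibv_mtu code into a byte count. */
    static int mtu_to_bytes(enum ibv_mtu mtu)
    {
            switch (mtu) {
            case IBV_MTU_256:  return 256;
            case IBV_MTU_512:  return 512;
            case IBV_MTU_1024: return 1024;
            case IBV_MTU_2048: return 2048;
            case IBV_MTU_4096: return 4096;
            default:           return -1;
            }
    }

    int main(void)
    {
            struct ibv_device **dev_list;
            struct ibv_context *ctx;
            struct ibv_port_attr attr;
            int num_devs;

            dev_list = ibv_get_device_list(&num_devs);
            if (!dev_list || num_devs == 0) {
                    fprintf(stderr, "no IB devices found\n");
                    return 1;
            }

            ctx = ibv_open_device(dev_list[0]);    /* first HCA in the list */
            if (!ctx) {
                    fprintf(stderr, "cannot open device\n");
                    return 1;
            }

            if (ibv_query_port(ctx, 1, &attr)) {   /* port 1 */
                    fprintf(stderr, "ibv_query_port failed\n");
                    return 1;
            }

            printf("%s port 1: active_mtu = %d bytes, max_mtu = %d bytes\n",
                   ibv_get_device_name(dev_list[0]),
                   mtu_to_bytes(attr.active_mtu),
                   mtu_to_bytes(attr.max_mtu));

            ibv_close_device(ctx);
            ibv_free_device_list(dev_list);
            return 0;
    }

The active_mtu it reports is the MTU actually negotiated on the wire, which is the number that matters here, not the 65520 shown on ib0.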

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of giggzounet
Sent: Tuesday, December 07, 2010 10:59 AM
To: ewg at lists.openfabrics.org
Subject: Re: [ewg] Need help for Infiniband optimisation for our cluster (MTU...)

OK, thanks for all your explanations.

So the MTU value I'm seeing on the ib0 interface (65520) is not
connected to the "real" InfiniBand MTU value?


On 07/12/2010 16:52, Mike Heinz wrote:
> Heh. I forgot Intel sells an MPI; I thought you were saying you had recompiled one of the OFED MPIs with icc.
> 
> 1) For your small cluster, there's no reason not to use connected mode. The only reason for providing a datagram mode with MPI is to support very large clusters where there simply aren't enough system resources for every node to connect with every other node.
> 
> I would still suggest experimenting with mvapich-1 (and recompiling it with icc) to see if you get better performance.
> 
> 2) Similarly, for a small cluster, QoS won't give you any benefit. The purpose of QoS is to divide up the fabric's bandwidth so that multiple simultaneous apps can share it in a controlled way. If you're only running one app at a time (which seems likely), you want that app to get all available bandwidth.
> 
> I'm not sure how you check the MTU size when using stock OFED, but my memory for those HCAs is that they can use 2k MTUs. You only use a smaller MTU size when the larger size causes reliability problems.
> 
> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of giggzounet
> Sent: Tuesday, December 07, 2010 10:32 AM
> To: ewg at lists.openfabrics.org
> Subject: Re: [ewg] Need help for Infiniband optimisation for our cluster (MTU...)
> 
> Hi,
> 
> Thanks for your answer!
> 
> Particularly the explanation of connected versus datagram mode (I can see
> the difference with the IMB1 MPI benchmarks)!
> 
> The hardware we are using in details:
> - on the master: Mellanox MHGH18-XTC ConnectX with VPI adapter, single
> port 20Gb/s, PCIe2.0 x8 2.5GT/s
> - on the nodes: Integrated Mellanox DDR InfiniBand 20Gb/s ConnectX with
> QSFP connector.
> 
> How can I find out the MTU size limit?
> 
> 
> Over InfiniBand we are just running MPI with different CFD programs, but
> always MPI (Intel MPI or Open MPI). Should I use QoS?
> 
> Thanks for your help!
> 
> 
> On 07/12/2010 16:09, Richard Croucher wrote:
>> Connected mode will provide more throughput but datagram mode will provide
>> lower latency.  
>> You don't say what HCAs you are using.  Some of the optimizations for
>> Connected mode are only available for the newer ConnectX QDR HCAs.
>>
>> Your HCA will probably limit the MTU size.  Leave this as large as possible.
>>
>> If you are only running a single application over the InfiniBand fabric you
>> need not bother with QoS.  If you are running multiple applications, then
>> you do need to set this up.  It is quite complex, since you need to define
>> VLs (virtual lanes) and their arbitration policies, and assign SLs (service
>> levels) to them.  This is described in the OpenSM docs and is relevant even
>> if you are using the embedded SM in the switch.
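>>
>> As a rough illustration only (the exact option names and defaults are in
>> the opensm.conf shipped with your OpenSM version, so treat the values
>> below as placeholders), the end result is a handful of QoS entries along
>> these lines:
>>
>>     # enable QoS setup
>>     qos TRUE
>>     # number of VLs to use and the high-priority limit
>>     qos_max_vls 8
>>     qos_high_limit 4
>>     # VL arbitration tables: comma-separated VL:weight pairs
>>     qos_vlarb_high 0:64,1:32
>>     qos_vlarb_low 2:16,3:8
>>     # SL to VL map: 16 entries, one VL per SL
>>     qos_sl2vl 0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3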
>>
>> As a newbie, take a look in the ../OFED/docs directory;
>> probably everything you need is there. Mellanox also has some useful docs
>> on their website.
>>
>> -----Original Message-----
>> From: ewg-bounces at lists.openfabrics.org
>> [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of giggzounet
>> Sent: 07 December 2010 14:01
>> To: ewg at lists.openfabrics.org
>> Subject: [ewg] Need help for Infiniband optimisation for our cluster
>> (MTU...)
>>
>> Hi,
>>
>> I'm new to this list. We have a little cluster in our laboratory:
>> - master: 8 cores
>> - 8 nodes with 12 cores
>> - DDR InfiniBand switch (Mellanox MTS3600R)
>>
>> On these machines we run an OSCAR cluster with CentOS 5.5, and we have
>> installed the OFED 1.5.1 packages. The default InfiniBand config is used,
>> so InfiniBand is running in connected mode.
>>
>> Our cluster is used to solve CFD (Computational Fluid Dynamics)
>> problems. I'm trying to optimize the InfiniBand network, so I
>> have several questions:
>>
>> - Is this the right mailing list to ask? (If not, where should I post?)
>>
>> - Is there a how-to on InfiniBand optimisation?
>>
>> - CFD computations need a lot of bandwidth. There is a lot of data
>> exchange through MPI (we are using Intel MPI). Does the InfiniBand mode
>> (connected or datagram) have an influence in this case? What is the "best"
>> MTU for these computations?
>>
>>
>> Best regards,
>> Guillaume
>>
> 
> 


_______________________________________________
ewg mailing list
ewg at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg



