[ewg] Need help for Infiniband optimisation for our cluster (MTU...)

Richard Croucher richard at informatix-sol.com
Tue Dec 7 07:09:05 PST 2010


Connected mode will provide more throughput but datagram mode will provide
lower latency.  
You don't say what HCAs you are using.  Some of the optimizations for
Connected mode are only available for the newer ConnectX QDR HCAs.

Your HCA will probably limit the MTU size.  Leave this as large as possible.
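
To see which mode and MTU an IPoIB interface is currently using, a small
script along these lines may help.  It is only a sketch: it assumes a Linux
host where the IPoIB driver exposes "mode" and "mtu" files under
/sys/class/net/<interface>, and the interface name "ib0" is an assumption
that may differ on your nodes.

```python
from pathlib import Path


def ipoib_settings(iface="ib0", sysfs_root="/sys/class/net"):
    """Return (mode, mtu) for an IPoIB interface by reading sysfs.

    `iface` and `sysfs_root` are assumptions; adjust them to your system.
    """
    base = Path(sysfs_root) / iface
    # The mode file holds "connected" or "datagram".
    mode = (base / "mode").read_text().strip()
    # The mtu file holds the current interface MTU in bytes.
    mtu = int((base / "mtu").read_text().strip())
    return mode, mtu


if __name__ == "__main__":
    try:
        mode, mtu = ipoib_settings()
        print(f"ib0: mode={mode} mtu={mtu}")
    except OSError as exc:
        # No such interface, or the driver does not expose these files.
        print(f"could not read IPoIB settings: {exc}")
```

Running it on every node is a quick way to confirm that they all agree on
mode and MTU before you start benchmarking.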

If you are only running a single application on the InfiniBand fabric you
need not bother with QoS.  If you are running several, then you do need to
set it up.  This is quite complex, since you need to define VLs and their
arbitration policies, and assign SLs to them.  This is described in the
OpenSM docs, and it is relevant even if you are using the embedded SM in the
switch.
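
To give an idea of what defining VLs, arbitration weights and SL assignments
involves, here is a rough sketch that builds the two tables.  The traffic
classes, SL/VL numbers and weights are made up for illustration.  The printed
option names (qos_sl2vl, qos_vlarb_low) are the ones I recall from the OpenSM
QoS documentation, so treat them as an assumption and check them against the
docs for your OpenSM version.

```python
def sl2vl_table(sl_to_vl, default_vl=0):
    """Build the 16-entry SL-to-VL table; unlisted SLs use default_vl."""
    table = [default_vl] * 16          # InfiniBand defines SLs 0..15
    for sl, vl in sl_to_vl.items():
        if not 0 <= sl <= 15:
            raise ValueError(f"SL {sl} out of range 0..15")
        if not 0 <= vl <= 14:          # VL15 is reserved for management
            raise ValueError(f"VL {vl} out of range 0..14")
        table[sl] = vl
    return table


def vlarb_entries(weights):
    """Format {vl: weight} as 'vl:weight' pairs, ordered by VL."""
    for vl, weight in weights.items():
        if not 0 <= vl <= 14:
            raise ValueError(f"VL {vl} out of range 0..14")
        if not 0 <= weight <= 255:
            raise ValueError(f"weight {weight} out of range 0..255")
    return ",".join(f"{vl}:{w}" for vl, w in sorted(weights.items()))


if __name__ == "__main__":
    # Hypothetical layout: SL0 (say MPI) on VL0, SL1 (say storage) on VL1.
    table = sl2vl_table({0: 0, 1: 1})
    # Hypothetical weights: VL0 gets twice the share of VL1.
    low = vlarb_entries({0: 64, 1: 32})
    print("qos_sl2vl " + ",".join(str(vl) for vl in table))
    print("qos_vlarb_low " + low)
```

The range checks matter more than the formatting: a mapping onto a VL that
the HCA or switch does not support is an easy mistake to make when you
write these tables by hand.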

As a newbie, take a look in ../OFED/docs.
Everything you need is probably there. Mellanox also have some useful docs on
their website.

-----Original Message-----
From: ewg-bounces at lists.openfabrics.org
[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of giggzounet
Sent: 07 December 2010 14:01
To: ewg at lists.openfabrics.org
Subject: [ewg] Need help for Infiniband optimisation for our cluster
(MTU...)

Hi,

I'm new on this list. We have in our laboratory a little cluster:
- master 8 cores
- 8 nodes with 12 cores
- DDR infiniband switch Mellanox MTS3600R

On these machines we have an OSCAR cluster with CentOS 5.5. We have
installed the OFED 1.5.1 packages. The default InfiniBand config is
used, so InfiniBand is running in connected mode.

Our cluster is used to solve CFD (Computational Fluid Dynamics)
problems. I'm trying to optimize the InfiniBand network, so I
have several questions:

- Is this the right mailing list to ask? (If not, where should I post?)

- Is there a how-to on InfiniBand optimisation?

- CFD computations need a lot of bandwidth. There is a lot of data
exchange through MPI (we are using Intel MPI). Does the InfiniBand mode
(connected or datagram) have an influence in this case? What is the
"best" MTU for these computations?


Best regards,
Guillaume




