[libfabric-users] Fwd: Optimisation Tips for verbs provider

Thu Sep 8 12:18:17 PDT 2016

Valentino,

You can also run the fi_msg_bw test with other completion-waiting options, such as sread and fd; I don’t see any variance with those either. The test calls fi_cq_read after posting each window’s worth of sends/recvs, and the window size is an adjustable parameter. You can list all available options with fi_msg_bw -h.

Maybe you could try posting a batch of sends/recvs and then collecting the completions for the whole batch. Is there a need to post the messages one by one? If so, please try spin-waiting for completions, although even that might not guarantee consistent numbers. Currently there is an issue of unpredictable transfer times when the sender overruns the receiver; posting recvs in advance can avoid it.

-Arun.

From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Valentino Picotti
Sent: Thursday, September 08, 2016 5:12 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] Fwd: Optimisation Tips for verbs provider

I forgot to CC the list, here is my reply to Arun:

---------- Forwarded message ----------
From: Valentino Picotti <valentino.picotti at gmail.com>
Date: 8 September 2016 at 14:00
Subject: Re: [libfabric-users] Optimisation Tips for verbs provider
To: "Ilango, Arun" <arun.ilango at intel.com>

Thanks for the reply,

I ran fi_msg_bw with a CQ size and window size of 1 and got the following result:

bytes   iters   total   time      MB/sec    usec/xfer   Mxfers/sec
512k    1m      488g    189.50s   2766.76   189.50      0.01
That is 21.61 Gbps, an excellent result. I'm using libfabric 1.3.0 from the latest tarball.

So the problem must be in my transport layer.
My fabric initialisation doesn't differ much from the fi_msg_bw one, so the problem might be in the main loop.
At first glance, it seems that I call fi_cq_read less often than the bw test does.
In the test, the sequence is:
- post work request ft_post_tx/rx
- spin on a completion bw_(tx/rx)_comp

In my client/server main loop:
- call fi_cq_read
- post work request

I don't spin waiting for completions, could this be the reason?

Thanks,
Valentino

On 7 September 2016 at 19:17, Ilango, Arun <arun.ilango at intel.com> wrote:
Hi Valentino,

Libfabric has a set of tests available at https://github.com/ofiwg/fabtests. Can you run the fi_msg_bw test with the same size and iterations on your setup and check if you notice any variance? Also what version/commit number of libfabric are you using?

Thanks,
Arun.

From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Valentino Picotti
Sent: Wednesday, September 07, 2016 7:48 AM
To: libfabric-users at lists.openfabrics.org
Subject: [libfabric-users] Optimisation Tips for verbs provider

Hi all,

I apologise in advance for the long email.

Over the past month I've integrated libfabric into a project based on InfiniBand verbs, with the aim of making it provider-independent. The project has a transport layer that makes the application independent of the transport implementation (which is chosen at compile time).
I worked only on the libfabric implementation of the transport layer, and this was my first experience with RDMA APIs and hardware. I mapped the various ibv_* and rdma_* calls to fi_* calls and got a working layer quite easily (after studying the libfabric terminology).
Now I'm trying to match the performance of raw verbs.
I'm testing the transport layer with one-way (client-to-server) communication, in which a client sends data to a server using the message API (fi_send/fi_recv). The client and server run on two different nodes connected by a single IB EDR link; I don't set processor affinity or change the power-management policy. The completion queue depth and the size of the sent buffers are the same across the tests.
With the verbs transport layer I get a stable bandwidth of about 22 Gbps, whereas with libfabric over verbs the bandwidth fluctuates widely, from 0.4 Gbps to 19 Gbps within the same test[1]. The bandwidth is calculated from the number of buffers sent in each 5-second interval.

This is how I set up the verbs provider:

  m_hints->caps = FI_MSG;
  m_hints->mode = FI_LOCAL_MR;
  m_hints->ep_attr->type = FI_EP_MSG;
  m_hints->domain_attr->threading = FI_THREAD_COMPLETION;
  m_hints->domain_attr->data_progress = FI_PROGRESS_MANUAL;
  m_hints->domain_attr->resource_mgmt = FI_RM_DISABLED;
  m_hints->fabric_attr->prov_name = strdup("verbs");

Furthermore, I bind two completion queues to each endpoint: one with the FI_SEND flag and the other with FI_RECV.

I can't figure out why I'm getting that high variance with libfabric.
Do you have any idea why? Am I missing some optimisation tips for the verbs provider?

Thanks in advance,

Valentino

[1] Test run with a queue depth of 1 and a buffer size of 512 KB.

Example of a test output with libfabric:

2016-09-07 - 15:10:56 t_server: INFO: Accepted connection
2016-09-07 - 15:10:56 t_server: INFO: Start receiving...
2016-09-07 - 15:11:01 t_server: INFO: Bandwith: 8.3324 Gb/s
2016-09-07 - 15:11:06 t_server: INFO: Bandwith: 15.831 Gb/s
2016-09-07 - 15:11:11 t_server: INFO: Bandwith: 19.1713 Gb/s
2016-09-07 - 15:11:16 t_server: INFO: Bandwith: 10.8825 Gb/s
2016-09-07 - 15:11:21 t_server: INFO: Bandwith: 8.07991 Gb/s
2016-09-07 - 15:11:26 t_server: INFO: Bandwith: 15.4015 Gb/s
2016-09-07 - 15:11:31 t_server: INFO: Bandwith: 20.4263 Gb/s
2016-09-07 - 15:11:36 t_server: INFO: Bandwith: 19.7023 Gb/s
2016-09-07 - 15:11:41 t_server: INFO: Bandwith: 10.474 Gb/s
2016-09-07 - 15:11:46 t_server: INFO: Bandwith: 17.4072 Gb/s
2016-09-07 - 15:11:51 t_server: INFO: Bandwith: 0.440402 Gb/s
2016-09-07 - 15:11:56 t_server: INFO: Bandwith: 2.73217 Gb/s
2016-09-07 - 15:12:01 t_server: INFO: Bandwith: 0.984822 Gb/s
2016-09-07 - 15:12:06 t_server: INFO: Bandwith: 2.93013 Gb/s
2016-09-07 - 15:12:11 t_server: INFO: Bandwith: 0.847248 Gb/s
2016-09-07 - 15:12:16 t_server: INFO: Bandwith: 7.72255 Gb/s
2016-09-07 - 15:12:21 t_server: INFO: Bandwith: 14.7849 Gb/s
2016-09-07 - 15:12:26 t_server: INFO: Bandwith: 12.9243 Gb/s
2016-09-07 - 15:12:31 t_server: INFO: Bandwith: 0.687027 Gb/s
2016-09-07 - 15:12:36 t_server: INFO: Bandwith: 1.44787 Gb/s
2016-09-07 - 15:12:41 t_server: INFO: Bandwith: 2.681 Gb/s

Example of a test output with raw verbs:

2016-09-07 - 16:36:00 t_server: INFO: Accepted connection
2016-09-07 - 16:36:00 t_server: INFO: Start receiving...
2016-09-07 - 16:36:05 t_server: INFO: Bandwith: 17.9491 Gb/s
2016-09-07 - 16:36:10 t_server: INFO: Bandwith: 23.4671 Gb/s
2016-09-07 - 16:36:15 t_server: INFO: Bandwith: 23.0368 Gb/s
2016-09-07 - 16:36:20 t_server: INFO: Bandwith: 22.9638 Gb/s
2016-09-07 - 16:36:25 t_server: INFO: Bandwith: 22.8203 Gb/s
2016-09-07 - 16:36:30 t_server: INFO: Bandwith: 20.058 Gb/s
2016-09-07 - 16:36:35 t_server: INFO: Bandwith: 22.5033 Gb/s
2016-09-07 - 16:36:40 t_server: INFO: Bandwith: 20.1754 Gb/s
2016-09-07 - 16:36:45 t_server: INFO: Bandwith: 22.5578 Gb/s
2016-09-07 - 16:36:50 t_server: INFO: Bandwith: 20.0588 Gb/s
2016-09-07 - 16:36:55 t_server: INFO: Bandwith: 22.2718 Gb/s
2016-09-07 - 16:37:00 t_server: INFO: Bandwith: 22.494 Gb/s
2016-09-07 - 16:37:05 t_server: INFO: Bandwith: 23.1836 Gb/s
2016-09-07 - 16:37:10 t_server: INFO: Bandwith: 23.0972 Gb/s
2016-09-07 - 16:37:15 t_server: INFO: Bandwith: 21.5033 Gb/s
2016-09-07 - 16:37:20 t_server: INFO: Bandwith: 18.5506 Gb/s
2016-09-07 - 16:37:25 t_server: INFO: Bandwith: 20.3709 Gb/s
2016-09-07 - 16:37:30 t_server: INFO: Bandwith: 21.3457 Gb/s
2016-09-07 - 16:37:35 t_server: INFO: Bandwith: 20.5059 Gb/s
2016-09-07 - 16:37:40 t_server: INFO: Bandwith: 22.4899 Gb/s
2016-09-07 - 16:37:45 t_server: INFO: Bandwith: 22.1266 Gb/s
2016-09-07 - 16:37:50 t_server: INFO: Bandwith: 22.4504 Gb/s
