[Users] infiniband rdma poor transfer bw

Susan Coulter markus at lanl.gov
Mon Aug 27 08:59:20 PDT 2012


Gaetano,
This is the right list for such a question. Unfortunately I am just learning verbs programming myself, so I cannot address your question.
There are others on the list who have much more experience; hopefully someone will respond.


On Aug 23, 2012, at 4:51 PM, Gaetano Mendola wrote:

> Hi all,
> I'm sorry in advance if this is not the right mailing list for my question.
> 
> In my application I use an InfiniBand infrastructure to send a stream
> of data from one server to another. To ease development I have been
> using IP over InfiniBand, because I'm more familiar with socket
> programming. Until now the performance (max bandwidth) was good enough
> for me (I knew I wasn't getting the maximum achievable bandwidth), but
> now I need to get more bandwidth out of that InfiniBand connection.
> 
> ib_write_bw reports that my maximum achievable bandwidth is around
> 1500 MB/s (I'm not getting 3000 MB/s because my card is installed in a
> PCIe 2.0 x8 slot).
> 
> So far so good. I coded my communication channel using ibverbs and
> RDMA, but I'm getting far less than the achievable bandwidth. I'm even
> getting slightly less bandwidth than with sockets, but at least my
> application doesn't use any CPU power:
> 
> ib_write_bw: 1500 MB/s
> 
> sockets: 700 MB/s <= One core of my system is at 100% during this test
> 
> ibverbs+rdma: 600 MB/s <= No CPU is used at all during this test
> 
> It seems that the bottleneck is here:
> 
> ibv_sge sge;
> sge.addr = (uintptr_t)memory_to_transfer;
> sge.length = memory_to_transfer_size;
> sge.lkey = memory_to_transfer_mr->lkey;
> 
> ibv_send_wr wr;
> memset(&wr, 0, sizeof(wr));
> wr.wr_id = 0;
> wr.opcode = IBV_WR_RDMA_WRITE;
> wr.sg_list = &sge;
> wr.num_sge = 1;
> wr.send_flags = IBV_SEND_SIGNALED;
> wr.wr.rdma.remote_addr = (uintptr_t)thePeerMemoryRegion.addr;
> wr.wr.rdma.rkey = thePeerMemoryRegion.rkey;
> 
> ibv_send_wr *bad_wr = NULL;
> if (ibv_post_send(theCommunicationIdentifier->qp, &wr, &bad_wr) != 0) {
>  notifyError("Unable to ibv post send");
> }
> 
> At this point the following code waits for the completion:
> 
> // Wait for completion
> ibv_cq *cq;
> void* cq_context;
> if (ibv_get_cq_event(theCompletionEventChannel, &cq, &cq_context) != 0) {
>  notifyError("Unable to get a ibv cq event");
> }
> 
> ibv_ack_cq_events(cq, 1);
> 
> if (ibv_req_notify_cq(cq, 0) != 0) {
>  notifyError("Unable to get a req notify");
> }
> 
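> ibv_wc wc;
> int myRet = ibv_poll_cq(cq, 1, &wc);
> // With num_entries == 1 the result is 1 (one completion), 0 (none) or
> // negative (error), so "more than one" can never happen here.
> if (myRet != 1) {
>  LOG(WARNING) << "Expected exactly one ibv_wc, got " << myRet;
> }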
> 
> 
> The time between my ibv_post_send call and ibv_get_cq_event returning
> an event is 13.3 ms when transferring chunks of 8 MB, which works out
> to around 600 MB/s.
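
The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "MB" means 2^20 bytes, and the function names are made up for illustration. It relates the 13.3 ms per 8 MB chunk to the roughly 600 MB/s observed, and shows how long the same chunk would take at the 1500 MB/s that ib_write_bw reports.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 1 MB is taken as 2^20 bytes; the original mail does not say.
constexpr double kBytesPerMB = 1024.0 * 1024.0;

// Throughput in MB/s for `bytes` moved in `seconds`.
double throughputMBps(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return (static_cast<double>(bytes) / kBytesPerMB) / seconds;
}

// Time in milliseconds needed to move `bytes` at `rateMBps`.
double transferTimeMs(std::size_t bytes, double rateMBps) {
    assert(rateMBps > 0.0);
    return (static_cast<double>(bytes) / kBytesPerMB) / rateMBps * 1000.0;
}
```

With an 8 MB chunk, throughputMBps(chunk, 0.0133) comes out just above 600 MB/s, matching the number reported, while transferTimeMs(chunk, 1500.0) is a little over 5 ms. Under these assumptions each chunk takes roughly two and a half times longer than the ib_write_bw rate would suggest.
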
> 
> In more detail, this is what I do overall (in pseudocode):
> 
> Active Side:
> 
> post a message receive
> rdma connection
> wait for rdma connection event
> <<at this point transfer tx flow starts>>
> start:
> register memory containing bytes to transfer
> wait for remote memory region addr/key (I wait for an ibv_wc)
> send data with ibv_post_send
> post a message receive
> wait for ibv_post_send event (I wait for an ibv_wc) (this lasts 13.3 ms)
> send message "DONE"
> unregister memory
> goto start
> 
> Passive Side:
> 
> post a message receive
> rdma accept
> wait for rdma connection event
> <<at this point transfer rx flow starts>>
> start:
> register memory that has to receive the bytes
> send addr/key of memory registered
> wait "DONE" message
> unregister memory
> post a message receive
> goto start
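
The active-side loop above can be written down as a strictly serial sequence. The sketch below is hardware-free: Channel, TraceChannel, sendOneChunk and traceChunks are hypothetical names invented for illustration, not part of libibverbs or librdmacm, and the recording class only logs step names so that the ordering is visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface: one method per step of the active-side pseudocode.
struct Channel {
    virtual ~Channel() = default;
    virtual void registerMemory() = 0;       // register memory containing bytes to transfer
    virtual void waitPeerRegion() = 0;       // wait for remote memory region addr/key
    virtual void postRdmaWrite() = 0;        // send data with ibv_post_send
    virtual void postReceive() = 0;          // post a message receive
    virtual void waitWriteCompletion() = 0;  // wait for ibv_post_send event
    virtual void sendDone() = 0;             // send message "DONE"
    virtual void unregisterMemory() = 0;     // unregister memory
};

// Stand-in that records step names instead of touching any hardware.
struct TraceChannel : Channel {
    std::vector<std::string> steps;
    void registerMemory() override { steps.push_back("register"); }
    void waitPeerRegion() override { steps.push_back("wait_peer_region"); }
    void postRdmaWrite() override { steps.push_back("post_rdma_write"); }
    void postReceive() override { steps.push_back("post_receive"); }
    void waitWriteCompletion() override { steps.push_back("wait_write_completion"); }
    void sendDone() override { steps.push_back("send_done"); }
    void unregisterMemory() override { steps.push_back("unregister"); }
};

// One pass of the "start: ... goto start" loop on the active side.
void sendOneChunk(Channel& ch) {
    ch.registerMemory();
    ch.waitPeerRegion();
    ch.postRdmaWrite();
    ch.postReceive();
    ch.waitWriteCompletion();
    ch.sendDone();
    ch.unregisterMemory();
}

// Runs the loop `count` times and returns the recorded step order.
std::vector<std::string> traceChunks(int count) {
    assert(count >= 0);
    TraceChannel trace;
    for (int i = 0; i < count; ++i) {
        sendOneChunk(trace);
    }
    return trace.steps;
}
```

Calling traceChunks(2) records fourteen steps, and the second "register" appears only after the first "unregister". In this model every chunk goes through registration, the addr/key handshake, the write, the "DONE" message and deregistration one after another, with nothing overlapping.
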
> 
> Does anyone know what I'm doing wrong, or what I can improve? I'm not
> affected by "Not Invented Here" syndrome, so I'm even open to throwing
> away what I have done so far and adopting something else.
> 
> I only need a point to point contiguous transfer.
> 
> 
> Regards
> Gaetano Mendola
> 
> 
> --
> cpp-today.blogspot.com

====================================

Susan Coulter
HPC-3 Network/Infrastructure
505-667-8425
Increase the Peace...
An eye for an eye leaves the whole world blind
====================================
