[ofa-general] atomic operations on ppc64

Dotan Barak dotanba at gmail.com
Thu Sep 18 18:43:29 PDT 2008


Rui Machado wrote:
> 2008/9/17 Rui Machado <ruimario at gmail.com>:
>   
>> From: Rui Machado <ruimario at gmail.com>
>> Date: 2008/9/17
>> Subject: Re: [ofa-general] atomic operations on ppc64
>> To: Dotan Barak <dotanba at gmail.com>
>>
>>
>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>     
>>> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado <ruimario at gmail.com> wrote:
>>>       
>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>         
>>>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado <ruimario at gmail.com> wrote:
>>>>>           
>>>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>>>             
>>>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado <ruimario at gmail.com> wrote:
>>>>>>>               
>>>>>>>> Hey Dotan,
>>>>>>>>
>>>>>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>>>>>                 
>>>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado <ruimario at gmail.com> wrote:
>>>>>>>>>                   
>>>>>>>>>> Hi list,
>>>>>>>>>>
>>>>>>>>>> has anyone experienced problems using IB atomic operations
>>>>>>>>>> (fetch and add) on a ppc64 platform?
>>>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64; it
>>>>>>>>>> worked fine on x86 but not on ppc64.
>>>>>>>>>>                     
>>>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with it by itself?
>>>>>>>>>                   
>>>>>>>> Nope, I don't use those. I guess I'm letting the driver/HCA deal with it, then...
>>>>>>>>                 
>>>>>>> Do you see endianness issues or completely corrupted data?
>>>>>>>
>>>>>>>               
>>>>>> Just to make it clear (to me :) ): I'm talking about ppc64<-->ppc64
>>>>>> communication.
>>>>>> Should I still be concerned with converting data because of endianness?
>>>>>> What happens is that I request a fetch and add and it doesn't happen.
>>>>>> The value on the server doesn't get modified.
>>>>>>             
>>>>> This is weird behaviour indeed ..
>>>>>
>>>>> Can you post the code in your program that fills the SR?
>>>>>
>>>>> Dotan
>>>>>
>>>>>           
>>>> Not sure what you mean by SR.
>>>> Here is the function inc(), which I call to increment the value on the
>>>> remote machine by 1. The remote machine has its buffer full of zeroes.
>>>> That's what the client gets every time, although I increment 3 times
>>>> in a row (with a sleep in between).
>>>>
>>>> Is this enough?
>>>> Thanks for the help
>>>>
>>>> void inc()
>>>> {
>>>>
>>>>        struct ibv_qp_attr check_attr;
>>>>        struct ibv_qp_init_attr check_init_attr;
>>>>
>>>>        void *ev_ctx;
>>>>
>>>>        struct ibv_send_wr *bad_wr;
>>>>        struct ibv_wc wc;
>>>>        struct ibv_sge slist;
>>>>        struct ibv_send_wr swr3;
>>>>
>>>>
>>>>        slist.addr = (uintptr_t)buffer;
>>>>        slist.length = 8;
>>>>        slist.lkey =mr->lkey;
>>>>
>>>>        swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr;
>>>>        swr3.wr.atomic.rkey = remote_node->mi.buf_rkey;
>>>>        swr3.wr.atomic.compare_add = 1;
>>>>
>>>>        swr3.wr_id      = 1;
>>>>        swr3.sg_list    = &slist;
>>>>        swr3.num_sge    = 1;
>>>>        swr3.opcode     = IBV_WR_ATOMIC_FETCH_AND_ADD;
>>>>        swr3.send_flags = IBV_SEND_SIGNALED;
>>>>        swr3.next       = NULL;
>>>>
>>>>
>>>>        if(ibv_post_send(qp,&swr3,&bad_wr)){
>>>>                printf("Couldn't post send...\n");
>>>>                return;   /* inc() is declared void */
>>>>        }
>>>>
>>>>
>>>>        int ne=0;
>>>>        do{
>>>>                ne = ibv_poll_cq(cq,1,&wc);
>>>>        }while(ne==0);
>>>>
>>>>        if((ne < 0) || (wc.status != IBV_WC_SUCCESS)){
>>>>
>>>>                //check qp status
>>>>                if(!ibv_query_qp(qp,&check_attr,IBV_QP_STATE,&check_init_attr))
>>>>                        printf("The qp state is: %d\n ",check_attr.qp_state);
>>>>
>>>>        }
>>>> }
>>>>
>>>>         
>>> The code looks good and it should work...
>>> (I would have memset every structure before using it ..)
>>>
>>>
>>> Did you check the memory on the sender side or on the receiver side?
>>>
>>>       
>> As I mentioned it does work on x86.
>>
>> Actually on both:
>>
>> server:
>> Initial counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>> counter at buffer is 0
>>
>>
>> client:
>> initial IB atomic counter 0
>> IB atomic counter 0
>> IB atomic counter 0
>> IB atomic counter 0
>>
>> What could this be related to? Driver, HW?
>>
>>     
>
> Does anyone have some insight on this?
> Or maybe how I can debug this further?
>   
Bugs can be anywhere: application / Driver / HW ...

Can you try with the server on x86 and the client on PPC64, and then with
the server on PPC64 and the client on x86?
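
In a mixed x86/PPC64 setup, byte order becomes relevant when reading the 64-bit value that fetch and add returns into the local buffer. Below is a minimal sketch of reading such a value in a host-independent way. It assumes the HCA writes the fetched value in big-endian byte order; that is an assumption to verify for your HCA and driver, not something established in this thread. The helper names be64_to_host() and host_is_little_endian() are hypothetical and are not part of libibverbs.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a libibverbs call): interpret the 8 bytes at
 * buf as a big-endian 64-bit integer and return it in host byte order.
 * It reads byte by byte, so it gives the same result on little-endian
 * (x86) and big-endian (PPC64) hosts and has no alignment requirements.
 */
static uint64_t be64_to_host(const void *buf)
{
    const unsigned char *p = buf;
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];   /* most significant byte comes first */

    return v;
}

/*
 * Hypothetical helper: returns 1 on a little-endian host and 0 on a
 * big-endian host, by inspecting the first byte of a known 16-bit value.
 */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

With helpers like these, the client could print be64_to_host(buffer) after each completion instead of dereferencing the buffer as a native 64-bit integer, which makes the x86 and PPC64 outputs directly comparable.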

Which OFED version do you use?
Can you send the output of ibv_devinfo?

Dotan


