[ofa-general] atomic operations on ppc64

Rui Machado ruimario at gmail.com
Fri Sep 19 06:16:08 PDT 2008


2008/9/19 Dotan Barak <dotanba at gmail.com>:
> Rui Machado wrote:
>>
>> 2008/9/17 Rui Machado <ruimario at gmail.com>:
>>
>>>
>>> From: Rui Machado <ruimario at gmail.com>
>>> Date: 2008/9/17
>>> Subject: Re: [ofa-general] atomic operations on ppc64
>>> To: Dotan Barak <dotanba at gmail.com>
>>>
>>>
>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>
>>>>
>>>> On Wed, Sep 17, 2008 at 5:54 PM, Rui Machado <ruimario at gmail.com> wrote:
>>>>
>>>>>
>>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>>
>>>>>>
>>>>>> On Wed, Sep 17, 2008 at 5:44 PM, Rui Machado <ruimario at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 17, 2008 at 5:28 PM, Rui Machado <ruimario at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hey Dotan,
>>>>>>>>>
>>>>>>>>> 2008/9/17 Dotan Barak <dotanba at gmail.com>:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 17, 2008 at 5:12 PM, Rui Machado <ruimario at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi list,
>>>>>>>>>>>
>>>>>>>>>>> has anyone experienced problems using IB atomic operations
>>>>>>>>>>> (fetch and add) on a ppc64 platform?
>>>>>>>>>>> I tried a small example (using fetch and add) on x86 and ppc64;
>>>>>>>>>>> it worked fine on x86 but not on ppc64.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you handle the ntoh/hton or do you let the driver/HCA deal with
>>>>>>>>>> it by itself?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nope, I don't use those. I guess I'm letting the driver/HCA
>>>>>>>>> deal with it, then...
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do you see endianness issues or completely corrupted data?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Just to make it clear (to me :) ): I'm talking about ppc64<-->ppc64
>>>>>>> communication.
>>>>>>> Should I still be concerned with converting data because of endianness?
>>>>>>> What happens is that I ask for a fetch and add and it doesn't happen:
>>>>>>> the value on the server doesn't get modified.
>>>>>>>
>>>>>>
>>>>>> This is weird behaviour indeed...
>>>>>>
>>>>>> Can you post the code in your program that fills the SR?
>>>>>>
>>>>>> Dotan
>>>>>>
>>>>>>
>>>>>
>>>>> Not sure what you mean by SR.
>>>>> Here is the function inc(), which I call to increment the counter by 1
>>>>> on the remote machine. The remote machine's buffer is full of zeroes,
>>>>> and that's what the client gets every time, even though I increment
>>>>> 3 times in a row (with a sleep in between).
>>>>>
>>>>> Is this enough?
>>>>> Thanks for the help
>>>>>
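>>>>> #include <stdint.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <infiniband/verbs.h>
>>>>>
>>>>> /* qp, cq, mr, buffer and remote_node are globals set up elsewhere */
>>>>> void inc(void)
>>>>> {
>>>>>       struct ibv_qp_attr check_attr;
>>>>>       struct ibv_qp_init_attr check_init_attr;
>>>>>       struct ibv_send_wr *bad_wr;
>>>>>       struct ibv_wc wc;
>>>>>       struct ibv_sge slist;
>>>>>       struct ibv_send_wr swr3;
>>>>>       int ne = 0;
>>>>>
>>>>>       /* zero every structure so no field is left uninitialized */
>>>>>       memset(&wc, 0, sizeof(wc));
>>>>>       memset(&slist, 0, sizeof(slist));
>>>>>       memset(&swr3, 0, sizeof(swr3));
>>>>>
>>>>>       /* local 8-byte buffer that receives the original remote value */
>>>>>       slist.addr = (uintptr_t)buffer;
>>>>>       slist.length = 8;
>>>>>       slist.lkey = mr->lkey;
>>>>>
>>>>>       /* remote 64-bit counter to update, and the amount to add */
>>>>>       swr3.wr.atomic.remote_addr = remote_node->mi.bufAddr;
>>>>>       swr3.wr.atomic.rkey = remote_node->mi.buf_rkey;
>>>>>       swr3.wr.atomic.compare_add = 1;
>>>>>
>>>>>       swr3.wr_id      = 1;
>>>>>       swr3.sg_list    = &slist;
>>>>>       swr3.num_sge    = 1;
>>>>>       swr3.opcode     = IBV_WR_ATOMIC_FETCH_AND_ADD;
>>>>>       swr3.send_flags = IBV_SEND_SIGNALED;
>>>>>       swr3.next       = NULL;
>>>>>
>>>>>       if (ibv_post_send(qp, &swr3, &bad_wr)) {
>>>>>               printf("Couldn't post send...\n");
>>>>>               return;    /* inc() returns void, so no value */
>>>>>       }
>>>>>
>>>>>       /* busy-wait for the completion of the signaled atomic */
>>>>>       do {
>>>>>               ne = ibv_poll_cq(cq, 1, &wc);
>>>>>       } while (ne == 0);
>>>>>
>>>>>       if ((ne < 0) || (wc.status != IBV_WC_SUCCESS)) {
>>>>>               /* check the QP state after a failed completion */
>>>>>               if (!ibv_query_qp(qp, &check_attr, IBV_QP_STATE,
>>>>>                                 &check_init_attr))
>>>>>                       printf("The qp state is: %d\n",
>>>>>                              check_attr.qp_state);
>>>>>       }
>>>>> }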
>>>>>
>>>>>
>>>>
>>>> The code looks good and it should work...
>>>> (I would have memset every structure before using it.)
>>>>
>>>>
>>>> Did you check the memory on the sender side or on the receiver side?
>>>>
>>>>
>>>
>>> As I mentioned, it does work on x86.
>>>
>>> Actually, I checked both sides:
>>>
>>> server:
>>> Initial counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>> counter at buffer is 0
>>>
>>>
>>> client:
>>> initial IB atomic counter 0
>>> IB atomic counter 0
>>> IB atomic counter 0
>>> IB atomic counter 0
>>>
>>> What could this be related to? Driver, HW?
>>>
>>>
>>
>> Does anyone have some insight on this?
>> Or any ideas on how I can debug this further?
>>
>
> Bugs can be anywhere: application / Driver / HW ...
>
> Can you try the server on x86 with the client on PPC64, and then the
> server on PPC64 with the client on x86?
>
> Which OFED version do you use?
> Can you send the output of ibv_devinfo?
>
> Dotan
>
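In the quoted output the client reads 0 after every increment. For IBV_WR_ATOMIC_FETCH_AND_ADD, the local buffer named in sg_list receives the value the remote word held *before* the addition, so a first result of 0 is expected even when the operation works; what gives the problem away is that later results, and the server's own counter, stay at 0. The minimal model below only mimics those semantics in plain, single-threaded C (the function names are hypothetical and no verbs calls are made) to show the sequence a working run should produce.

```c
#include <stdint.h>

/* Hypothetical local model of fetch-and-add semantics, no HCA involved:
 * the target word grows by `add`, and the value it held BEFORE the
 * addition is what the requester receives in its local 8-byte buffer.
 * (Atomicity is not modelled; this is single-threaded.) */
static uint64_t model_fetch_and_add(uint64_t *target, uint64_t add)
{
    uint64_t original = *target;
    *target = original + add;
    return original;
}

/* Replays `count` increments of 1, as in the quoted test, recording what
 * the client would read back after each one. */
static void replay_increments(uint64_t *server_counter,
                              uint64_t *client_reads, int count)
{
    for (int i = 0; i < count; i++)
        client_reads[i] = model_fetch_and_add(server_counter, 1);
}
```

Starting from a zeroed server buffer, three increments should leave the client reading 0, 1 and 2 and the server holding 3; a run where every value stays 0 means the remote word was never updated.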

I tried both combinations, ppc64-x86 and x86-ppc64.
In both cases the client hung in the poll loop (see code above).
Could endianness be playing a role here? This is the first time I have
tried to make different architectures communicate.
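ppc64 is big-endian and x86 is little-endian, so byte order cannot explain the ppc64<->ppc64 result (both sides agree), but it can matter once the two are mixed. InfiniBand itself carries multi-byte fields, including atomic operands, in big-endian order on the wire; how a particular HCA and driver map the remote 8-byte target to host order is hardware-specific and is not assumed here. The sketch below, with hypothetical helper names, only shows in portable C what a 64-bit counter looks like when one side reads the bytes in the other side's order.

```c
#include <stdint.h>
#include <string.h>

/* Write v into buf most-significant byte first (big-endian / network order). */
static void store_be64(unsigned char buf[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Read a big-endian 64-bit value from buf, independent of host order. */
static uint64_t load_be64(const unsigned char buf[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return v;
}

/* Reinterpret the same 8 bytes in this host's native order, which is what
 * a plain "*(uint64_t *)buffer" read of the counter does. */
static uint64_t load_native64(const unsigned char buf[8])
{
    uint64_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Reverse the byte order unconditionally: how a value looks to a host
 * that reads it in the opposite order. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* 1 if this host is little-endian (e.g. x86), 0 if big-endian (e.g. ppc64). */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

For example, a counter value of 1 stored big-endian and read back with a plain native load on x86 comes out as 72057594037927936 (0x0100000000000000). Seeing a value like that in a mixed-architecture run would point at byte order rather than at a lost operation; note that it would not explain the hang on poll by itself.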

I have OFED 1.2.5.5.

ibv_devinfo (x86 machine used for the ppc64-x86 communication)

hca_id: mthca0
        fw_ver:                         1.2.0
        node_guid:                      0002:c902:0021:b820
        sys_image_guid:                 0002:c902:0021:b823
        vendor_id:                      0x02c9
        vendor_part_id:                 25204
        hw_ver:                         0xA0
        board_id:                       MT_03B0110001
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 142
                        port_lid:               134
                        port_lmc:               0x01

ibv_devinfo (ppc64)

hca_id: mlx4_0
        fw_ver:                         2.3.000
        node_guid:                      0002:c903:0000:9334
        sys_image_guid:                 0002:c903:0000:9337
        vendor_id:                      0x02c9
        vendor_part_id:                 25418
        hw_ver:                         0xA0
        board_id:                       IBM08A0000001
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 142
                        port_lid:               68
                        port_lmc:               0x01

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
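Whatever the HCA combination above turns out to be, the client hang comes from the unbounded do/while around ibv_poll_cq in the quoted inc(): if the completion never arrives, the loop never exits. A minimal sketch of a bounded poll is below. It relies only on the documented ibv_poll_cq return convention (number of completions found, 0 if none, negative on error); the helper, the enum and the fake pollers are hypothetical, the real ibv_poll_cq call would be wrapped in a callback, and a spin count stands in for what a real program would more likely bound by wall-clock time.

```c
/* Hypothetical stand-in for one ibv_poll_cq(cq, 1, &wc) call: returns the
 * number of completions found (0 or 1 here) or a negative value on error. */
typedef int (*poll_once_fn)(void *ctx);

enum poll_result { POLL_COMPLETED, POLL_ERROR, POLL_TIMED_OUT };

/* Call poll_once at most max_spins times instead of forever, so a
 * completion that never arrives is reported rather than hanging. */
static enum poll_result poll_bounded(poll_once_fn poll_once, void *ctx,
                                     unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        int ne = poll_once(ctx);
        if (ne < 0)
            return POLL_ERROR;
        if (ne > 0)
            return POLL_COMPLETED;
    }
    return POLL_TIMED_OUT;
}

/* Fake poller for trying this without an HCA: reports one completion on
 * the call that brings *(int *)ctx down to zero. */
static int fake_poll_after_n(void *ctx)
{
    int *remaining = ctx;
    return --(*remaining) <= 0 ? 1 : 0;
}

/* Fake poller whose completion never arrives, like the hung client. */
static int fake_poll_never(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Fake poller that fails immediately, like a negative ibv_poll_cq result. */
static int fake_poll_error(void *ctx)
{
    (void)ctx;
    return -1;
}
```

On POLL_TIMED_OUT the caller can still query the QP state, as inc() already does on error, instead of spinning indefinitely.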


Cheers,


