Re: [ofa-general] atomic operations on ppc64

Rui Machado ruimario at gmail.com
Thu Sep 25 09:29:54 PDT 2008


2008/9/26 Dotan Barak <dotanba at gmail.com>:
> Rui Machado wrote:
>>
>> 2008/9/25 Ronni Zimmermann <ronniz at mellanox.co.il>:
>>
>>>
>>> Rui Machado wrote:
>>>
>>>>>
>>>>> 2008/9/22 Ronni Zimmermann <ronniz at mellanox.co.il>:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> We run tests which use atomic operations (both fetch and add
>>>>>> and comp and swap) on PPC64 all the time, without experiencing
>>>>>> any problem.
>>>>>>
>>>>>> Just to make sure, I ran a few simple tests, which use atomic
>>>>>> operations, on our PPC64 machines, both with SLES10 SP1 and
>>>>>> with RHAS5.1, and all of them passed.
>>>>>>
>>>>>> I was working with the latest OFED1.4 driver and mlx4 HCA
>>>>>> with the latest released FW and with FW 2.3.000 (on the
>>>>>> SLES10 SP1 machine).
>>>>>>
>>>>>> Given the above information I believe that there's either
>>>>>> a problem with your code (although looking at the code you
>>>>>> posted I couldn't see anything wrong) or it's an OFED1.2.5
>>>>>> issue, as Dotan suggested.
>>>>
>>>> OK, thanks for the feedback. We have ppc64 machines with mlx4
>>>> and mthca0 (from ibv_devinfo). Neither works. Any
>>>> experience with the mthca0? It is older and should be better
>>>> supported on 1.2.5, shouldn't it?
>>>> My priority is the machines with the mlx4 but of course I
>>>> would like to see both working.
>>>>
>>>>
>>>
>>> Sorry, I have no experience with mthca0 on PPC64 machines.
>>> It is indeed an older HCA, but I don't know whether or not it's working
>>> properly on PPC64 with ofed 1.2.5.
>>>
>>>
>>>>
>>>> I also tried with a 2.6.26.2 kernel (had it at hand) with the same
>>>> ofed1.2.5 installation and still see the problem.
>>>> I guess my last and longest try is to install the whole ofed 1.4 package.
>>>>
>>>>
>>>
>>> Please bear in mind that OFED 1.4 is RC2 and will probably be GA by the
>>> end of October.
>>> If installing a new driver on your machine is a big problem for you, and
>>> you don't need the new features supported by ofed 1.4 and not by ofed
>>> 1.3.1, maybe it'll be better for you to install ofed 1.3.1, which is
>>> already GA.
>>>
>>>
>>
>> Actually I just tried with ofed 1.4 and still see the problem :(
>> I think I installed it correctly with a 2.6.26.2 kernel although I see
>> the warning:
>> libibverbs: Warning: couldn't load driver 'mthca': libmthca-rdmav2.so:
>> cannot open shared object file: No such file or directory
>> A small example using RDMA read is working.
>>
>> I just wanted to see if the problem exists with 1.4 even if it is an
>> RC. Probably I will install 1.3.1 when I solve this problem. And I
>> really need to solve it!
>>
>>
>
> The problem that you describe is pretty basic and even an RC shouldn't have
> this issue.
>
Agreed!

> I think that you should upgrade the HCA's firmware, as Ronni suggested.
>

But I'm not sure about the fw version. As I mentioned, on that
Mellanox page the latest firmware for the IBM version is 2.3.00 which
is the one I have. Or am I wrong?

> I have a feeling that the problem is in your code:
> You should access the buffer that the HCA reads/writes as volatile, to "tip"
> the compiler that this memory will be modified by other components, so it
> shouldn't do any optimization when you want to read data from it and
> should actually do the reading ...
>
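For reference, the volatile suggestion quoted above can be sketched roughly as follows. This is only an illustration under assumptions, not the code posted earlier in the thread: `load_result` and `wait_for_change` are hypothetical helper names, and the buffer stands in for the 8-byte registered memory (e.g. from ibv_reg_mr) where the HCA would place the original value of a fetch-and-add or compare-and-swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the 8-byte landing buffer through a
 * volatile-qualified pointer, so the compiler has to emit a real
 * load on every call instead of reusing a previously read value. */
static uint64_t load_result(const volatile uint64_t *buf)
{
    assert(buf != NULL);
    return *buf;
}

/* Hypothetical helper: spin until the buffer no longer holds
 * `sentinel`, or until `max_spins` loads have been made.
 * Returns 1 and stores the new value in *out if it changed,
 * 0 if it never changed. */
static int wait_for_change(const volatile uint64_t *buf, uint64_t sentinel,
                           unsigned long max_spins, uint64_t *out)
{
    assert(buf != NULL && out != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        uint64_t v = *buf;   /* volatile access: reloaded on each iteration */
        if (v != sentinel) {
            *out = v;
            return 1;
        }
    }
    return 0;
}
```

In a real verbs program one would normally wait for the work completion with ibv_poll_cq and only then read the buffer; the volatile qualifier matters mostly when the code polls on the memory itself.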
I tried that, as you suggested before, but it didn't help.
And the RDMA read works fine.
Of course, it is possible that the problem is with my code. In fact,
this looks more and more likely. But can the code be wrong in
such a way that it works on x86 but not on ppc? That is what
intrigues me.

Cheers


