[ofa-general] Re: [PATCH] libmlx4: avoid memcpy in blueflame post_sends

Jack Morgenstein jackm at dev.mellanox.co.il
Sun Jan 13 00:01:04 PST 2008


On Thursday 10 January 2008 20:08, Roland Dreier wrote:
>  > However, your solution still results in a procedure call (mlx4_bf_copy
>  > is compiled as a procedure using gcc 4.1.0 on an X86_64 host, even if
>  > I add "inline").
> 
> Can you give more detail on the platform and how you compiled?  I
> can't reproduce it with gcc 4.1.3 here.  Are you compiling with
> optimization enabled?  Are other things like set_atomic_seg() getting
> inlined properly?
> 
>  > I would prefer the patch below (which does generate inline code, and does the
>  > (sizeof(unsigned long) * 2) calculation just once).
> 
> Dividing by 2 * sizeof (long) seems to generate slightly worse code
> for me.  Since sizeof (long) is a compile time constant, in my version
> the compiler just generates a sub $10, while in your version there is
> a sub $1 instead (which costs the same) plus an extra right shift at
> the beginning of the loop.
> 
>  - R.
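The two loop shapes compared in the quoted paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the function names bf_copy_by_bytes and bf_copy_by_count are hypothetical, and this is not the actual libmlx4 mlx4_bf_copy code from either patch. Both copy a buffer two unsigned longs per iteration. The first counts down in bytes by the compile-time constant 2 * sizeof(unsigned long), which is 16 on x86_64 and matches the "sub $10" (hex) quoted above. The second converts the byte count to an iteration count first, so it decrements by 1 but needs an extra shift before the loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not the libmlx4 source. In both functions,
 * bytecnt is assumed to be a multiple of 2 * sizeof(unsigned long).
 */

/* Form 1: count down in bytes. The decrement is a compile-time
 * constant (16 on x86_64), so no setup is needed before the loop. */
static inline void bf_copy_by_bytes(unsigned long *dst,
				    const unsigned long *src,
				    size_t bytecnt)
{
	assert(bytecnt % (2 * sizeof(unsigned long)) == 0);
	while (bytecnt > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
		bytecnt -= 2 * sizeof(unsigned long);
	}
}

/* Form 2: compute an iteration count first. The divisor is a power
 * of two, so the division becomes a right shift before the loop,
 * and each iteration then decrements by 1. */
static inline void bf_copy_by_count(unsigned long *dst,
				    const unsigned long *src,
				    size_t bytecnt)
{
	size_t n;

	assert(bytecnt % (2 * sizeof(unsigned long)) == 0);
	n = bytecnt / (2 * sizeof(unsigned long));
	while (n-- > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
	}
}
```

Both forms produce the same copy. Without optimization, GCC does not inline either function, whether or not it is marked inline. That is consistent with the out-of-line call reported at the top of the thread.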

Your implementation is the better one.
I had not noticed it at the time, but long ago I had set CFLAGS in my
local bash environment to just an include path. That environment value
was used instead of the default flags when the Makefile was generated
at installation time, so the build ran without -O2.

Once I corrected this, your patch compiled to properly inlined code, and
I could see that the result was better than with the patch I proposed.

Let's go with your patch.

- Jack
