[ofa-general] Re: [PATCH] libmlx4: avoid memcpy in blueflame post_sends

Roland Dreier rdreier at cisco.com
Thu Jan 10 10:08:38 PST 2008


 > However, your solution still results in a procedure call (mlx4_bf_copy
 > is compiled as a procedure using gcc 4.1.0 on an X86_64 host, even if
 > I add "inline").

Can you give more detail on the platform and how you compiled?  I
can't reproduce it with gcc 4.1.3 here.  Are you compiling with
optimization enabled?  Are other things like set_atomic_seg() getting
inlined properly?

 > I would prefer the patch below (which does generate inline code, and does the
 > (sizeof(unsigned long) * 2) calculation just once).

Dividing by 2 * sizeof (long) seems to generate slightly worse code
for me.  Since sizeof (long) is a compile-time constant, in my version
the compiler just generates a sub $0x10 on each iteration, while in
your version there is a sub $1 instead (which costs the same) plus an
extra right shift at the beginning of the loop.
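
For reference, the two loop shapes being compared look roughly like
the sketch below.  This is only an illustration, not the code from
either patch: the names bf_copy_by_bytes and bf_copy_by_count are
made up here, and both versions assume bytecnt is a multiple of
2 * sizeof (long).

```c
#include <stddef.h>

/*
 * Hypothetical variant A: count down in bytes.  The step
 * 2 * sizeof (long) is a compile-time constant, so the loop
 * bookkeeping can be a single subtract of an immediate.
 * Assumes bytecnt is a multiple of 2 * sizeof (long); otherwise
 * the unsigned counter would wrap instead of reaching zero.
 */
static inline void bf_copy_by_bytes(unsigned long *dst,
				    const unsigned long *src,
				    unsigned bytecnt)
{
	while (bytecnt > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
		bytecnt -= 2 * sizeof (long);
	}
}

/*
 * Hypothetical variant B: convert the byte count to an iteration
 * count once, then count down by one.  The division by a power of
 * two is typically compiled to a right shift ahead of the loop.
 */
static inline void bf_copy_by_count(unsigned long *dst,
				    const unsigned long *src,
				    unsigned bytecnt)
{
	unsigned n = bytecnt / (2 * sizeof (long));

	while (n > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
		n--;
	}
}
```

Both copy the same data; the difference is only in the loop
bookkeeping.  Whether either one actually gets inlined, and what
instructions come out, depends on the compiler version and flags.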

 - R.




