[ofa-general] Re: [PATCH] libmlx4: avoid memcpy in blueflame post_sends
Roland Dreier
rdreier at cisco.com
Thu Jan 10 10:08:38 PST 2008
> However, your solution still results in a procedure call (mlx4_bf_copy
> is compiled as a procedure using gcc 4.1.0 on an X86_64 host, even if
> I add "inline").
Can you give more detail on the platform and how you compiled? I
can't reproduce it with gcc 4.1.3 here. Are you compiling with
optimization enabled? Are other things like set_atomic_seg() getting
inlined properly?
> I would prefer the patch below (which does generate inline code, and does the
> (sizeof(unsigned long) * 2) calculation just once).
Dividing by 2 * sizeof (long) seems to generate slightly worse code
for me. Since sizeof (long) is a compile-time constant, in my version
the compiler just generates a sub $0x10, while in your version there is
a sub $1 instead (which costs the same) plus an extra right shift at
the beginning of the loop.
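
For reference, the two loop shapes being compared look roughly like
the sketch below. This is an illustration only, not the code from
either patch: the names bf_copy_bytes and bf_copy_count are made up
here, and both variants assume bytecnt is a nonzero multiple of
2 * sizeof (long) (16 bytes on x86_64).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Variant A (hypothetical name): keep the count in bytes and subtract
 * the compile-time constant 2 * sizeof (long) on every pass.
 * Assumes bytecnt is a multiple of 2 * sizeof (long); otherwise the
 * unsigned counter would wrap instead of reaching zero.
 */
static inline void bf_copy_bytes(unsigned long *dst,
				 const unsigned long *src,
				 unsigned bytecnt)
{
	while (bytecnt > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
		bytecnt -= 2 * sizeof (long);
	}
}

/*
 * Variant B (hypothetical name): convert bytes to an iteration count
 * once up front, then count down by one. The division by a power of
 * two is the step a compiler would typically turn into a right shift.
 */
static inline void bf_copy_count(unsigned long *dst,
				 const unsigned long *src,
				 unsigned bytecnt)
{
	unsigned n = bytecnt / (2 * sizeof (long));

	while (n > 0) {
		*dst++ = *src++;
		*dst++ = *src++;
		--n;
	}
}
```

Both variants copy the same data for a valid bytecnt, for example
bf_copy_bytes(dst, src, sizeof buf) with buf an array of eight longs.
The only difference is the loop bookkeeping, which is what shows up in
the generated assembly.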
- R.