> > memcpy(ctx->bf_page + ctx->bf_offset, ctrl, align(size * 16, 64));

By the way, if we have an SQ with 32-byte WQEs, and we do blueflame from a WQE at the end of the buffer, we might end up reading off the end of the buffer. Not very likely, I guess.

I wonder if memset(,0,) for the remaining bytes might be faster anyway?

 - R.
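
For what it's worth, the memset idea could look roughly like the sketch below: copy only the bytes the WQE actually occupies, then zero-fill up to the 64-byte boundary, so the source is never read past its real length. The names bf_copy_padded and align_up are hypothetical and not from the patch, and the sketch assumes the device tolerates zeros in the padding bytes of the blueflame write.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper, not part of the patch under review.
 *
 * Copy a WQE of size16 16-byte units from src to dst, then zero-fill
 * dst up to the next 64-byte boundary.  Only size16 * 16 bytes are
 * read from src, so a 32-byte WQE at the very end of the SQ buffer
 * does not cause a read past the end of that buffer.
 *
 * Returns the number of bytes written to dst.
 */
static size_t bf_copy_padded(void *dst, const void *src, size_t size16)
{
	size_t len = size16 * 16;
	size_t padded = align_up(len, 64);

	memcpy(dst, src, len);
	memset((char *)dst + len, 0, padded - len);

	return padded;
}
```

The call in the patch would then become something like bf_copy_padded(ctx->bf_page + ctx->bf_offset, ctrl, size). Whether this is actually faster than the single memcpy would need measuring.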