[ofw] [RFC] winverbs implementation approach

Tzachi Dar tzachid at mellanox.co.il
Fri Mar 14 04:20:55 PDT 2008


Hi,

I'm not sure that the tests on Linux are the same as on Windows, but
here are my thoughts:
The way our IOCTLs are implemented, it takes about 3 us just to
pass from user to kernel.
We can change that to be about 0.7 us. There is also other overhead
involved.
Allocating memory, as far as I remember, takes about 1 us (including the
free). This can of course be measured again. There are also many
functions in Windows / the C RTL that can be used, and they probably have
different overheads. As a result the allocation might not be the most
important thing.
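
For re-measuring the allocation cost, something like the following
could be used. This is only a rough sketch: the helper name
alloc_free_cost_us is made up for illustration, it times plain
new[]/delete[] rather than any particular Windows allocation function,
and the result will depend on the heap, the compiler and the machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of any existing code): returns the average
// cost, in microseconds, of one new[]/delete[] pair of the given size.
double alloc_free_cost_us(std::size_t size, int iterations) {
	assert(size > 0 && iterations > 0);
	typedef std::chrono::steady_clock clock;

	clock::time_point start = clock::now();
	for (int i = 0; i < iterations; ++i) {
		char *p = new char[size];
		// Touch the block through a volatile pointer so the compiler
		// cannot drop the allocation as unused.
		*static_cast<volatile char *>(p) = 1;
		delete [] p;
	}
	std::chrono::duration<double, std::micro> elapsed = clock::now() - start;

	return elapsed.count() / iterations;
}
```

Calling alloc_free_cost_us(64, 1000000) gives a number that can be
compared against the IOCTL round-trip time.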

In any case I was thinking about the implementation of the following
class:

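const int MaxSize = 100;

class allocator {
	char buffer[MaxSize];
public:
	// Small requests are served from the embedded buffer, larger ones
	// from the heap. Only one allocation may be outstanding at a time,
	// since every small request returns the same buffer.
	void *alloc(int size) {
		if (size <= MaxSize)
			return buffer;
		else
			return new char[size];
	}
	void free(void *p) {
		if (p != buffer)
			delete [] static_cast<char *>(p);
	}
};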


I believe that using such a class makes the code that uses the
allocator as simple as code that always allocates.

Please also note that an auto cleanup can be added to the destructor of
the class (but this is probably for another mail thread).
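
As a rough illustration of the auto cleanup idea, the class above could
be turned into a scoped object that releases any heap memory in its
destructor. This is a minimal sketch only: the name scoped_buffer and
its get() / on_heap() members are hypothetical, and MaxSize is repeated
here just so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t MaxSize = 100;

// Hypothetical variant of the allocator class with cleanup in the destructor.
class scoped_buffer {
	char buffer_[MaxSize];
	char *p_;
public:
	// Small requests use the embedded buffer, larger ones go to the heap.
	explicit scoped_buffer(std::size_t size)
		: p_(size <= MaxSize ? buffer_ : new char[size]) {
		assert(size > 0);
	}
	// Heap memory, if any, is released when the object goes out of scope.
	~scoped_buffer() {
		if (p_ != buffer_)
			delete [] p_;
	}
	void *get() { return p_; }
	bool on_heap() const { return p_ != buffer_; }
private:
	// Not copyable: a copy would end up freeing the same heap block twice.
	scoped_buffer(const scoped_buffer &);
	scoped_buffer &operator=(const scoped_buffer &);
};
```

The caller then just declares "scoped_buffer buf(size);" on the stack,
passes buf.get() to the IOCTL, and has no free call to forget on the
error paths.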

In any case I'm fine with both solutions.

Thanks
Tzachi

> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com] 
> Sent: Friday, March 14, 2008 1:25 AM
> To: Hefty, Sean; Tzachi Dar; Fab Tillier; ofw at lists.openfabrics.org
> Subject: RE: [ofw] [RFC] winverbs implementation approach
> 
> >I will see about avoiding the allocations.
> 
> I ran some tests on the Linux stack creating/destroying 
> rdma_cm_id's.  This involves an IOCTL to the kernel, but does 
> not allocate HW resources.  The performance difference 
> between allocating memory versus using a stack variable was 
> less than 0.5%.  (It was easiest for me to run tests on 
> Linux, so my hope is that Windows performance would be 
> similar.)  If HW resources were allocated, the difference 
> would be smaller.
> 
> Given this, I don't think it's worth the code complexity to 
> try to optimize out the memory allocation.  As an 
> alternative, I would like to consider moving the IOCTL calls 
> into the uvp itself at some unspecified future point in time, 
> which I hope will provide additional optimizations as well.
> 
> - Sean
> 
> 


