[Openib-windows] Using of fast mutexes in WinIb

Fab Tillier ftillier at silverstorm.com
Thu Nov 17 13:42:19 PST 2005


> From: Leonid Keller [mailto:leonid at mellanox.co.il]
> Sent: Thursday, November 17, 2005 6:32 AM
> 
> > From: Fab Tillier [mailto:ftillier at silverstorm.com]
> > Sent: Wednesday, November 16, 2005 7:24 PM
> >
> > Hi Leo,
> >
> > > From: Leonid Keller [mailto:leonid at mellanox.co.il]
> > > Sent: Tuesday, November 15, 2005 7:32 AM
> > >
> > > Hi Fab,
> > > I came across the following problem: the implementation of
> > > cl_mutex_acquire() via Fast Mutexes causes all the code in the
> > > critical section to run at APC_LEVEL.
> > >
> > > The first case I saw was at the start-up of the IpoIb driver,
> > > which takes a mutex in __ipoib_pnp_cb and makes all MTHCA driver
> > > control verbs run at APC_LEVEL, which is troublesome
> > > (e.g., create_cq calls AllocateCommonBuffer, which requires
> > > PASSIVE_LEVEL).
> >
> > Why does create_cq call AllocateCommonBuffer?  Are you making multiple
> > page-sized calls instead of a single larger call for the
> > cases where the memory required spans multiple pages?  Physically
> > contiguous memory is a scarce resource, so the more you can break
> > up your requests into page-sized requests, the better.
> 
> The allocation algorithm first tries to allocate one contiguous buffer.
> If that fails, it requests smaller buffers. In the worst case it will
> allocate N buffers of one page each.

Are there performance advantages to allocating a contiguous buffer, or does it
not make a difference?  We should try to avoid requiring contiguous memory
altogether if it doesn't have a performance benefit.
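
For reference, the fallback strategy described in the quoted text above (one
contiguous buffer first, then smaller buffers, down to N single pages) could
be sketched roughly as follows.  This is a user-mode simulation, not the MTHCA
code: mock_allocator_t, mock_alloc_contiguous, chunk_list_t and
alloc_with_fallback are hypothetical names, the 4096-byte page size is
assumed, and halving the chunk size on failure is an assumption too, since the
thread only says that smaller buffers are requested.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE  4096u  /* assumed page size for this sketch */
#define MAX_CHUNKS 64u    /* arbitrary cap on the chunk list */

/* Hypothetical stand-in for a contiguous-memory allocator: a request
 * succeeds only if it is no larger than max_contig bytes. */
typedef struct {
    size_t max_contig;
} mock_allocator_t;

static int mock_alloc_contiguous(const mock_allocator_t *a, size_t bytes)
{
    return bytes <= a->max_contig;
}

/* Sizes of the chunks that ended up backing one logical buffer. */
typedef struct {
    size_t chunk_size[MAX_CHUNKS];
    size_t count;
} chunk_list_t;

/* Round a byte count up to a whole number of pages. */
static size_t round_up_to_page(size_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}

/* Try to cover 'bytes' with one contiguous chunk; on failure, halve the
 * chunk size (assumed policy) until single pages are reached.
 * Returns 0 on success, -1 if even a single page cannot be allocated or
 * the chunk list overflows.  On failure, out holds the chunks obtained
 * so far; a real driver would have to free them. */
static int alloc_with_fallback(const mock_allocator_t *a, size_t bytes,
                               chunk_list_t *out)
{
    size_t remaining = round_up_to_page(bytes);
    size_t chunk = remaining;

    assert(a != NULL && out != NULL);
    out->count = 0;

    while (remaining > 0) {
        if (chunk > remaining)
            chunk = remaining;

        if (mock_alloc_contiguous(a, chunk)) {
            if (out->count == MAX_CHUNKS)
                return -1;
            out->chunk_size[out->count++] = chunk;
            remaining -= chunk;
        } else if (chunk > PAGE_SIZE) {
            /* Halve the request, keeping it a multiple of the page size. */
            chunk = round_up_to_page(chunk / 2);
        } else {
            return -1;
        }
    }
    return 0;
}
```

With an allocator that can only hand out single pages, a four-page request
ends up as four one-page chunks, which is the worst case mentioned above.  If
contiguity brings no performance benefit, the loop could simply start at
PAGE_SIZE and never ask for a larger run.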

> I used AllocateCommonBuffer in the first place because it returns bus
> addresses.

Ok, I see.  This doesn't help user-mode clients, though - their buffers are
allocated and must be registered.  I think it would be simpler to make the
kernel and user logic follow the same path.  The issues of proper DMA mappings
for memory registrations need to be solved no matter how things are allocated in
the kernel, and once we have a solution for user-mode, it will also apply to
kernel mode.

> One can implement that for the work at DISPATCH_LEVEL:
> 	va = MmAllocateContiguousMemorySpecifyCache(...);
> 	p_mdl = IoAllocateMdl( va, ...);
> 	MmBuildMdlForNonPagedPool( p_mdl );
> 	la = p_adapter->MapTransfer(adapter, p_mdl ...);
> 
> It has two minor drawbacks as far as I can see:
> 	1) MmAllocateContiguousMemorySpecifyCache always allocates an
> integer number of pages;
> 	2) MapTransfer fails when the number of map registers is
> exceeded. (But maybe AllocateCommonBuffer will also fail in this case.)
> 
> What do you think?

Again, if contiguous memory doesn't have a performance benefit, we can just use
ExAllocatePoolWithTag, followed by IoAllocateMdl and MmBuildMdlForNonPagedPool
like in your example above.  For user-mode, the only additional step would be a
call to MmProbeAndLockPages since the memory would be pageable.

Since the HCAs are all 64-bit hardware, I don't think we need to use
MapTransfer.  I think we should use GetScatterGatherList instead, as it more
closely provides the functionality we need.

Note that both MapTransfer and GetScatterGatherList (as well as any DMA mapping
functions) will not work properly if Driver Verifier is enabled to check DMA
usage.  Driver Verifier breaks any driver that depends on sharing the buffer.
In this case, AllocateCommonBuffer is the only way that I know of, but that
leaves user-mode out of luck.

So given that we need to support user-mode, we need to find a way to properly
DMA map long-term memory registrations without Driver Verifier breaking
things.  I'm looking into this with Microsoft and will report back any findings.

- Fab



