[Openib-windows] A major patch

Leonid Keller leonid at mellanox.co.il
Thu Jun 8 03:59:02 PDT 2006


See below 

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Thursday, June 08, 2006 12:18 AM
> To: Leonid Keller
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] A major patch
> 
> Hi Leo,
> 
> On 6/6/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> >
> > Hi Fab,
> >     I've synced our repository with Openib, because a lot of 
> > changes had accumulated.
> > Some of the patches intersected with changes introduced for 
> > FMR support, so I've also added the FMR patch.
> 
> I wish you wouldn't have, but so be it.  In the future, 
> please back out the stuff that isn't critical.  We're trying 
> to stabilize the stack for WHQL now, and adding new features 
> like FMR should not take priority over other bugs, like 
> support for mixed rate fabrics.  I would much rather have 
> seen support for CQ resize than FMR, as that is actually used by WSD.
> 
> Perhaps your internal development for new features should 
> happen on branches so that the new features can be 
> incorporated at the appropriate time without creating 
> interdependencies with bug fixes.
> 
> > I'm waiting for your comments on the changes to IBAL in the FMR 
> > patch, if any.
> 
> The code looks good, thanks.  There are minor formatting 
> issues that I'll go through and fix.
> 
> Note that I am going to rename the functions from ib_xxx_fmr 
> to mlnx_xxx_fmr, because we're not dealing with IB standard 
> FMR support, and when we do it will just create confusion.  
> It needs to be clear that FMR support as you implemented it 
> is really a vendor-specific extension to work around memory 
> registration performance problems, not an IB spec standard verb.
> 
> >
> > Here are the comments on the sync I've performed:
> >
> >
> > [MTHCA, IBAL]
> >
> >     added FMR support;
> >
> > [MTHCA]
> >
> >     1. fixed (and now works) "livefish" support;
> >
> >     2. fixed (and now works) multiple HCA support;
> >
> >     3. support for 32-bit tools running on a 64-bit kernel;
> >
> >     4. treat the *bad_wr parameter in post/recv verbs as optional;
> >
> >     5. make the wait on a command completion alertable for user 
> > processes;
> 
> What happens when an operation wakes up due to an alert?  I 
> assume you then resume the wait?

No, and it seems that's wrong. But ...

To recap, it's a wait on a command sent to the HCA card.
The real reason for that change was to make it possible to cancel user
applications with Ctrl-C.

The original solution is:
	1. to wait in KernelMode with a timeout in a non-alertable state.

Other possible solutions are:
	2. to wait in UserMode with a timeout in a non-alertable state; on
an alert, return an error and exit.
		This is usually allowed only for the highest-level drivers;

	3. to wait in KernelMode with a timeout in an alertable state; on
an alert, resume the wait.
		This can cause an APC to execute with racing context, e.g. the
thread is waiting on a command during create_qp while the APC destroys
all of the thread's resources.

I tend to return to the original solution as the more robust one.
What do you think?


> 
> Thanks,
> 
> - Fab
> 
> 
> 




More information about the ofw mailing list