[Openib-windows] [PATCH] 2 new vendor calls

Fab Tillier ftillier at silverstorm.com
Mon Sep 26 09:50:55 PDT 2005


> From: Leonid Keller [mailto:leonid at mellanox.co.il]
> Sent: Sunday, September 25, 2005 2:24 AM
> 
> > From: Fab Tillier [mailto:ftillier at silverstorm.com]
> > Sent: Friday, September 23, 2005 9:22 AM
> >
> > > From: Leonid Keller [mailto:leonid at mellanox.co.il]
> > > Sent: Thursday, September 22, 2005 1:09 AM
> > >
> > > You are right about security and resource tracking: i saw
> > > these problems, but didn't find a simple solution.
> > > The right solution IMO would have been to add the context
> > > of FW_MAP_CRSPACE
> > > operation to the process resource tracking information and
> > > to perform
> > > FW_UNMAP_CRSPACE automatically upon process termination.
> > >
> > > Then we could remove the context from FW_MAP_CRSPACE and not provide
> > > FW_UNMAP_CRSPACE at all.
> >
> > You still want to allow FW_UNMAP_CRSPACE, rather than relying
> > on process termination to drive that.
>
> ???
> Or my English is wrong or i'm missing here something.
> I thought, that i've suggested to perform the unmapping on process
termination,
> which would allow us not to expose FW_UNMAP_CRSPACE at all.

Right - I'm saying you need both a FW_UNMAP_CRSPACE command so that the app can
undo a previous FW_MAP_CRSPACE command, as well as properly trapping and
cleaning things up in case of an abnormal termination.

> Is it right thing to do - is another question.
> We have a lot of paired APIs like FW_OPEN_IF/FW_CLOSE_IF or
> create_qp/destroy_qp.
> Why shouldn't we provide here also a paired one ?

We should provide a paired call.  I think we're in agreement here.

> > The cleanup in case of abnormal process usage is
> > necessary but shouldn't be the norm.
>
> Yes, but, first, we has to do it there anyway, second, you suggest below to do
> it on CA cleanup anyway.

It must be done on CA cleanup if it hasn't been done already.  The normal case
is that nothing special happens on CA cleanup because the app was well behaved.

> > > Your idea of putting the context into CA object seams to me
> > > troublesome:
> > >         - it prevents several applications to use
> > > simultaneously this mechanism (on the same HCA);
> >
> > Do we want to allow multiple applications simultaneous access
> > to the FW?  I
> > don't think so - it creates the possibility for two different
> > sessions trying to
> > update the FW simultaneously, which could result in a corrupt
> > FW image.
>
> It's - responsibility of the customers (see more reasons below).
> We have to warn them wrt that.
> [BTW, customer can burn into HCA a corrupt or wrong image - it's a normal and
> not too dangerous situation.
> He can then fix that by burning the card with one application and a right
> image]

I agree that customers have to be aware of what they're doing with respect to FW
update.  I don't think there's any reason to allow multiple apps to do FW update
simultaneously - there is no practical reason, and not preventing it because the
user should know better isn't right.

> > The FW access should be serialized to a single client at a time.
> A burning application is just one tool.
> There is DevMon, which is mainly used to look at CR space, but allows also to
> change it.
> There is also trace.exe, which reads debug prints of the FW from the card.
> These two applications are highly used by developers and sometimes
> simultaneously.

In that case, I think we need two different MAP_CRSPACE commands - one to map
read-only, one to map read/write.  This will allow multiple applications to read
the CR space, but not modify it simultaneously.

To protect the mappings, we can either define a second command, or add a
parameter to the MAP_CRSPACE command.  Use MmProtectMdlSystemAddress to provide
access control.

It's probably best to define two commands, so that we can eventually transition
these to two different WMI requests, and allow setting different access control
policies on both.  For example, writing to the CR space is something only
administrators should be allowed to do.  Reading form it is probably safe for
everyone.

> There are or can be also another card management applications, some of them
> having additional equirements like mapping DDR or even UAR.
> We are talking about tools for developers.
> Customer will get a small subset of them.
> That's why I don't think, that we may restrict the access to one application
> at a time.

I think write access must be serialized.  We're going to be in multi-user
environments, and shouldn't allow users to inadvertently screw one another up.

> > >         - if an application was killed and restarted, there
> > > will be a memory leak.
> >
> > No, when an application is killed, it's CA object is cleaned
> > up due to the
> > resource tracking.  If that CA object tracks the FW CR space
> > mapping, then that
> > can get cleaned up automatically.
>
> Nice, just we need to do it in the resource tracking code for not to impose
> the above restriction.
> 
> > > I don't know enough how the resource tracking mechaism
> > > works today and whether
> > > there is a simple way to implement the above solution.
> > >
> > > Could you enlight me on this ?
> >
> > Currently, the user's CA object (the mlnx_um_ca_t) isn't
> > passed into the calls.
> > We probably need to change things so that any user's call to
> > ib_open_ca results
> > in some call to the HCA driver to establish a user context.
> > Currently, this
> > only happens for user-mode clients,
>
> Are our tools applications not user-mode clients ?

Yes, they are.  However, the mlnx_um_ca_t object isn't input to the CI call
interface, so the HCA driver has no idea which client is calling it.

> > and that user-mode context isn't used for
> > any of the other verbs.  If we can fix things to use a
> > per-user CA context in
> > all the verbs calls, then resource cleanup will be handled in
> > the proxy.
>
> Can we pass mlnx_um_ca_t structure to the vendor call ?
> If it isn't too complicate, then i'd suggest the following "ultimate"
solution,
> which takes in mind future extensions and various ideas, i got so far:
>    1. User interface
>         - we provide both FW_MAP_CRSPACE and FW_UNMAP_CRSPACE calls;

Agreed.

>         - they will have the same parameter structure:
>                 struct _FW_MAP_PARAMS {
>                                 PVOID __ptr64 va;       /* mapped address; OUT
> for map, IN for UNMAP */
>                         IN      ULONG   bar_num;        /* 0 - CR, 1 - UAR, 2
> - DDR */
>                         IN      ULONG offset;   /* map/unmap from this offset
> */

What requirements are there on offset?  Must it be page aligned?  8-byte
aligned?

>                         IN      ULONG   size;           /* map/unmap this size
> */
>                 };
>         - return codes:
>                 STATUS_NOT_IMPLEMENTED          - for UAR and DDR space (for
> now);
>                 STATUS_BUFFER_TOO_SMALL         - not enough parameters;
>                 STATUS_INVALID PARAMETERS       - wrong parameters;
>                 STATUS_NONE_MAPPED              - unmap parameters do not
> match map ones;

Would users be able to map multiple regions of the same space?  I think
splitting the map/unmap parameters probably makes more sense, as well as
splitting the input/output parameters.

For FW_MAP_CRSPACE, I would like to see the following:

struct FW_MAP_CRSPACE_IN
{
	ULONG		BarNum;
	ULONG		Offset;
	ULONG		Size;
}

struct FW_MAP_CRSPACE_OUT
{
	VOID		*pVa;
}

For FW_UNMAP_CRSPACE, I would like to see the following as input parameter:

struct FW_UNMAP_CRSPACE_IN
{
	VOID		*pVa;
}

There would be no output parameter.  The input VA would be used to find the
correct MDL that was used to map to that address, and wouldn't ever be
dereferenced.  A bogus VA would just result in a STATUS_INVALID_PARAMETER return
value, with no adverse effects.

Lastly, is this generic enough capability that it should be exposed through a
standard verb call, or is this too HW vendor specific?  The advantage to making
it a standard verb call is that the access layer would know how to properly
track it.

>    2. Implementation
>         - we add necessary fielsd to mlnx_um_ca_t;
>         - the structure is passed to FW_MAP_CRSPACE/FW_UNMAP_CRSPACE;
>         - FW_MAP_CRSPACE store the context; FW_UNMAP_CRSPACE use it for
> unmapping;
>         - we add resource tracking code, that performs cleanup on appl.
> termination.
> What do you think ?

This sounds like the right approach.  Probably the first thing to do is to make
the changes required so that the mappings can be tracked and cleaned up on app
termination.  Right now, IBAL provides client multiplexing to the HCA.  Moving
that logic into the HCA would allow the HCA driver to track additional things as
it sees fit (such as in our case the CR space mappings).

This involves changing how the HCA is opened internally to IBAL, as well as
changes to the ib_open_ca code path.

- Fab




More information about the ofw mailing list