[ofa-general] Re: Incorrect max_sge reported in mthca device query

Tue Apr 10 00:54:33 PDT 2007

> Quoting Tom Tucker <tom at opengridcomputing.com>:
> Subject: RE: [ofa-general] Re: Incorrect max_sge reported in mthca device query
> 
> On Thu, 2007-04-05 at 09:27 -0700, Sean Hefty wrote:
> > >The challenge with the current query/request method is that as we've
> > >discussed the advertised max may not work. What makes the adjust/retry
> > >unworkable is that you don't know which of the advertised maxes caused
> > >the request to fail. So when you retry, which qp_attr do you adjust? The
> > >send sge? The recv sge? The qp depth?
> > >
> > >So what I'm proposing, and I think is similar if not identical to what
> > >other folks have talked about is having an interface that treats the
> > >qp_attr values as requested-sizes that can be adjusted by the provider.
> > >So for example, if I ask for a send_sge of 30, but you can only do 28,
> > >you give me 28 and adjust the qp_attr structure so that I know what I
> > >got. This would allow me to perform a predictable sequence of 1. query,
> > >2. request, 3. adjust in my code.
> > 
> > If the send sge/recv sge/qp depth/etc. aren't independent though, this pushes
> > the problem and policy decision down to the provider.  I can't think of an easy
> > solution to this.
> 
> Agreed. But practically I think they are. I think the SGE max is driven
> off the max size of a WR and type of QP. This is true of the iWARP
> adapters as well.  

Are you sure? For example, for mthca the amount of memory you use
is proportional to #WRs * #SGEs, so they aren't really independent.
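
To make the dependency concrete, here is a rough model of send queue memory. The sizes below (a 64-byte fixed WQE header and 16 bytes per SGE) are assumptions for illustration, not mthca's actual WQE layout. Only the shape of the calculation matters: memory scales with #WRs times the per-WQE size, and the per-WQE size grows with #SGEs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; real WQE layouts differ per device and QP type. */
#define WQE_HEADER_BYTES 64u  /* assumed fixed control/address segments */
#define SGE_BYTES        16u  /* assumed size of one scatter/gather entry */

/* Bytes for one work queue entry carrying nsge scatter/gather entries. */
static size_t wqe_bytes(unsigned nsge)
{
    return WQE_HEADER_BYTES + (size_t)nsge * SGE_BYTES;
}

/* Total queue memory: proportional to #WRs times the per-WQE size. */
static size_t queue_bytes(unsigned nwr, unsigned nsge)
{
    size_t per_wqe = wqe_bytes(nsge);

    assert(nwr == 0 || per_wqe <= SIZE_MAX / nwr);  /* no overflow */
    return (size_t)nwr * per_wqe;
}

/* Largest #WRs that fits in a fixed memory budget for a given #SGEs. */
static size_t max_wr_for_budget(size_t budget, unsigned nsge)
{
    return budget / wqe_bytes(nsge);
}
```

With a 128 KiB budget in this model, 4 SGEs per WQE leaves room for 1024 WRs, while 28 SGEs leaves room for only 256. A device can therefore honestly advertise both a large max WR count and a large max SGE count and still be unable to grant both at once.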

> But taking the bait...even if you didn't push it down to the provider,
> how do you expose the inter-relationships to the consumer? An approach
> in this vein is a "could_you_would_you/why_not" interface that would
> return whether or not the specified qp_attr would work and if it didn't
> some indication of which resource(s) caused the problem. The problems
> there are a) the resource may be gone when you go back with what you
> just had "approved", and b) you still have to fuss with multiple whacks
> at it if you couldn't get what you asked for.

Right.
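
A dry-run query of the kind Tom describes might look like the sketch below. `could_you_would_you()`, `enum qp_why_not` and `struct probe_limits` are hypothetical names, and the memory check uses an assumed fixed WQE header plus a per-SGE size rather than any real device's layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons a proposed QP configuration would be refused. */
enum qp_why_not {
    WHY_OK       = 0,
    WHY_SEND_WR  = 1u << 0,
    WHY_SEND_SGE = 1u << 1,
    WHY_MEMORY   = 1u << 2   /* each field fits, but the combination does not */
};

/* Hypothetical device description used only by this dry run. */
struct probe_limits {
    unsigned max_wr;
    unsigned max_sge;
    size_t   queue_budget;   /* bytes available for the send queue */
    size_t   wqe_header;     /* assumed per-WQE fixed overhead */
    size_t   sge_size;       /* assumed bytes per SGE */
};

/*
 * Report which resources the request would exceed, without allocating
 * anything. The answer can be stale by the time the caller actually
 * creates the QP.
 */
static unsigned could_you_would_you(const struct probe_limits *lim,
                                    unsigned send_wr, unsigned send_sge)
{
    unsigned why = WHY_OK;

    assert(lim != NULL);
    if (send_wr > lim->max_wr)
        why |= WHY_SEND_WR;
    if (send_sge > lim->max_sge)
        why |= WHY_SEND_SGE;
    if (why == WHY_OK &&
        (size_t)send_wr * (lim->wqe_header + send_sge * lim->sge_size)
            > lim->queue_budget)
        why |= WHY_MEMORY;
    return why;
}
```

The `WHY_MEMORY` case shows the weakness discussed here: each field is within its advertised maximum, yet the combination is refused, and the answer does not say which field to shrink.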

> I think something simpler, although arguably not perfect, is the way to
> go.

You also have to take into account that some #WRs/#SGEs combinations
will perform better than others. For example, it's common for hardware
to assume power-of-2 ring sizes, so you waste memory unless
you match that.
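
Assuming hardware that sizes its work queue rings to a power of two (the exact rounding rule is device-specific), the memory lost to an arbitrary request can be estimated with a sketch like this. `roundup_pow2()` and `wasted_entries()` are illustrative helpers, not a driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next power of two (n must be 1..2^31). */
static uint32_t roundup_pow2(uint32_t n)
{
    assert(n > 0 && n <= (1u << 31));
    n--;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1;
}

/* Ring entries allocated but never requested when the size is rounded up. */
static uint32_t wasted_entries(uint32_t requested)
{
    return roundup_pow2(requested) - requested;
}
```

For example, a request for 300 WRs gets a 512-entry ring, so 212 entries (about 41% of the ring) are allocated but never used; asking for 512 would have cost the same memory.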

And by the way, #WRs/#SGEs isn't the only parameter that has
this problem. For example, on Tavor, RC QPs work better with a 1K
MTU than with a 2K MTU, while current apps tend to simply use
the maximum MTU supported.

So I think that the only sane way is to let the user actually
specify his full requirements and have the provider satisfy
them in an optimal way.

-- 
MST