[openib-general] max_send_sge < max_sge

Michael S. Tsirkin mst at mellanox.co.il
Tue Jun 27 15:38:26 PDT 2006


Quoting r. Pete Wyckoff <pw at osc.edu>:
> Subject: Re: max_send_sge < max_sge
> 
> mst at mellanox.co.il wrote on Tue, 27 Jun 2006 09:42 +0300:
> > Quoting r. Pete Wyckoff <pw at osc.edu>:
> > > Is this a known issue?
> > 
> > Yes. The fact that ibv_query_device returns some value in hca_cap can not
> > guarantee that ibv_create_qp with these parameters will succeed. For
> > example, system administrator might have imposed a limit on the amount of
> > memory you can pin down, and you will get ENOMEM.
> 
> I was hoping to get a guaranteed maximum number from ibv_query_device so that
> I would know that calls to ibv_create_qp would not fail due to my asking for
> too many CQ entries.  My code has some idea of how many it wants (16), and
> compares that to the hca_cap values to settle for what it can get.  I only
> happened to notice that 30 wouldn't work even though it was so claimed when
> debugging.

Ah. I see. Unfortunately I don't think ibv_query_device currently provides this
guarantee, and it's not something easy to fix.  What are you doing if the hca cap
is below the values you want?  Also, please see below for ideas about extending
the API in a way that might be useful to you.

> > > Should I always subtract 1 from the reported max on the send side?  Just
> > > for this hardware?
> > 
> > Unless you use it, passing the absolute maximum value supported by hardware
> > does not seem, to me, to make sense - it will just slow you down, and waste
> > resources.  Is there a protocol out there that actually has a use for 30
> > sge?
> 
> Perhaps I don't understand what is more resource-costly about using
> 29 sge when they are supported by the hardware.

Well, more SGEs per WR does mean more resources are consumed for the same
number of WRs per QP. OK?

> I'm using them on the send side to avoid having to either:
>     1.  memcpy 29 little buffers into one big buffer
> or
>     2.  send 29 rdma writes instead of a single rdma write with 29 sges
> The buffer on the receiver is contiguous and big enough to hold
> everything.

It's the same thing. Seems I'm not being clear.  I was just saying that large SGE
and WR values have a cost, so one should use the smallest SGE and WR numbers
that still give good performance, not the maximum thinkable values. But
you probably know this :)

> > In my opinion, for the application to be robust it has to either use small
> > values that empirically work on most systems, or be able to scale down to
> > require less resources if an allocation fails.
> 
> Scale down?  So if ibv_create_qp fails, you think I should look at
> the return value (which is NULL, not ENOMEM or EINVAL or anything
> informative), and then gradually reduce the values for max_recv_sge,
> max_send_sge, max_recv_wr, max_send_wr, max_inline_data below the
> reported HCA maximum until I find something that works?

Well, if there's no bug I see no reason for ibv_create_qp to fail except that
you are asking for too many WRs/SGEs. So yes, the trick you describe will work,
I think.
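
A rough sketch of such a back-off loop, in C. Everything named here is made up
for illustration: struct qp_request stands in for the relevant fields of the QP
capabilities, try_create_fn is a hypothetical callback that a real program
would implement by calling ibv_create_qp and checking for NULL, and fake_create
is a stub that only pretends the hardware accepts up to a given number of SGEs.
The order (drop SGEs one at a time first, then halve WRs) is an arbitrary
choice, not a recommendation.

```c
#include <stddef.h>

/* Hypothetical container for the requested limits. */
struct qp_request {
    unsigned max_send_wr;
    unsigned max_send_sge;
};

/* Hypothetical callback: returns 0 if a QP could be created with *req.
 * A real version would call ibv_create_qp() and test the result for NULL. */
typedef int (*try_create_fn)(const struct qp_request *req, void *arg);

/* Shrink the request until try_create succeeds or both floors are reached.
 * SGEs are reduced first, one at a time; then WRs are halved.
 * Returns 0 on success, with *req holding the values that worked,
 * or -1 if the request at the floor values fails as well. */
static int create_with_backoff(struct qp_request *req,
                               unsigned min_wr, unsigned min_sge,
                               try_create_fn try_create, void *arg)
{
    for (;;) {
        if (try_create(req, arg) == 0)
            return 0;
        if (req->max_send_sge > min_sge) {
            req->max_send_sge--;
        } else if (req->max_send_wr > min_wr) {
            req->max_send_wr /= 2;
            if (req->max_send_wr < min_wr)
                req->max_send_wr = min_wr;
        } else {
            return -1;
        }
    }
}

/* Stub standing in for the hardware: succeeds only when the requested
 * SGE count does not exceed the limit passed through arg. */
static int fake_create(const struct qp_request *req, void *arg)
{
    const unsigned *sge_limit = arg;
    return req->max_send_sge <= *sge_limit ? 0 : -1;
}
```

With the stub limit set to 29, a request for 30 SGEs comes back as 29 with the
WR count untouched, which is the case discussed in this thread.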

At some point, I tried to think about extending the API in such a way that
verbs like ibv_create_qp would round the parameters down to
whatever does work. Would something like this be useful to you?
Further, if the given SGE/WR pair can't be satisfied, will you want to scale
down the number of SGEs or the number of WRs?

> I'll subtract 1 from the hca_cap.max_sge for Mellanox hardware
> before doing the comparison against how many SGEs I'd like to get.
> Otherwise I can't see much alternative to trusting the hca_cap
> values that are returned.

If this works for you, great. I was just trying to point out that query device
cannot guarantee that QP allocation will always succeed even if you stay within
the limits it reports.

For example, are you using a large number of WRs per QP as well?  If so, after
allocating a couple of QPs you might hit the per-user locked memory limit,
depending on your system setup. QP allocation will then fail, even if
you use the hca_cap - 1 heuristic.
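
To see how much room there is, one can check the limit from inside the
process. A minimal sketch, assuming the limit in question is the
RLIMIT_MEMLOCK resource limit as on Linux; the helper names are made up for
illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft RLIMIT_MEMLOCK value for the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int get_memlock_limit(rlim_t *soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}

/* Print the limit, treating RLIM_INFINITY as "unlimited". */
static void print_memlock_limit(void)
{
    rlim_t soft;

    if (get_memlock_limit(&soft) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("locked memory: unlimited\n");
    else
        printf("locked memory: %llu bytes\n", (unsigned long long)soft);
}
```

This only reports the limit; it says nothing about how much of it a given QP
will consume, which depends on the hardware and driver.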

-- 
MST



