[ofa-general] [Bug report / partial patch] OFED 1.3 send max_sge lower than reported by ib_query_device

Tom Tucker tom at opengridcomputing.com
Wed Sep 26 19:06:45 PDT 2007


FWIW, I have code in my apps that retries QP creation with reduced
values when creation with the reported maximums fails.
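
A minimal sketch of that kind of retry loop follows. It is not the code
from my apps: struct qp_caps, create_qp_fn, create_qp_with_retry and
fake_create are all hypothetical names made up for illustration. In a
real application the callback would wrap ibv_create_qp() and the field
would be qp_attr.cap.max_send_sge. The fake creator only simulates the
"max_sge fails, max_sge-1 works" behavior reported further down in this
thread, so the sketch can be exercised without an adapter.

```c
#include <stddef.h>

/* Hypothetical stand-in for the parts of the QP capabilities used here. */
struct qp_caps {
    unsigned int max_send_sge;
    unsigned int max_recv_sge;
};

/* Hypothetical callback: returns 0 on success, nonzero on failure.
 * A real application would wrap ibv_create_qp() in a function like this. */
typedef int (*create_qp_fn)(const struct qp_caps *caps, void *ctx);

/* Simulated provider (assumption for testing only): rejects any request
 * whose max_send_sge exceeds the limit pointed to by ctx. */
static int fake_create(const struct qp_caps *caps, void *ctx)
{
    unsigned int limit = *(const unsigned int *)ctx;
    return caps->max_send_sge > limit ? -1 : 0;
}

/* Start at caps->max_send_sge and step down by one until creation
 * succeeds or the value would drop below min_sge. On success the value
 * that worked is written back to caps and 0 is returned. If no value
 * works, caps is left untouched and -1 is returned. */
static int create_qp_with_retry(struct qp_caps *caps, unsigned int min_sge,
                                create_qp_fn create, void *ctx)
{
    unsigned int sge;

    if (min_sge == 0)
        min_sge = 1;    /* a send queue with zero SGEs is not useful here */

    for (sge = caps->max_send_sge; sge >= min_sge; sge--) {
        struct qp_caps attempt = *caps;

        attempt.max_send_sge = sge;
        if (create(&attempt, ctx) == 0) {
            *caps = attempt;
            return 0;
        }
    }
    return -1;
}
```

Called with max_send_sge = 30 against a simulated limit of 29, the
helper fails once and then succeeds, leaving 29 in caps. Stepping down
by one is the simplest policy. Halving the value would need fewer
attempts but could settle well below what the device can really do.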

There was also an earlier e-mail thread on this exact same issue, but
the "solution" bandied about was to use special values in the qp_attr
structure, a la QP_MAX_SEND_SGE (-1?). The provider would recognize this
value and allocate the largest value for that attribute that would
succeed given the current resource situation. The provider would then
update the qp_attr structure with the values actually granted. This
approach extends the API without breaking it, allows existing apps to
work as usual, and avoids the retry logic that I've added to my apps.
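
To make the sentinel idea concrete, here is a rough sketch of what the
provider-side handling could look like. Nothing here exists in any
driver or in the verbs API: the QP_MAX_SEND_SGE value, the
resolve_send_sge() helper and the "achievable" argument are all
assumptions for illustration. The only thing taken from the real API is
that max_send_sge is a 32-bit unsigned field, so "-1" is written as
UINT32_MAX.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "give me the most you can right now". */
#define QP_MAX_SEND_SGE UINT32_MAX

/* Hypothetical provider-side helper.
 *
 * requested:  in/out, the caller's max_send_sge value
 * achievable: the largest send SGE count the provider could actually
 *             allocate at this moment (how a driver computes this is
 *             outside the scope of the sketch)
 *
 * Returns 0 if the request can be satisfied, -1 otherwise. */
static int resolve_send_sge(uint32_t *requested, uint32_t achievable)
{
    if (*requested == QP_MAX_SEND_SGE) {
        /* Sentinel: grant the maximum and report it back to the caller. */
        *requested = achievable;
        return 0;
    }
    if (*requested > achievable)
        return -1;      /* ordinary request that is too large still fails */

    return 0;           /* ordinary request: value left untouched */
}
```

Existing callers that pass a plain number see no change in behavior. A
caller that passes the sentinel gets back the granted value in the same
field, so no retry loop is needed.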

Just a thought,
Tom

On Wed, 2007-09-26 at 20:41 -0500, Jim Mott wrote:
> The same bug exists with mthca.  I saw it originally in the kernel doing RDS work, but I just put together a short user space test.
> 
> ibv_query_device(MT25204) returns max_sge=30
>   - ibv_create_qp with qp_attr.cap.max_send_sge = dev_attr.max_sge fails
>   - ibv_create_qp with qp_attr.cap.max_send_sge = dev_attr.max_sge-1 works
> 
> I only have the two types of adapters to test with.
> -----Original Message-----
> From: Roland Dreier [mailto:rdreier at cisco.com] 
> Sent: Wednesday, September 26, 2007 5:32 PM
> To: Jim Mott
> Cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] [Bug report / partial patch] OFED 1.3 send max_sge lower than reported by ib_query_device
> 
>  > A minimal API change that could help would be to add two new fields
>  > to ib_device_attr structure returned by ib_query_device:
>  >   - delta_sge_sg
>  >   - delta_sge_rd
> 
> Hmm, a cute idea but I'm still left wondering if it's worth the ABI
> breakage etc just to give a few more S/G entries in some situations.
> 
>  > The behavior would be that in all cases using max_sge for send or
>  > receive SGE count in create_qp would always succeed.  That means
>  > the current value the drivers return there would have to be reduced
>  > to fix this bug.  All existing codes would continue to run.
> 
> Actually are there any drivers other than patched mlx4 where max_sge
> doesn't always work?  I agree we do want to get this right, but I
> thought we had fixed all such bugs.  (And we should make sure that any
> "shrinking WQE" patch for mlx4 doesn't introduce new bugs)
> 
> (BTW I see a different bug in unpatched mlx4, namely that it might
> report a too-big number of S/G entries allowed for the SQ)
> 
>  > It looks like there is some movement in this direction already
>  > with the fields:
>  >   - max_sge_rd (nes, amso1100, ehca, cxgb3 only)
> 
> This field is obsolete, since we don't handle RD and almost certainly
> never will.  I'm not sure why anyone is setting a value.
> 
>  >   - max_srq_sge (amso1100, mthca, mlx4, ehca, ipath only)
> 
> Any devices that handle SRQ should set this.  I think cxgb3 does not
> support SRQ.
> 
>  - R.
> 



