[ofa-general] Draft patch to address bugzilla bug#728

Roland Dreier rdreier at cisco.com
Fri Oct 12 14:15:17 PDT 2007


 > >  > While working on this I observed that for mthca max_srq_sge
 > >  > returned by ib_query_device() is not equal to max_sge returned
 > >  > by ib_query_srq(). Why is that?
 > > 
 > > Not sure.  I'll take a look.  What are the two values that you get?
 > 
 > I get 28 and 16. This is on InfiniBand: Mellanox Technologies MT23108 
 > InfiniHost (rev a1) HCAs that we have.

I don't see anything in the mthca code that could cause this.  As far
as I can see, the SRQ code just returns the same limit that the
consumer passes in.  Are you sure you're not just seeing the effect of
your code that picks the largest power of two less than the max_sg
limit you get from the driver?

 > >  > +	if (IPOIB_CM_RX_SG >= max_sge_supported) {
 > >  > +		fragment_size	= CM_PACKET_SIZE/max_sge_supported;
 > >  > +		num_frags	= CM_PACKET_SIZE/fragment_size;
 > >  > +	} else {
 > >  > +		fragment_size	= CM_PACKET_SIZE/IPOIB_CM_RX_SG;
 > >  > +		num_frags	= IPOIB_CM_RX_SG;
 > >  > +	}
 > >  > +	order = get_order(fragment_size);
 > > 
 > > I think that if the device can't handle enough SG entries to handle
 > > the full CM_PACKET_SIZE with PAGE_SIZE fragments, we just have to
 > > reduce the size of the receive buffers.  Trying to allocate multi-page
 > > receive fragments (especially with GFP_ATOMIC on the receive path) is
 > > almost certainly going to fail once memory gets fragmented.  Lots
 > > of other ethernet drivers have been forced to avoid multi-page
 > > allocations when using jumbo frames because of serious issues observed
 > > in practice, so we should avoid making the same mistake.
 > 
 > I sort of expected that this might come up, hence the draft patch. If we
 > are driving the systems so hard that in steady state (i.e. on the receipt 
 > of every packet) one may fail to allocate a handful of multi-page (read 4K
 > page) fragments, what will happen when one uses, say, RHEL 5.1, where we need
 > to allocate only one 64K page? Won't that fail too?

Order 0 allocations don't fail because of fragmentation, so it should
actually work better.  But I guess there's a reason RH is giving up on
64K pages for now.

 > Are you suggesting that we reduce the MTU to be sized according to the
 > PAGE_SIZE * max_num_sg supported? If that is correct, then it is an MTU vs. 
 > memory trade-off, right?

Yes, reduce the MTU to the largest receive buffer that we can handle
with PAGE_SIZE fragments.  It's not really a tradeoff between memory
and MTU -- more a tradeoff between MTU and working better on real
systems where memory gets fragmented.

 - R.
