[ofa-general] Draft patch to address bugzilla bug#728

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Fri Oct 12 11:22:21 PDT 2007


>  > While working on this I observed that for mthca max_srq_sge
>  > returned by ib_query_device() is not equal to max_sge returned
>  > by ib_query_srq(). Why is that?
> 
> Not sure.  I'll take a look.  What are the two values that you get?

I get 28 and 16. This is on the Mellanox Technologies MT23108 InfiniHost
(rev a1) HCAs that we have.
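
For reference, this is roughly how I am reading the two numbers (just a
sketch, assuming a valid ib_device pointer and an already-created SRQ;
error handling trimmed):

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/* Sketch only: print the device-wide and per-SRQ SGE limits so the
 * mismatch above can be reproduced.
 */
static void compare_sge_limits(struct ib_device *dev, struct ib_srq *srq)
{
	struct ib_device_attr dev_attr;
	struct ib_srq_attr srq_attr;

	if (ib_query_device(dev, &dev_attr) || ib_query_srq(srq, &srq_attr))
		return;

	/* On our MT23108 HCAs this reports 28 and 16. */
	printk(KERN_INFO "device max_srq_sge %d, srq max_sge %u\n",
	       dev_attr.max_srq_sge, srq_attr.max_sge);
}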


>  > +	if (IPOIB_CM_RX_SG >= max_sge_supported) {
>  > +		fragment_size	= CM_PACKET_SIZE/max_sge_supported;
>  > +		num_frags	= CM_PACKET_SIZE/fragment_size;
>  > +	} else {
>  > +		fragment_size	= CM_PACKET_SIZE/IPOIB_CM_RX_SG;
>  > +		num_frags	= IPOIB_CM_RX_SG;
>  > +	}
>  > +	order = get_order(fragment_size);
> 
> I think that if the device can't handle enough SG entries to handle
> the full CM_PACKET_SIZE with PAGE_SIZE fragments, we just have to
> reduce the size of the receive buffers.  Trying to allocate multi-page
> receive fragments (especially with GFP_ATOMIC on the receive path) is
> almost certainly going to fail once memory gets fragmented.  Lots
> of other ethernet drivers have been forced to avoid multi-page
> allocations when using jumbo frames because of serious issues observed
> in practice, so we should avoid making the same mistake.

I sort of expected that this might come up, hence the draft patch. If we
are driving the systems so hard that in steady state (i.e. on the receipt
of every packet) we may fail to allocate a handful of multi-page (read:
multiple 4K pages) fragments, what will happen on, say, RHEL 5.1, where we
need to allocate only one 64K page? Won't that fail too?

Are you suggesting that we reduce the MTU to be sized according to
PAGE_SIZE * max_num_sg supported? If so, then it is an MTU vs. memory
trade-off, right?
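
If that is the intent, I read it as something along these lines (only a
rough sketch of my understanding, not the patch itself; ipoib_cm_max_mtu()
is a made-up name here, and I am reusing IPOIB_CM_RX_SG and
IPOIB_ENCAP_LEN from ipoib.h):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <rdma/ib_verbs.h>
#include "ipoib.h"

/* Rough sketch of the alternative as I understand it: cap the CM receive
 * buffer (and hence the MTU) so that every receive fragment fits in a
 * single page, instead of allocating multi-page fragments.
 * ipoib_cm_max_mtu() is a hypothetical helper name.
 */
static int ipoib_cm_max_mtu(struct ib_device *dev)
{
	struct ib_device_attr attr;
	int max_sge;

	if (ib_query_device(dev, &attr))
		return -EIO;

	/* Never use more SG entries than the device (or IPoIB) supports. */
	max_sge = min_t(int, attr.max_srq_sge, IPOIB_CM_RX_SG);

	/* One PAGE_SIZE fragment per SG entry, less the encapsulation header. */
	return max_sge * PAGE_SIZE - IPOIB_ENCAP_LEN;
}

In other words, the largest MTU we advertise shrinks when max_srq_sge is
small, which is the MTU vs. memory trade-off I was referring to.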

Pradeep



