[ofa-general] RE: [PATCH 2/3][NET_BATCH] net core use batching
Waskiewicz Jr, Peter P
peter.p.waskiewicz.jr at intel.com
Mon Oct 8 12:46:23 PDT 2007
> -----Original Message-----
> From: J Hadi Salim [mailto:j.hadi123 at gmail.com] On Behalf Of jamal
> Sent: Monday, October 08, 2007 11:27 AM
> To: David Miller
> Cc: krkumar2 at in.ibm.com; johnpol at 2ka.mipt.ru;
> herbert at gondor.apana.org.au; kaber at trash.net;
> shemminger at linux-foundation.org; jagana at us.ibm.com;
> Robert.Olsson at data.slu.se; rick.jones2 at hp.com;
> xma at us.ibm.com; gaagaan at gmail.com; netdev at vger.kernel.org;
> rdreier at cisco.com; Waskiewicz Jr, Peter P;
> mcarlson at broadcom.com; jeff at garzik.org; mchan at broadcom.com;
> general at lists.openfabrics.org; kumarkr at linux.ibm.com;
> tgraf at suug.ch; randy.dunlap at oracle.com; sri at us.ibm.com
> Subject: [PATCH 2/3][NET_BATCH] net core use batching
>
> This patch adds the usage of batching within the core.
>
> cheers,
> jamal
Hey Jamal,
I still have concerns about how this will work with Tx multiqueue.
The way the batching code looks right now, you will probably send the
driver a batch of skb's drawn from multiple bands of PRIO or RR. For
non-Tx-multiqueue drivers, this is fine. For Tx multiqueue drivers,
it isn't, since the Tx ring is selected by the value of
skb->queue_mapping (set by the qdisc in {prio|rr}_classify()). If the
skbs in a batch carry different queue_mapping values, this could prove
to be an interesting issue.
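To make the ring-selection point concrete, here's roughly what a Tx
multiqueue driver's xmit path looks like (a loose sketch only; mq_priv,
mq_tx_ring, ring_full() and post_descriptor() are made-up names, not
from any real driver):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int mq_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct mq_priv *priv = netdev_priv(dev);
        /* the ring is chosen purely by what the qdisc classified */
        struct mq_tx_ring *ring = &priv->tx_ring[skb->queue_mapping];

        if (ring_full(ring))
                return NETDEV_TX_BUSY;  /* only *this* ring is full */

        post_descriptor(ring, skb);
        return NETDEV_TX_OK;
}

With mixed queue_mappings in one batch, one skb can hit a full ring
while the next sails through a different one, which is part of why I
think this gets interesting.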
Now I see in the driver HOWTO you recently sent that the driver
will be expected to loop over the list and call its ->hard_start_xmit()
for each skb. I think that should be fine for multiqueue; I just wanted
to see if you had any thoughts on how it should work, and whether you
can see any performance issues (I can't think of any). Since batching
and Tx multiqueue are both very new features, I'd like to make sure we
think through any possible issues with them coexisting before they are
both mainline.
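Just to make sure we're talking about the same loop, this is how I
read the HOWTO, reusing the single-skb sketch from above (the list
argument and the requeue-on-busy handling are my interpretation, not
the actual batching API):

static int mq_batch_xmit(struct sk_buff_head *blist, struct net_device *dev)
{
        struct sk_buff *skb;

        while ((skb = __skb_dequeue(blist)) != NULL) {
                if (mq_hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
                        /* put it back; the rest can be retried later */
                        __skb_queue_head(blist, skb);
                        return NETDEV_TX_BUSY;
                }
        }
        return NETDEV_TX_OK;
}

Since each skb still passes through ->hard_start_xmit() one at a time,
the per-ring selection keeps working, so I don't see a correctness
problem for multiqueue there.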
Looking ahead for multiqueue, I'm still working on the per-queue
lock implementation, which I know will not work with batching as it's
designed today. I'm still not sure how to handle this, because it
really would require every skb in the batch you send to have the same
queue_mapping, so that you're grabbing the correct queue lock. Or we
could have the core grab the queue lock for each skb->queue_mapping
represented in the batch. That would block another batch, though, if
it contained any of those queues before the first one completed.
Thoughts?
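To make that second option concrete, here's a very rough sketch of
what I'm picturing (tx_queue_lock[] is the per-queue lock array I'm
still working on, so it doesn't exist anywhere yet, and the bitmap is
purely illustrative):

#include <linux/bitmap.h>
#include <linux/spinlock.h>

static void batch_lock_queues(struct net_device *dev,
                              struct sk_buff_head *blist,
                              unsigned long *qmap)
{
        struct sk_buff *skb;
        int q;

        /* collect every queue_mapping present in the batch */
        bitmap_zero(qmap, dev->egress_subqueue_count);
        skb_queue_walk(blist, skb)
                set_bit(skb->queue_mapping, qmap);

        /* always lock in ascending queue order to avoid ABBA deadlocks */
        for (q = 0; q < dev->egress_subqueue_count; q++)
                if (test_bit(q, qmap))
                        spin_lock(&dev->tx_queue_lock[q]);
}

The downside is exactly the blocking I described: any other batch that
shares even one of those queues has to wait until the first batch
drops all of its locks.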
Thanks Jamal,
-PJ Waskiewicz
<peter.p.waskiewicz.jr at intel.com>