[ofa-general] IPoIB-UD post_send failures (OFED 1.3)

Eli Cohen eli at dev.mellanox.co.il
Sun May 11 01:18:19 PDT 2008


On Sat, 2008-05-10 at 12:07 -0700, akepner at sgi.com wrote:

> I haven't been able to get any new debug data (the only way we 
> know to reproduce this one is to use a pretty large system - a 
> scarce resource), but it does look like there's a hole here, 
> since ipoib_cm.c:ipoib_cm_send() and ipoib_ib.c:ipoib_send() 
> check on different conditions (off by one) to detect a full 
> queue. 
> 
> ipoib_cm.c:ipoib_cm_send() does:
>         if (++priv->tx_outstanding == ipoib_sendq_size)
>                 netif_stop_queue(dev);
> 
> but ipoib_ib.c:ipoib_send() does:
>         if (++priv->tx_outstanding == (ipoib_sendq_size - 1)) {
>                 netif_stop_queue(dev);
> 
> So a call to ipoib_cm_send() with tx_outstanding = (ipoib_sendq_size - 2), 
> followed by a call to ipoib_send() would get to a situation where 
> the queue was full, but not stopped.

The queue is stopped while one entry is still free so that
ipoib_ib_tx_timer_func() can post a special send request that forces a
completion to be reported, which in turn frees entries in the tx ring.
I don't think the scenario you describe can lead to a deadlock: even if
it happens, the queue will be released for one of the following two
reasons:
1. If the tx queue holds more than one not-yet-polled send WR posted by
ipoib_cm_send(), those WRs will be polled soon: they are posted to a
signaled QP, so sooner or later they generate completions and
interrupts. Subsequent calls to ipoib_send() will then work as expected.

2. If only one ipoib_cm_send() WR is outstanding in the tx queue, then
126 ipoib_send() requests are outstanding as well. Since post_send()
signals one request out of every MAX_SEND_CQE, a few of those are
signaled and are expected to complete soon.

If you want to make sure there is no hole in my reasoning, you can try
this patch:

Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-05-07 12:30:10.000000000 +0300
+++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2008-05-11 09:59:42.000000000 +0300
@@ -535,7 +535,9 @@ static inline int post_send(struct ipoib
 	} else
 		priv->tx_wr.opcode      = IB_WR_SEND;
 
-	if (unlikely((priv->tx_head & (MAX_SEND_CQE - 1)) == MAX_SEND_CQE - 1))
+	/* start forcing signaled if we get near queue full */
+	if (unlikely((priv->tx_head & (MAX_SEND_CQE - 1)) == MAX_SEND_CQE - 1) ||
+	    priv->tx_outstanding > (ipoib_sendq_size -  5))
 		priv->tx_wr.send_flags |= IB_SEND_SIGNALED;
 	else
 		priv->tx_wr.send_flags &= ~IB_SEND_SIGNALED;



Finally, could you arrange remote access to a machine that is in this
condition so we can inspect the state of the device/FW?