[ofa-general] Re: IPOIB CM (NOSRQ) extension

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Mon Jun 11 11:08:47 PDT 2007


Michael S. Tsirkin wrote:
>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>> Subject: IPOIB CM (NOSRQ) extension
>>
>> This patch handles the corner case of running out of RC QPs. In that
>> case it switches to UD mode. This patch can be used both by NOSRQ and
>> SRQ code.
>>
>> Signed-off-by: Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>
> 
> You don't provide any way to retry going back to connected mode,
> after a failure, which is really intermittent by nature. That's pretty bad.

This node switched to datagram mode because the passive side was
under a resource crunch (no RC QPs), and the user is alerted about
this condition. So yes, we do not attempt to go back to connected
mode.

> 
>> ---
>>
>> --- c/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:13:55.000000000 -0400
>> +++ b/linux-2.6.22-rc3/drivers/infiniband/ulp/ipoib/ipoib_cm.c 2007-06-07 11:11:21.000000000 -0400
>> @@ -1383,6 +1383,11 @@ static int ipoib_cm_tx_handler(struct ib
>>  		break;
>>  	case IB_CM_REQ_ERROR:
>>  	case IB_CM_REJ_RECEIVED:
>> +		ipoib_warn(priv, "REJ received\n");
>> +		neigh = tx->neigh;
>> +		if (neigh)
>> +			clear_bit(IPOIB_FLAG_OPER_UP, &neigh->cm->flags);
>> +		break;
>>  	case IB_CM_TIMEWAIT_EXIT:
>>  		ipoib_dbg(priv, "CM error %d.\n", event->event);
>>  		spin_lock_irq(&priv->tx_lock);
> 
> This has an effect of dropping down to datagram mode
> on errors such as CM timeout, or a reject due to stale connection.
> I think this is a wrong thing to do.

I can make the fallback conditional on the passive side having no RC
QPs. I will code that up in the next patch.
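
A minimal sketch of such a condition, in plain C rather than kernel code.
The enum values, the struct and both function names are hypothetical and
exist only for illustration; the boolean stands in for the
IPOIB_FLAG_OPER_UP bit that the patch clears. It assumes the reject
reason can be told apart at the point where the REJ is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reject reasons for this sketch. They mirror the cases named
 * in the thread and are not the kernel's own reject-reason enum. */
enum rej_reason {
	REJ_NO_QP,       /* passive side has no RC QPs left */
	REJ_TIMEOUT,     /* CM timeout */
	REJ_STALE_CONN   /* reject due to a stale connection */
};

/* Hypothetical per-neighbour state standing in for IPOIB_FLAG_OPER_UP. */
struct neigh_state {
	bool cm_oper_up; /* true: connected mode usable, false: use datagram */
};

/* Only a QP shortage on the passive side justifies leaving connected mode. */
static bool should_fall_back_to_ud(enum rej_reason reason)
{
	return reason == REJ_NO_QP;
}

/* Clears the flag for the no-QP case and leaves it untouched for every
 * other reason, so timeouts and stale connections stay in connected mode. */
static void handle_reject(struct neigh_state *neigh, enum rej_reason reason)
{
	assert(neigh != NULL);
	if (should_fall_back_to_ud(reason))
		neigh->cm_oper_up = false;
}
```

With this shape, a CM timeout or a stale-connection reject keeps the
neighbour in connected mode, and only the no-QP reject drops it to
datagram mode.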

Pradeep



