[ofw] RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode (connectivity issue)

Alex Estrin alex.estrin at qlogic.com
Thu Jan 22 06:14:09 PST 2009


Hello,

When the responder generates an ARP REP packet and the endpoint is in the IPOIB_CM_DISCONNECTED state,
it moves the endpoint to the transitional state IPOIB_CM_CONNECT,
initiates a connect request for that endpoint, and queues the ARP REP packet.
Once the connection is established (endpoint in state IPOIB_CM_CONNECTED), the queued ARP REP is sent through the UD QP.
TCP applications start sending TCP packets immediately after they receive the ARP REP, so delaying it
makes sure all TCP packets go through the connected QP.
In your case I think the host didn't get the connect reply back.
I am not sure why, but it is likely that something went wrong with a path record, either locally or on the remote side.
We need to look deeper into this.
Also, we could probably reduce the connect timeout or the number of connect retries,
so that the host can reinit the endpoint (on connection timeout) and retry the connect on the next ARP REP.
Please see more notes inline.
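
To make the flow above concrete, here is a minimal sketch in portable C11. It is not the driver code: the type and function names (endpt_t, arp_reply_filter, endpt_on_connected, endpt_on_connect_timeout) are hypothetical, the pending queue is a plain array, and atomic_compare_exchange_strong stands in for InterlockedCompareExchange. It only models the state transitions and the queuing of ARP REP packets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's connection states. */
typedef enum { CM_DISCONNECTED, CM_CONNECT, CM_CONNECTED } cm_state_t;
typedef enum { SEND_NOW, SEND_PENDING } send_status_t;

#define MAX_PENDING 8

typedef struct {
    _Atomic int state;         /* holds a cm_state_t value */
    int pending[MAX_PENDING];  /* ids of queued ARP REP packets */
    size_t n_pending;          /* not thread safe; the real driver uses locked lists */
    int connect_requests;      /* number of connect requests started */
} endpt_t;

static void endpt_init(endpt_t *e)
{
    atomic_init(&e->state, CM_DISCONNECTED);
    e->n_pending = 0;
    e->connect_requests = 0;
}

/* Decide what to do with an outgoing ARP REP for this endpoint. */
static send_status_t arp_reply_filter(endpt_t *e, int pkt_id)
{
    int expected = CM_DISCONNECTED;

    /* Only one caller wins the DISCONNECTED -> CONNECT transition. */
    if (atomic_compare_exchange_strong(&e->state, &expected, CM_CONNECT)) {
        e->connect_requests++;          /* winner starts the connect request */
    } else if (expected != CM_CONNECT) {
        return SEND_NOW;                /* already connected: nothing to wait for */
    }

    /* DISCONNECTED or CONNECT: hold the reply until the connection is up. */
    assert(e->n_pending < MAX_PENDING);
    e->pending[e->n_pending++] = pkt_id;
    return SEND_PENDING;
}

/* Connect reply arrived: returns how many queued replies are released. */
static size_t endpt_on_connected(endpt_t *e)
{
    size_t released = e->n_pending;
    atomic_store(&e->state, CM_CONNECTED);
    e->n_pending = 0;
    return released;
}

/* Connect timed out: reinit so the next ARP REP retries the connect. */
static size_t endpt_on_connect_timeout(endpt_t *e)
{
    size_t dropped = e->n_pending;
    atomic_store(&e->state, CM_DISCONNECTED);
    e->n_pending = 0;
    return dropped;
}
```

In this model, if neither endpt_on_connected nor endpt_on_connect_timeout is ever called, every ARP REP stays queued, which matches the symptom you describe.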

Thanks,
Alex.

> Hello, Alex,
> Recently, I found the following problem:
> 1. Connect 2 machines B2B, run opensm, set static IPoIB addresses, and
> verify ping.
> 2. Then disconnect a cable for 10-15 seconds, and connect it back.
> 3. Wait a couple of seconds for opensm to indicate that the link is
> UP, then try to ping again.
> 4. The ping will no longer work.
> 
> Why this happens:
> 1. On the sender side, a ping (ARP REQ) packet will be generated and sent
> to the responder side.
> 2. The responder will generate an ARP REP packet, but it will not be sent:
> in recv_mgr_filter_arp, when the endpoint is in IPOIB_CM_DISCONNECTED or
> IPOIB_CM_CONNECT, the code will return NDIS_STATUS_PENDING,
> and these ARP REPs will be queued.
> 3. Now the CEP manager will not be able to restore the communication,
> because there is no response to the ARP packets :)
> 4. Sending ARP REP in UD mode will resolve this issue.
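
The behaviour proposed in point 4 can be pictured with the small C sketch below. It is illustrative only: choose_qp and the enum names are hypothetical, and the rule that non-ARP traffic uses the connected QP only once the endpoint is connected (and UD otherwise) is an assumption, not something taken from the driver. The point is that the choice for ARP packets never depends on the connection state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they are not the driver's identifiers. */
typedef enum { CM_DISCONNECTED, CM_CONNECT, CM_CONNECTED } cm_state_t;
typedef enum { PKT_ARP_REQ, PKT_ARP_REP, PKT_IP } pkt_kind_t;
typedef enum { QP_UD, QP_RC } qp_choice_t;

/* Pick the queue pair for an outgoing packet. */
static qp_choice_t choose_qp(pkt_kind_t kind, bool cm_enabled, cm_state_t state)
{
    /* ARP traffic always goes out in UD mode, so a reply is never
     * held back waiting for a connection to come up. */
    if (kind == PKT_ARP_REQ || kind == PKT_ARP_REP)
        return QP_UD;

    /* Assumption: other traffic uses the connected QP only when
     * connected mode is enabled and the endpoint is connected. */
    if (cm_enabled && state == CM_CONNECTED)
        return QP_RC;

    return QP_UD;
}
```

With this rule an ARP REP sent right after the cable is reconnected goes out even while the endpoint is still IPOIB_CM_DISCONNECTED, at the cost that the first TCP packets may arrive before the connected QP is ready.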
> 
> Patch: ARP REP should be sent in UD mode
> Signed-off by: Alexander Naslednikov (xalex at mellanox.co.il)
> Index: ipoib_port.c
> ===================================================================
> --- ipoib_port.c	(revision 3775)
> +++ ipoib_port.c	(working copy)
> @@ -4098,70 +4101,12 @@
>  			return status;
>  		}
>  		ipoib_addr_set_qpn( &p_ib_arp->dst_hw, qpn );
> -
> -		if( p_arp->op == ARP_OP_REP && 
> -			p_port->p_adapter->params.cm_enabled && 
> -			p_desc->p_endpt->cm_flag == IPOIB_CM_FLAG_RC )
> -		{
> -			cm_state_t	cm_state;
> -			cm_state = 
> -				( cm_state_t )InterlockedCompareExchange( (volatile LONG *)&p_desc->p_endpt->conn.state,
> -					IPOIB_CM_CONNECT, IPOIB_CM_DISCONNECTED );
> -			switch( cm_state )
> -			{
> -			case IPOIB_CM_DISCONNECTED:
> -					IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> -						("ARP REPLY pending Endpt[%p] QPN %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> -						p_desc->p_endpt,
> -						cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> -						p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> -						p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> -						p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ) );
> -					ipoib_addr_set_sid( &p_desc->p_endpt->conn.service_id,
> -						ipoib_addr_get_qpn( &p_ib_arp->dst_hw ) );
> -
> -					ExFreeToNPagedLookasideList(
> -						&p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> -					cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> -						IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> -					NdisInterlockedInsertTailList( &p_port->endpt_mgr.pending_conns,
> -						&p_desc->p_endpt->list_item,
> -						&p_port->endpt_mgr.conn_lock );
> -					cl_event_signal( &p_port->endpt_mgr.event );

Here we add the endpoint to the connecting queue and signal the CM management thread to process it.
Please see __endpt_cm_mgr_thread().

> -					return NDIS_STATUS_PENDING;
> -			
> -			case IPOIB_CM_CONNECT:
> -				/* queue ARP REP packet until connected */
> -					ExFreeToNPagedLookasideList(
> -					&p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> -					cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> -						IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> -					return NDIS_STATUS_PENDING;
> -			default:
> -				break;
> -			}
> -		}
>  	}
>  	else
>  	{
>  		cl_memclr( &p_ib_arp->dst_hw, sizeof(ipoib_hw_addr_t) );
>  	}
> -
> -#if DBG
> -	if( p_port->p_adapter->params.cm_enabled )
> -	{
> -		IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> -		(" ARP SEND to ENDPT[%p] State: %d flag: %#x, QPN: %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> -			p_desc->p_endpt,
> -			endpt_cm_get_state( p_desc->p_endpt ),
> -			p_desc->p_endpt->cm_flag,
> -			cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> -			p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> -			p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> -			p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ));
> -	}
> -#endif
> -
> +	
>  	p_ib_arp->dst_ip = p_arp->dst_ip;
>  
>  	p_desc->send_wr[0].local_ds[1].vaddr = cl_get_physaddr( p_ib_arp );
> 

