[ofw] RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode (connectivity issue)

Alex Naslednikov xalex at mellanox.co.il
Thu Jan 22 07:35:03 PST 2009


Please see inline.

-----Original Message-----
From: Alex Estrin [mailto:alex.estrin at qlogic.com] 
Sent: Thursday, January 22, 2009 4:44 PM
To: Alex Naslednikov; ofw at lists.openfabrics.org
Subject: RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode
(connectivity issue)

> I believe that reducing the connection timeout will partially resolve
> the issue.
> But the RFC defines ARP REP to be sent through the UD QP, and doing so
> solves the problem.

All ARPs are sent through UD QP. Please see __send_mgr_filter() and look
for:

	case ETH_PROT_TYPE_ARP:
		cl_perf_start( FilterArp );
		status = __send_mgr_filter_arp(
			p_port, p_eth_hdr, p_buf, buf_len, p_desc );
		p_desc->send_dir = SEND_UD_QP;
		cl_perf_stop( &p_port->p_adapter->perf, FilterArp );
		break;

Then, later in __build_send_desc()

		if( p_desc->send_dir == SEND_UD_QP )
		{
			p_desc->send_qp = p_port->ib_mgr.h_qp; // UD QP
...
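
The two-step selection described above can be sketched in isolation. This is only an illustration under assumptions: the types send_desc_t, port_t and prot_t, the SEND_RC_QP value, the integer QP handles, and the rule applied to non-ARP traffic are hypothetical stand-ins, not the actual ipoib_port.c definitions. Only SEND_UD_QP and the filter-then-build flow are taken from the snippets above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types (assumption). */
typedef enum { SEND_UD_QP = 1, SEND_RC_QP = 2 } send_dir_t;
typedef enum { PROT_ARP, PROT_IP } prot_t;

typedef struct {
    int ud_qp;  /* stands in for the UD QP handle (h_qp above) */
    int rc_qp;  /* stands in for a connected QP handle (assumption) */
} port_t;

typedef struct {
    send_dir_t send_dir;
    int        send_qp;
} send_desc_t;

/* Step 1: the filter stage picks a direction per protocol type.
 * ARP is always marked for the UD QP. */
static void filter_packet(send_desc_t *desc, prot_t prot, int endpt_connected)
{
    assert(desc != NULL);
    if (prot == PROT_ARP)
        desc->send_dir = SEND_UD_QP;
    else  /* assumed rule for everything else */
        desc->send_dir = endpt_connected ? SEND_RC_QP : SEND_UD_QP;
}

/* Step 2: the build stage maps the direction to a concrete QP. */
static void build_send_desc(const port_t *port, send_desc_t *desc)
{
    assert(port != NULL && desc != NULL);
    desc->send_qp = (desc->send_dir == SEND_UD_QP) ? port->ud_qp
                                                   : port->rc_qp;
}
```

In this sketch an ARP packet ends up on the UD QP regardless of the endpoint's connection state, which is the point being made: the question is when the ARP REP is sent, not which QP carries it.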

[XaleX] Yes, you are right: queued ARP REPs will also be sent via the UD QP.
	But my point was that such a packet should not be delayed until the
connection timeout expires. And I can't understand how we can get the
connect reply back in this case.
	If I understood you correctly, you postpone the ARP response to make
sure that all TCP applications will work through the RC QP.
	So, we can do the following: destroy the appropriate CEP objects and
recreate them.
	What's your opinion on this?

> So why are we trying to debug the RC flow?

The connection request is issued in the context of sending the ARP REPLY,
while processing of the ARP REPLY packet itself is delayed until the
connection succeeds or times out.
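
The delay described above can be modelled with a small state machine. The sketch below is not the driver code: it uses C11 atomics in place of InterlockedCompareExchange, and endpt_t, the counters, on_arp_reply() and on_connect_done() are hypothetical names introduced for illustration. It assumes that queued replies are released both on success and on timeout, and that a timeout returns the endpoint to the disconnected state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef enum { CM_DISCONNECTED, CM_CONNECT, CM_CONNECTED } cm_state_t;
typedef enum { SEND_NOW, SEND_PENDING } send_status_t;

/* Hypothetical endpoint model (assumption, not the driver's struct). */
typedef struct {
    atomic_int state;            /* holds a cm_state_t value */
    int        pending_arp;      /* ARP replies queued for later */
    int        connect_requests; /* connect attempts started */
} endpt_t;

static void endpt_init(endpt_t *e)
{
    assert(e != NULL);
    atomic_init(&e->state, CM_DISCONNECTED);
    e->pending_arp = 0;
    e->connect_requests = 0;
}

/* Like InterlockedCompareExchange: store new_state only if the current
 * value equals expected, and return the value seen before the call. */
static cm_state_t cas_state(atomic_int *state, cm_state_t new_state,
                            cm_state_t expected)
{
    int seen = expected;
    atomic_compare_exchange_strong(state, &seen, new_state);
    return (cm_state_t)seen;
}

/* Called when an ARP reply is about to be sent to this endpoint. */
static send_status_t on_arp_reply(endpt_t *e)
{
    assert(e != NULL);
    switch (cas_state(&e->state, CM_CONNECT, CM_DISCONNECTED)) {
    case CM_DISCONNECTED:        /* we won the transition: start connect */
        e->connect_requests++;
        e->pending_arp++;
        return SEND_PENDING;
    case CM_CONNECT:             /* connect already in flight: just queue */
        e->pending_arp++;
        return SEND_PENDING;
    default:                     /* connected: nothing to wait for */
        return SEND_NOW;
    }
}

/* Connect finished or timed out: release queued replies (assumption). */
static int on_connect_done(endpt_t *e, int success)
{
    assert(e != NULL);
    atomic_store(&e->state, success ? CM_CONNECTED : CM_DISCONNECTED);
    int released = e->pending_arp;
    e->pending_arp = 0;
    return released;
}
```

In this model the first ARP reply to a disconnected endpoint starts exactly one connect attempt, and every reply stays queued until on_connect_done() runs. If the connect reply never arrives, the replies wait for the full timeout, which is the behaviour under discussion here.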

Thanks,
Alex.

> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> Sent: Thursday, January 22, 2009 4:14 PM
> To: Alex Naslednikov; ofw at lists.openfabrics.org
> Subject: RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode 
> (connectivity issue)
> 
> Hello,
> 
> When the Responder generates an ARP REP packet and the endpoint is in
> the IPOIB_CM_DISCONNECTED state, it moves the endpoint to the transition
> state IPOIB_CM_CONNECT, initiates a connect request for that endpoint,
> and queues the ARP REP packet.
> [Xalex]
> When the connection is established (endpoint in state IPOIB_CM_CONNECTED),
> the ARP REP will resume through the UD QP.
> TCP applications start sending TCP packets immediately after
> receiving the ARP REP, so delaying it makes sure all TCP packets go
> through the connected QP.
> In your case I think the host didn't get the connect reply back.
> Not sure why, but it is likely something went wrong with a path record
> either locally or on the remote side.
> We need to look deeper into this.
> Also we could probably reduce the connect timeout or connect retries, so
> the host can reinit the endpoint (on connection timeout) and retry the
> connect on the next ARP REP.
> Please see more notes inline.
> 
> Thanks,
> Alex.
> 
> > Hello, Alex,
> > Recently, I found the following problem:
> > 1. Connect 2 machines B2B, run opensm, set static IPoIB addresses,
> > verify ping.
> > 2. Then disconnect a cable for 10-15 seconds, and connect it back.
> > 3. Wait for a couple of seconds for opensm to indicate that the link is
> > UP, then try to ping again.
> > 4. The ping now will not work.
> >
> > Why this happens:
> > 1. On the sender side, a ping (ARP REQ) packet will be generated and
> > sent to the responder side.
> > 2. The responder will generate an ARP REP packet, but it will not be sent:
> > in recv_mgr_filter_arp, when getting to IPOIB_CM_DISCONNECTED or
> > IPOIB_CM_CONNECT, the code will return NDIS_STATUS_PENDING, and
> > these ARP REPs will be queued.
> > 3. Now, the CEP manager will not be able to restore the communication,
> > because there is no response to the ARP packets :)
> > 4. Sending ARP REP in UD mode will resolve this issue.
> >
> > Patch: ARP REP should be sent in UD mode
> > Signed-off by: Alexander Naslednikov (xalex at mellanox.co.il)
> > Index: ipoib_port.c
> > ===================================================================
> > --- ipoib_port.c      (revision 3775)
> > +++ ipoib_port.c      (working copy)
> > @@ -4098,70 +4101,12 @@
> >                       return status;
> >               }
> >               ipoib_addr_set_qpn( &p_ib_arp->dst_hw, qpn );
> > -
> > -             if( p_arp->op == ARP_OP_REP &&
> > -                     p_port->p_adapter->params.cm_enabled &&
> > -                     p_desc->p_endpt->cm_flag == IPOIB_CM_FLAG_RC )
> > -             {
> > -                     cm_state_t      cm_state;
> > -                     cm_state =
> > -                             ( cm_state_t )InterlockedCompareExchange( (volatile LONG *)&p_desc->p_endpt->conn.state,
> > -                                             IPOIB_CM_CONNECT, IPOIB_CM_DISCONNECTED );
> > -                     switch( cm_state )
> > -                     {
> > -                     case IPOIB_CM_DISCONNECTED:
> > -                                     IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> > -                                             ("ARP REPLY pending Endpt[%p] QPN %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> > -                                             p_desc->p_endpt,
> > -                                             cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> > -                                             p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> > -                                             p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> > -                                             p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ) );
> > -                                     ipoib_addr_set_sid( &p_desc->p_endpt->conn.service_id,
> > -                                             ipoib_addr_get_qpn( &p_ib_arp->dst_hw ) );
> > -
> > -                                     ExFreeToNPagedLookasideList(
> > -                                             &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> > -                                     cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> > -                                             IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> > -                                     NdisInterlockedInsertTailList( &p_port->endpt_mgr.pending_conns,
> > -                                             &p_desc->p_endpt->list_item,
> > -                                             &p_port->endpt_mgr.conn_lock );
> > -                                     cl_event_signal( &p_port->endpt_mgr.event );
> 
> Here we add the endpoint to the connecting queue and signal the CM
> management thread to process it.
> Please see __endpt_cm_mgr_thread().
> 
> > -                                     return NDIS_STATUS_PENDING;
> > -                    
> > -                     case IPOIB_CM_CONNECT:
> > -                             /* queue ARP REP packet until connected */
> > -                                     ExFreeToNPagedLookasideList(
> > -                                             &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> > -                                     cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> > -                                             IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> > -                                     return NDIS_STATUS_PENDING;
> > -                     default:
> > -                             break;
> > -                     }
> > -             }
> >       }
> >       else
> >       {
> >               cl_memclr( &p_ib_arp->dst_hw, sizeof(ipoib_hw_addr_t) );
> >       }
> > -
> > -#if DBG
> > -     if( p_port->p_adapter->params.cm_enabled )
> > -     {
> > -             IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> > -             (" ARP SEND to ENDPT[%p] State: %d flag: %#x, QPN: %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> > -                     p_desc->p_endpt,
> > -                     endpt_cm_get_state( p_desc->p_endpt ),
> > -                     p_desc->p_endpt->cm_flag,
> > -                     cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> > -                     p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> > -                     p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> > -                     p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ));
> > -     }
> > -#endif
> > -
> > +
> >       p_ib_arp->dst_ip = p_arp->dst_ip;
> >
> >       p_desc->send_wr[0].local_ds[1].vaddr = cl_get_physaddr( p_ib_arp );
> >
> 
> 
> 


