[ofw] RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode (connectivity issue)
Alex Naslednikov
xalex at mellanox.co.il
Thu Jan 22 07:35:03 PST 2009
Please see my comments inline.
-----Original Message-----
From: Alex Estrin [mailto:alex.estrin at qlogic.com]
Sent: Thursday, January 22, 2009 4:44 PM
To: Alex Naslednikov; ofw at lists.openfabrics.org
Subject: RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode
(connectivity issue)
> I believe that reducing connection timeout will partially resolve the
> issue.
> But RFC defines ARP REP to be sent through UD QP, and it solves the
> problem.
All ARPs are sent through the UD QP. Please see __send_mgr_filter() and look
for:

    case ETH_PROT_TYPE_ARP:
        cl_perf_start( FilterArp );
        status = __send_mgr_filter_arp(
            p_port, p_eth_hdr, p_buf, buf_len, p_desc );
        p_desc->send_dir = SEND_UD_QP;
        cl_perf_stop( &p_port->p_adapter->perf, FilterArp );
        break;

Then, later in __build_send_desc():

    if( p_desc->send_dir == SEND_UD_QP )
    {
        p_desc->send_qp = p_port->ib_mgr.h_qp; // UD QP
        ...
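To make the dispatch above easier to follow, here is a minimal, self-contained sketch of the same idea: the filter step forces ARP onto UD, and a later step maps the chosen direction to a QP. The types `port_t` and `send_dir_t`, the helpers `pick_send_dir()` and `pick_send_qp()`, the `SEND_RC_QP` value, and the rule for non-ARP traffic are assumptions made for illustration. They are not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types and constants. */
#define ETH_PROT_TYPE_ARP 0x0806u   /* ARP ethertype, host byte order here */
#define ETH_PROT_TYPE_IP  0x0800u   /* IPv4 ethertype, host byte order here */

typedef enum { SEND_UD_QP, SEND_RC_QP } send_dir_t;

typedef struct {
    int ud_qp;          /* stands in for p_port->ib_mgr.h_qp */
    int rc_qp;          /* stands in for a per-endpoint connected QP */
    int rc_connected;   /* nonzero once the RC connection is up */
} port_t;

/* Filter step: ARP always goes out on UD. The rule for other traffic
 * (RC when connected, UD otherwise) is an assumption of this sketch. */
send_dir_t pick_send_dir(uint16_t eth_type, const port_t *port)
{
    assert(port != NULL);
    if (eth_type == ETH_PROT_TYPE_ARP)
        return SEND_UD_QP;
    return port->rc_connected ? SEND_RC_QP : SEND_UD_QP;
}

/* Descriptor-build step: map the chosen direction to a QP handle. */
int pick_send_qp(send_dir_t dir, const port_t *port)
{
    assert(port != NULL);
    return dir == SEND_UD_QP ? port->ud_qp : port->rc_qp;
}
```

In this sketch the direction is decided once, in the filter, and the QP is only looked up later, which is why an ARP REP that was queued still leaves through the UD QP when it is finally sent.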
[XaleX] Yes, you are right, queued ARP REPs will also be sent via the UD QP.
But my point was that such a packet should not be delayed until the
connection timeout expires. And I can't understand how we can get a connect
reply back in this case.
If I understood you correctly, you postpone the ARP response to make
sure that all TCP applications will work through the RC QP.
So we can do the following: destroy the appropriate CEP objects and
recreate them.
What's your opinion on this?
> So why do we try to debug RC flow ?
The connection request is issued in the context of sending the ARP REPLY,
while processing of the ARP REPLY packet itself is delayed until the
connection succeeds or times out.
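The state handling discussed here can be sketched with C11 atomics. This is only an illustration of the compare-and-swap pattern, not the driver code: `endpt_t`, `arp_action_t`, the shortened state names, and both functions are hypothetical, and `atomic_compare_exchange_strong()` stands in for the Windows `InterlockedCompareExchange()` call used in the patch context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, shortened versions of the IPOIB_CM_* states. */
typedef enum { CM_DISCONNECTED, CM_CONNECT, CM_CONNECTED } cm_state_t;

/* What the send path should do with an ARP REP (names are assumptions). */
typedef enum {
    ARP_SEND_NOW,               /* no connection attempt pending: send on UD */
    ARP_PENDING_START_CONNECT,  /* queue the packet and start a connect */
    ARP_PENDING                 /* connect already in progress: just queue */
} arp_action_t;

typedef struct {
    atomic_int state;
} endpt_t;

/* Try DISCONNECTED -> CONNECT and return the state seen before the call,
 * which is the same contract InterlockedCompareExchange() has. */
static cm_state_t cm_try_begin_connect(endpt_t *ep)
{
    int expected = CM_DISCONNECTED;
    /* On failure, 'expected' is overwritten with the observed state;
     * on success it still holds CM_DISCONNECTED. */
    atomic_compare_exchange_strong(&ep->state, &expected, CM_CONNECT);
    return (cm_state_t)expected;
}

arp_action_t arp_reply_action(endpt_t *ep)
{
    assert(ep != NULL);
    switch (cm_try_begin_connect(ep)) {
    case CM_DISCONNECTED:
        return ARP_PENDING_START_CONNECT;
    case CM_CONNECT:
        return ARP_PENDING;
    default:
        return ARP_SEND_NOW;
    }
}
```

Only the caller that wins the swap starts the connect, so concurrent ARP REPs for the same endpoint cannot issue duplicate requests. The cost is the one under discussion in this thread: every queued ARP REP waits until the connect succeeds or times out.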
Thanks,
Alex.
> -----Original Message-----
> From: Alex Estrin [mailto:alex.estrin at qlogic.com]
> Sent: Thursday, January 22, 2009 4:14 PM
> To: Alex Naslednikov; ofw at lists.openfabrics.org
> Subject: RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode
> (connectivity issue)
>
> Hello,
>
> When the Responder generates an ARP REP packet and the endpoint is in
> IPOIB_CM_DISCONNECTED state, it will move the endpoint to the transition
> state IPOIB_CM_CONNECT, initiate a connect request for that endpoint, and
> queue the ARP REP packet.
> [Xalex]
> When the connection is established (endpoint in state IPOIB_CM_CONNECTED),
> the ARP REP will resume through the UD QP.
> TCP applications start sending TCP packets immediately after they
> receive the ARP REP, so delaying it makes sure all TCP packets go
> through the connected QP.
> In your case I think the host didn't get a connect reply back.
> Not sure why, but it is likely that something went wrong with a path
> record, either locally or on the remote side.
> We need to look deeper into this.
> Also, we could probably reduce the connect timeout or the number of
> connect retries, so the host can reinit the endpoint (on connection
> timeout) and retry the connect on the next ARP REP.
> Please see more notes inline.
>
> Thanks,
> Alex.
>
> > Hello, Alex,
> > Recently, I found the following problem:
> > 1. Connect 2 machines B2B, run opensm, set static IPoIB addresses,
> >    verify ping.
> > 2. Then disconnect a cable for 10-15 seconds, and connect it back.
> > 3. Wait for a couple of seconds for opensm to indicate that the link is
> >    UP, then try to ping again.
> > 4. The ping will now fail.
> >
> > Why this happens:
> > 1. On the sender side, a ping (ARP REQ) packet will be generated and
> >    sent to the responder side.
> > 2. The responder will generate an ARP REP packet, but it will not be
> >    sent: in recv_mgr_filter_arp, when the endpoint is in
> >    IPOIB_CM_DISCONNECTED or IPOIB_CM_CONNECT, the code will return
> >    NDIS_STATUS_PENDING, and these ARP REPs will be queued.
> > 3. Now the CEP manager will not be able to restore communication,
> >    because there is no response to the ARP packets :)
> > 4. Sending the ARP REP in UD mode will resolve this issue.
> >
> > Patch: ARP REP should be sent in UD mode
> > Signed-off-by: Alexander Naslednikov (xalex at mellanox.co.il)
> > Index: ipoib_port.c
> > ===================================================================
> > --- ipoib_port.c (revision 3775)
> > +++ ipoib_port.c (working copy)
> > @@ -4098,70 +4101,12 @@
> >              return status;
> >          }
> >          ipoib_addr_set_qpn( &p_ib_arp->dst_hw, qpn );
> > -
> > -        if( p_arp->op == ARP_OP_REP &&
> > -            p_port->p_adapter->params.cm_enabled &&
> > -            p_desc->p_endpt->cm_flag == IPOIB_CM_FLAG_RC )
> > -        {
> > -            cm_state_t cm_state;
> > -            cm_state =
> > -                ( cm_state_t )InterlockedCompareExchange( (volatile LONG *)&p_desc->p_endpt->conn.state,
> > -                    IPOIB_CM_CONNECT, IPOIB_CM_DISCONNECTED );
> > -            switch( cm_state )
> > -            {
> > -            case IPOIB_CM_DISCONNECTED:
> > -                IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> > -                    ("ARP REPLY pending Endpt[%p] QPN %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> > -                    p_desc->p_endpt,
> > -                    cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> > -                    p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> > -                    p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> > -                    p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ) );
> > -                ipoib_addr_set_sid( &p_desc->p_endpt->conn.service_id,
> > -                    ipoib_addr_get_qpn( &p_ib_arp->dst_hw ) );
> > -
> > -                ExFreeToNPagedLookasideList(
> > -                    &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> > -                cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> > -                    IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> > -                NdisInterlockedInsertTailList( &p_port->endpt_mgr.pending_conns,
> > -                    &p_desc->p_endpt->list_item,
> > -                    &p_port->endpt_mgr.conn_lock );
> > -                cl_event_signal( &p_port->endpt_mgr.event );
>
> Here we add endpoint to the connecting queue and signal cm management
> thread to process.
> Please see __endpt_cm_mgr_thread().
>
> > -                return NDIS_STATUS_PENDING;
> > -
> > -            case IPOIB_CM_CONNECT:
> > -                /* queue ARP REP packet until connected */
> > -                ExFreeToNPagedLookasideList(
> > -                    &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> > -                cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> > -                    IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> > -                return NDIS_STATUS_PENDING;
> > -            default:
> > -                break;
> > -            }
> > -        }
> >      }
> >      else
> >      {
> >          cl_memclr( &p_ib_arp->dst_hw, sizeof(ipoib_hw_addr_t) );
> >      }
> > -
> > -#if DBG
> > -    if( p_port->p_adapter->params.cm_enabled )
> > -    {
> > -        IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> > -            (" ARP SEND to ENDPT[%p] State: %d flag: %#x, QPN: %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> > -            p_desc->p_endpt,
> > -            endpt_cm_get_state( p_desc->p_endpt ),
> > -            p_desc->p_endpt->cm_flag,
> > -            cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> > -            p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> > -            p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> > -            p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ));
> > -    }
> > -#endif
> > -
> > +
> >      p_ib_arp->dst_ip = p_arp->dst_ip;
> >
> >      p_desc->send_wr[0].local_ds[1].vaddr = cl_get_physaddr( p_ib_arp );
> >
>
>
>