[ofw] RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode (connectivity issue)

Alex Naslednikov xalex at mellanox.co.il
Thu Jan 22 06:31:21 PST 2009


I believe that reducing the connection timeout will partially resolve
the issue.
But the RFC defines ARP REP to be sent through the UD QP, and doing so
solves the problem.
So why are we trying to debug the RC flow?
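
To illustrate the point about the UD QP, here is a minimal sketch of the
send-path choice the patch aims for. The enum, the function name
pick_send_path, and the fallback of non-ARP traffic to UD while no
connection exists are assumptions made for illustration; none of this is
code from ipoib_port.c.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this sketch. */
typedef enum { SEND_VIA_UD, SEND_VIA_RC } send_path_t;

/*
 * Decide which QP an outgoing packet should use.
 * ARP packets never wait for the RC connection: they always go out on UD,
 * so address resolution keeps working while the endpoint is disconnected
 * or still connecting.
 */
static send_path_t pick_send_path(bool is_arp, bool cm_connected)
{
    if (is_arp)
        return SEND_VIA_UD;

    /* Assumption: other traffic uses RC only once the connection is up. */
    return cm_connected ? SEND_VIA_RC : SEND_VIA_UD;
}
```

With this rule an ARP REP is never queued behind a connect request, which
is the situation described in the original report.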

-----Original Message-----
From: Alex Estrin [mailto:alex.estrin at qlogic.com]
Sent: Thursday, January 22, 2009 4:14 PM
To: Alex Naslednikov; ofw at lists.openfabrics.org
Subject: RE: [IPoIB CM] [Patch] ARP REP should be send in UD mode
(connectivity issue)

Hello,

When the Responder generates an ARP REP packet and the endpoint is in
the IPOIB_CM_DISCONNECTED state, it will move the endpoint to the
transition state IPOIB_CM_CONNECT, initiate a connect request for that
endpoint and queue the ARP REP packet.
[Xalex] 
When the connection is established (endpoint in state IPOIB_CM_CONNECTED),
the ARP REP will resume through the UD QP.
TCP applications start sending TCP packets immediately after receiving the
ARP REP, so delaying it makes sure all TCP packets go through the
connected QP.
In your case I think the host didn't get the connect reply back.
I'm not sure why, but it is likely that something went wrong with a path
record, either locally or on the remote side.
We need to look deeper into this.
Also, we could probably reduce the connect timeout or the number of
connect retries, so that the host can reinit the endpoint (on connection
timeout) and retry the connect on the next ARP REP.
Please see more notes inline.

Thanks,
Alex.
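
For readers following along, the transition described above can be
sketched roughly as follows. This is a simplified illustration, not the
driver code: C11 atomics stand in for InterlockedCompareExchange, the
numeric enum values are assumed, and arp_action_t and arp_rep_action are
hypothetical names.

```c
#include <stdatomic.h>

/* State names follow this thread; the numeric values are assumptions. */
typedef enum {
    IPOIB_CM_DISCONNECTED = 0,
    IPOIB_CM_CONNECT,
    IPOIB_CM_CONNECTED
} cm_state_t;

/* Hypothetical result type for this sketch. */
typedef enum {
    ARP_SEND_NOW,          /* connection is up, send the reply */
    ARP_QUEUE,             /* connect already in progress, just queue */
    ARP_QUEUE_AND_CONNECT  /* we started the connect, queue and signal */
} arp_action_t;

/*
 * Atomically move DISCONNECTED -> CONNECT. Only the caller that wins the
 * exchange initiates the connect request; everyone else either queues
 * (connect in progress) or sends immediately (already connected).
 */
static arp_action_t arp_rep_action(_Atomic int *state)
{
    int expected = IPOIB_CM_DISCONNECTED;

    if (atomic_compare_exchange_strong(state, &expected, IPOIB_CM_CONNECT))
        return ARP_QUEUE_AND_CONNECT;

    /* On failure, 'expected' holds the state that was actually observed. */
    if (expected == IPOIB_CM_CONNECT)
        return ARP_QUEUE;

    return ARP_SEND_NOW;
}
```

Note that in both queueing cases the reply stays pending until the
connect completes, so if the connect reply never arrives, the queued
ARP REP is never sent.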

> Hello, Alex,
> Recently, I found the following problem:
> 1. Connect 2 machines B2B, run opensm, set static IPoIB addresses,
> verify ping.
> 2. Then disconnect a cable for 10-15 seconds, and connect it back.
> 3. Wait for a couple of seconds for opensm to indicate that the link is
> UP, then try to ping again.
> 4. The ping now will not work.
>
> Why this happens:
> 1. On the sender side, a ping (ARP REQ) packet will be generated and
> sent to the responder side.
> 2. The responder will generate an ARP REP packet, but it will not be
> sent: in recv_mgr_filter_arp, when getting to IPOIB_CM_DISCONNECTED or
> IPOIB_CM_CONNECT, the code will return NDIS_STATUS_PENDING, and
> these ARP REPs will be queued.
> 3. Now, the CEP manager will not be able to restore the communication,
> because there is no response to the ARP packets :)
> 4. Sending ARP REP in UD mode will resolve this issue.
>
> Patch: ARP REP should be sent in UD mode
> Signed-off by: Alexander Naslednikov (xalex at mellanox.co.il)
> Index: ipoib_port.c
> ===================================================================
> --- ipoib_port.c      (revision 3775)
> +++ ipoib_port.c      (working copy)
> @@ -4098,70 +4101,12 @@
>                       return status;
>               }
>               ipoib_addr_set_qpn( &p_ib_arp->dst_hw, qpn );
> -
> -             if( p_arp->op == ARP_OP_REP &&
> -                     p_port->p_adapter->params.cm_enabled &&
> -                     p_desc->p_endpt->cm_flag == IPOIB_CM_FLAG_RC )
> -             {
> -                     cm_state_t      cm_state;
> -                     cm_state =
> -                             ( cm_state_t )InterlockedCompareExchange( (volatile LONG *)&p_desc->p_endpt->conn.state,
> -                                     IPOIB_CM_CONNECT, IPOIB_CM_DISCONNECTED );
> -                     switch( cm_state )
> -                     {
> -                     case IPOIB_CM_DISCONNECTED:
> -                                     IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> -                                             ("ARP REPLY pending Endpt[%p] QPN %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> -                                             p_desc->p_endpt,
> -                                             cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> -                                             p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> -                                             p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> -                                             p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ) );
> -                                     ipoib_addr_set_sid( &p_desc->p_endpt->conn.service_id,
> -                                             ipoib_addr_get_qpn( &p_ib_arp->dst_hw ) );
> -
> -                                     ExFreeToNPagedLookasideList(
> -                                             &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> -                                     cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> -                                             IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> -                                     NdisInterlockedInsertTailList( &p_port->endpt_mgr.pending_conns,
> -                                             &p_desc->p_endpt->list_item,
> -                                             &p_port->endpt_mgr.conn_lock );
> -                                     cl_event_signal( &p_port->endpt_mgr.event );

Here we add the endpoint to the connecting queue and signal the CM
management thread to process it.
Please see __endpt_cm_mgr_thread().

> -                                     return NDIS_STATUS_PENDING;
> -                    
> -                     case IPOIB_CM_CONNECT:
> -                             /* queue ARP REP packet until connected */
> -                                     ExFreeToNPagedLookasideList(
> -                                     &p_port->buf_mgr.send_buf_list, p_desc->p_buf );
> -                                     cl_qlist_insert_tail( &p_port->send_mgr.pending_list,
> -                                             IPOIB_LIST_ITEM_FROM_PACKET( p_desc->p_pkt ) );
> -                                     return NDIS_STATUS_PENDING;
> -                     default:
> -                             break;
> -                     }
> -             }
>       }
>       else
>       {
>               cl_memclr( &p_ib_arp->dst_hw, sizeof(ipoib_hw_addr_t) );
>       }
> -
> -#if DBG
> -     if( p_port->p_adapter->params.cm_enabled )
> -     {
> -             IPOIB_PRINT( TRACE_LEVEL_INFORMATION, IPOIB_DBG_INIT,
> -             (" ARP SEND to ENDPT[%p] State: %d flag: %#x, QPN: %#x MAC %02x:%02x:%02x:%02x:%02x:%02x\n",
> -                     p_desc->p_endpt,
> -                     endpt_cm_get_state( p_desc->p_endpt ),
> -                     p_desc->p_endpt->cm_flag,
> -                     cl_ntoh32( ipoib_addr_get_qpn( &p_ib_arp->dst_hw )),
> -                     p_desc->p_endpt->mac.addr[0], p_desc->p_endpt->mac.addr[1],
> -                     p_desc->p_endpt->mac.addr[2], p_desc->p_endpt->mac.addr[3],
> -                     p_desc->p_endpt->mac.addr[4], p_desc->p_endpt->mac.addr[5] ));
> -     }
> -#endif
> -
> +    
>       p_ib_arp->dst_ip = p_arp->dst_ip;
> 
>       p_desc->send_wr[0].local_ds[1].vaddr = cl_get_physaddr( p_ib_arp );
>




