[ofa-general] Multiports single HCA uDAPL program problem

Davis, Arlin R arlin.r.davis at intel.com
Mon Feb 2 10:01:32 PST 2009


 
>One more problem happened when trying to establish one connection per
>rail, as illustrated in the diagram:
>
>          node0                    node1
>rail0: psp0 <----------------> ep0         (port 0 on hca)
>rail1: psp1 <----------------> ep1         (port 1 on hca)
>
>rail0 gets connected first, and its connection is always stable and correct.
>However, rail1 sometimes connects properly and sometimes doesn't.
>The error message is:
>
>11836 Waiting for connect response
>11836 Error unexpected conn event : DAT_CONNECTION_EVENT_NON_PEER_REJECTED
>11836 Error connect_ep: DAT_ABORT
>
>The program establishes the connection for both rails in exactly the same way.
>What may have caused this?

rdma_cm is rejecting the connect request. Turn on warnings for more information:

 export DAPL_DBG_TYPE=0x0003
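DAPL_DBG_TYPE is a bitmask, so 0x0003 enables more than one class of debug output. The sketch below decodes a mask value into class names. It assumes the bit values from the uDAPL `dapl_debug.h` enum (ERR=0x1, WARN=0x2, EVD=0x4, CM=0x8, EP=0x10); `decode_dapl_dbg` is a hypothetical helper, not part of uDAPL, and the bit values should be checked against the installed dapl version.

```shell
#!/bin/sh
# Hypothetical helper: print which debug classes a DAPL_DBG_TYPE value enables.
# ASSUMPTION: bit values taken from the uDAPL dapl_debug.h enum; verify them
# against the headers of your installed dapl release.
decode_dapl_dbg() {
    mask=$(( $1 ))    # POSIX arithmetic accepts hex such as 0x0003
    out=""
    [ $(( mask & 0x0001 )) -ne 0 ] && out="$out ERR"
    [ $(( mask & 0x0002 )) -ne 0 ] && out="$out WARN"
    [ $(( mask & 0x0004 )) -ne 0 ] && out="$out EVD"
    [ $(( mask & 0x0008 )) -ne 0 ] && out="$out CM"
    [ $(( mask & 0x0010 )) -ne 0 ] && out="$out EP"
    # Drop the leading space before printing.
    echo "${out# }"
}

# Errors plus warnings, matching the setting suggested above.
export DAPL_DBG_TYPE=0x0003
decode_dapl_dbg "$DAPL_DBG_TYPE"    # prints: ERR WARN
```

With these assumed values, 0x0003 means errors and warnings only; extra classes are enabled by OR-ing in further bits.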

-arlin


