[Openib-windows] Handling a few arp sent together

Fabian Tillier ftillier at silverstorm.com
Mon Aug 7 16:33:26 PDT 2006


Hi Tzachi,

On 8/7/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
>
> Hi Fab,
>
> While doing some tests on SDP and iperf which involved 5 simultaneous
> connections, I have found out that there is a problem that might cause
> connections to fail from time to time.
>
> The cause seems to be that the function ipoib_mac_to_gid does not always
> return the dest GID, and the root of that is in the function __recv_arp.
> When a new endpoint is accepted, we check that the endpoint we already
> have is still valid. The comment says:
>  /*
>   * If the endpoint exists for the GID, make sure
>   * the dlid and qpn match the arp.
>   */
> and later the code checks the src_hw.gid, the dlid, and the QP number. My
> problem is that if only an ARP has been handled so far, the dlid is still
> 0 (we take it from the callback of the path record query), so the check
> fails and the endpoint is removed. A few lines later the same endpoint is
> created and inserted again, but there is a time window in which no such
> entry exists (and therefore there is no answer to the query).
>
> I have tried to replace the check:
>
>   else if( (((*pp_src)->dlid != p_wc->recv.ud.remote_lid)  ||
>    (*pp_src)->qpn != p_wc->recv.ud.remote_qp) )
> with
>
>   else if( ((*pp_src)->dlid != p_wc->recv.ud.remote_lid &&
>             (*pp_src)->dlid != 0) ||
>            (*pp_src)->qpn != p_wc->recv.ud.remote_qp )
> and it seems to solve my problem. Please note that this exact endpoint is
> created only a few lines below.
>
> So, do you see any problem with the solution proposed?

That makes perfect sense.  This is a fallout of delaying the LID
assignment, good catch.
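
For the archives, here is a minimal standalone sketch of the two conditions
side by side. The type and function names (endpt_sketch_t, stale_old,
stale_new) are hypothetical and are not the actual IPoIB driver code; only
the field names dlid and qpn and the shape of the two checks come from the
code quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two endpoint fields the check looks at. */
typedef struct endpt_sketch {
    uint16_t dlid; /* stays 0 until the path record query callback sets it */
    uint32_t qpn;
} endpt_sketch_t;

/* Original check: any LID or QPN mismatch marks the endpoint as stale,
 * including the case where the LID simply has not been resolved yet. */
static bool stale_old(const endpt_sketch_t *ep,
                      uint16_t remote_lid, uint32_t remote_qp)
{
    return ep->dlid != remote_lid || ep->qpn != remote_qp;
}

/* Proposed check: a LID mismatch only counts once a LID has been assigned
 * (dlid != 0); a QPN mismatch still always counts. */
static bool stale_new(const endpt_sketch_t *ep,
                      uint16_t remote_lid, uint32_t remote_qp)
{
    return (ep->dlid != remote_lid && ep->dlid != 0) ||
           ep->qpn != remote_qp;
}
```

With an endpoint whose dlid is still 0 and whose qpn matches the ARP,
stale_old reports a mismatch (so the endpoint would be removed), while
stale_new does not. Once the dlid is set, both behave the same.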

I've applied this in revision 445.  Please let me know if you see any
issues with it.

Thanks,

- Fab



