[Openib-windows] Handling a few arp sent together
Fabian Tillier
ftillier at silverstorm.com
Mon Aug 7 16:33:26 PDT 2006
Hi Tzachi,
On 8/7/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
>
> Hi Fab,
>
> While doing some tests on SDP and iperf which involved 5 simultaneous
> connections, I have found out that there is a problem that might cause
> connections to fail from time to time.
>
> The problem seems to occur because the function ipoib_mac_to_gid does not
> always return the dest GID. The cause appears to be in the function
> __recv_arp: when a new endpoint is accepted, we check whether the endpoint
> that we already have is still valid. The comment says:
> /*
> * If the endpoint exists for the GID, make sure
> * the dlid and qpn match the arp.
> */
> and later in the code we check the src_hw.gid, the dlid, and the QP number. My
> problem starts from the fact that if only the ARP has been processed so far,
> the dlid will still be 0 (we take it from the callback of the path record
> query), and therefore the endpoint is removed. A few lines later such an
> endpoint is created and inserted again; however, there is a time window in
> which the entry doesn't exist (and therefore there is no answer to the query).
>
> I have tried to replace the check:
>
> else if( (((*pp_src)->dlid != p_wc->recv.ud.remote_lid) ||
> (*pp_src)->qpn != p_wc->recv.ud.remote_qp) )
> with
>
> else if( (((*pp_src)->dlid != p_wc->recv.ud.remote_lid) &&
> ((*pp_src)->dlid != 0) ||
> (*pp_src)->qpn != p_wc->recv.ud.remote_qp) )
> and it seems to solve my problem. Please note that the exact endpoint will
> be created only a few lines below.
>
> So, do you see any problem with the solution proposed?
That makes perfect sense. This is fallout from delaying the LID
assignment. Good catch.
I've applied this in revision 445. Please let me know if you see any
issues with it.
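
For anyone reading along, the intent of the modified check can be sketched in
isolation roughly as follows. This is only an illustration: the struct and
function names are hypothetical stand-ins, not the actual IPoIB driver types,
and it assumes (as described above) that a dlid of 0 means the path record
query has not completed yet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the cached endpoint;
 * not the driver's real structure. */
struct endpoint_sketch {
    uint16_t dlid; /* destination LID; 0 is assumed to mean "not resolved yet" */
    uint32_t qpn;  /* remote queue pair number */
};

/* Returns true when the cached endpoint disagrees with the received ARP
 * and should be removed and recreated. An unresolved dlid (0) is not
 * treated as a mismatch, which avoids dropping the endpoint while the
 * path record query is still pending. */
static bool endpoint_is_stale(const struct endpoint_sketch *ep,
                              uint16_t remote_lid, uint32_t remote_qp)
{
    bool lid_mismatch = (ep->dlid != 0) && (ep->dlid != remote_lid);
    bool qpn_mismatch = (ep->qpn != remote_qp);

    return lid_mismatch || qpn_mismatch;
}
```

With the original check, the first case below (dlid still 0) would have counted
as a mismatch; with the change it only counts once the dlid has been set and
actually differs, or when the QP number differs.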
Thanks,
- Fab