[Openib-windows] Interoperability of the current stack and the Linux 1.8.X stack

Fabian Tillier ftillier at silverstorm.com
Mon Jul 10 06:57:55 PDT 2006


Hi Tzachi,

On 7/10/06, Tzachi Dar <tzachid at mellanox.co.il> wrote:
>
> Hi Fab,
>
> One of our customers has seen an issue that prevented the new code from
> joining clusters where old Linux installations exist.

Which Linux stack is this?  Any chance the customer could upgrade to
something less broken?  More below.

> The problem is related to joining existing broadcast groups on IPoIB.
>
> As the problem is in the old code, I have prepared a fix that is dependent
> on a registry key and will not be run by default.

Why not just fix the old code?

> As for the second issue (parameters for joining a group
> mcast_req.member_rec.mtu = 0; mcast_req.member_rec.rate = 0; ).
> This is what Linux gen 2 does, so I suggest that we always do the
> same.
>
> Thanks
> Tzachi
>
> Index: ipoib_port.c
> ===================================================================
> --- ipoib_port.c (revision 1524)
> +++ ipoib_port.c (working copy)
> @@ -4813,7 +4813,8 @@
>   IPOIB_ENTER( IPOIB_DBG_MCAST );
>
>   /* Check that the rate is realizable for our port. */
> - if( p_port->ib_mgr.rate < (p_member_rec->rate & 0x3F) )
> + if( p_port->ib_mgr.rate < (p_member_rec->rate & 0x3F) &&
> +  (g_ipoib.bypass_check_bcast_rate == 0))

I'm confused by this change.  Why would it not be correct to check
that the local port rate is greater than or equal to the broadcast
group's rate?  This doesn't even seem like a stack issue, but rather
an SM issue.
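
To make the check under discussion concrete, here is a minimal standalone sketch of the comparison as the patched code would perform it. The function name, the parameter names, and the RATE_MASK macro are hypothetical and exist only for illustration; the real code reads these values from p_port->ib_mgr.rate, p_member_rec->rate, and g_ipoib.bypass_check_bcast_rate. The rate values are treated as opaque encodings, compared numerically exactly as the patch does.

```c
#include <stdint.h>

/* Low six bits of the member record's rate field, per the 0x3F mask
 * used in the patch.  The macro name is an assumption for this sketch. */
#define RATE_MASK 0x3F

/*
 * Hypothetical helper mirroring the patched condition.
 * Returns 1 if the join may proceed, 0 if it would be rejected.
 *
 *   port_rate          local port rate (p_port->ib_mgr.rate in the patch)
 *   member_rate_field  raw rate byte from the broadcast group's record
 *   bypass_check       stand-in for g_ipoib.bypass_check_bcast_rate
 */
static int bcast_rate_ok(uint8_t port_rate, uint8_t member_rate_field,
                         int bypass_check)
{
    uint8_t group_rate = (uint8_t)(member_rate_field & RATE_MASK);

    /* Reject only when the group rate exceeds the port rate AND the
     * bypass key is not set. */
    if (port_rate < group_rate && bypass_check == 0)
        return 0;

    return 1;
}
```

With the bypass value left at 0 the behaviour is the same as the unpatched code; setting it to a nonzero value lets the join proceed even when the group's rate is higher than the port's.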

>   {
>    /*
>     * The MC group rate is higher than our port's rate.  Log an error
> @@ -4825,7 +4826,7 @@
>     EVENT_IPOIB_BCAST_RATE, 2,
>     (uint32_t)(p_member_rec->rate & 0x3F),
>     (uint32_t)p_port->ib_mgr.rate );
> -  return IB_ERROR;
> +   return IB_ERROR;
>   }
>
>   /* Join the broadcast group. */
> @@ -5226,6 +5227,8 @@
>   mcast_req.member_rec = p_port->ib_mgr.bcast_rec;
>   /* Clear fields that aren't specified in the join */
>   mcast_req.member_rec.mlid = 0;
> + mcast_req.member_rec.mtu = 0;
> + mcast_req.member_rec.rate = 0;

I don't see why this is needed.  The MTU and rate must be the same as
the broadcast group, so this would only be a problem if the broadcast
group was returned with invalid information.  Why is this a problem?
This again seems like an SM issue.
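
For reference, the proposed change to the join request amounts to something like the sketch below. The member_rec_t struct and make_join_rec() are hypothetical, simplified stand-ins and not the real member record type or any function in ipoib_port.c; only the mlid, mtu, and rate field names come from the patch, and pkey is included purely as an example of a field that is left alone.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for the multicast member record.
 * Only the fields relevant to this discussion are modelled. */
typedef struct member_rec {
    uint16_t mlid;
    uint8_t  mtu;
    uint8_t  rate;
    uint16_t pkey;   /* example of a field the patch does not touch */
} member_rec_t;

/*
 * Build the record sent in the join request from the broadcast group
 * record returned earlier, following the patched code: copy everything,
 * then clear the fields that are not specified in the join.
 */
static member_rec_t make_join_rec(const member_rec_t *p_bcast_rec)
{
    member_rec_t join_rec = *p_bcast_rec;

    join_rec.mlid = 0;   /* already cleared before the patch */
    join_rec.mtu  = 0;   /* added by the patch */
    join_rec.rate = 0;   /* added by the patch */

    return join_rec;
}
```

Without the two added lines, the join carries the MTU and rate exactly as they were returned for the broadcast group, which is why the values in that returned record matter.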

- Fab



