[openib-general] IPoIB broadcast MC group membership

amith rajith mamidala mamidala at cse.ohio-state.edu
Wed Feb 22 11:47:01 PST 2006


I have a question about mixed HCA bandwidths in the fabric.
For an upper layer such as MVAPICH, "negotiating" a rate so that
all the ports are "involved" can be quite expensive, especially
if the code falls in the critical path. Can the underlying SA
interface provide any additional support so that the upper
protocol layers can do this job in minimal time? Such support
could be used not only for MPI but for other stacks as well.

Thanks,
Amith

On Wed, 22 Feb 2006, Fabian Tillier wrote:

> On 2/22/06, Greg Lindahl <lindahl at pathscale.com> wrote:
> >
> > On Tue, Feb 21, 2006 at 11:40:53PM -0800, Fabian Tillier wrote:
> >
> > > You'd have to make the group 1X.  Note that the group being 1X doesn't
> > > limit unicast traffic to 1X rates, since the rate for unicast traffic
> > > would be set based on the rate reported in the path records for the
> > > various endpoints.
> > >
> > > So 4X SDR and 4X DDR nodes would have to set their inter-packet delay
> > > for the broadcast group to end up with a 1X packet injection rate.
> >
> > So, basically, MVAPICH doesn't have code that does either the group
> > creation properly when there is a mixture of HCA bandwidths, or limit
> > the packet injection rate. And IPoIB could violate this rule depending
> > on how user programs use it, e.g. if I did a lot of broadcasting, I
> > could easily exceed 1X's bandwidth.
> >
> > So this is more than just a "fix OpenSM" issue. It's more of a "fix
> > the spec" issue, if I'm understanding it correctly.
>
> No, the spec is fine.  This is a "fix the SW" issue.  If OpenSM
> rejected join requests of nodes for which the MC group is unrealizable
> (that is, some setting of the requestor conflicts with the existing
> group, such as the rate), such nodes would not be able to join the
> broadcast group and thus not have IPoIB connectivity.
>
> When the SA responds to the MC join request, the response includes the
> rate.  The recipient of the response should create an address vector
> for the MC group that takes the rate into account, which would cause
> the hardware to honor the injection rate so as not to flood the
> group.  I haven't looked at MVAPICH, so I can't tell you if what it
> does is correct.  IPoIB does seem to do the right thing, though.
>
> - Fab
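
To make the rate arithmetic in the quoted discussion concrete, here is a
minimal sketch of how an inter-packet delay could be derived from the rate
returned in the MC join response. It is an illustration only, not code from
MVAPICH, IPoIB, or OpenSM. The function names and the delay model are
assumptions: after a packet that takes T to transmit, the port is assumed to
stay idle for ipd * T, giving an effective rate of port_rate / (ipd + 1).

```python
import math

# Signalling rate per lane in Gb/s (SDR = 2.5, DDR = 5.0).
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0}


def link_rate_gbps(width, speed):
    """Raw signalling rate of a link, e.g. width 4 at "SDR" is 10.0 Gb/s."""
    return width * LANE_GBPS[speed]


def inter_packet_delay(port_gbps, group_gbps):
    """Idle time between packets, in units of one packet's transmit time.

    Hypothetical helper. Assumed model: a port that idles for ipd packet
    times after each packet injects at port_gbps / (ipd + 1).
    """
    if group_gbps >= port_gbps:
        return 0  # the port is no faster than the group; no throttling
    return math.ceil(port_gbps / group_gbps) - 1


if __name__ == "__main__":
    group = link_rate_gbps(1, "SDR")  # a 1X broadcast group
    for width, speed in [(1, "SDR"), (4, "SDR"), (4, "DDR")]:
        port = link_rate_gbps(width, speed)
        print(width, speed, inter_packet_delay(port, group))
```

Under this model, a 4X SDR port sending to a 1X group would idle for 3
packet times after each packet, and a 4X DDR port for 7, while unicast
traffic between fast ports would keep a delay of 0.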


