[openib-general] Failed multicast join with new multicast module
Hal Rosenstock
halr at voltaire.com
Mon May 29 05:33:09 PDT 2006
On Sat, 2006-05-27 at 12:44, Sean Hefty wrote:
> >I forget exactly what the strategy for this was before the multicast
> >module was introduced: whether it was exponential backoff up to some
> >limit, or whether it was linear up to some retry count.
> >
> >Also, in looking at the new multicast code, I see the following:
> >
> >static int retry_timer = 5000; /* 5 sec */
> >module_param(retry_timer, int, 0444);
> >MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests.");
> >
> >static int retries = 3;
> >module_param(retries, int, 0444);
> >MODULE_PARM_DESC(retries, "Number of times to retry a request.");
> >
> >so it appears that the multicast module has its own retry strategy. Is
> >that true? If so, does this interact with IPoIB's own re-requesting, or
> >has that changed?
>
> The multicast module uses its own retry strategy, basically just passing the
> request down to the MAD layer. It should fail the join request to the user if
> the retries are exceeded. I should have a userspace multicast test module by
> the end of this coming week which will let me stress the multicast code more.
>
> Ipoib uses its own retry strategy, and I believe re-issues the request. Ipoib
> uses an exponential backoff strategy, so it sounds like there's an issue with
> the ipoib changes. Looking at the code, I need to understand how send-only
> joins are retried.
Send-only joins are another case. These are full member joins (JoinState
1) to groups that have not yet been created, so they fail.
-- Hal
> - Sean
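For readers comparing the two retry policies discussed in this thread, here is a minimal sketch in C. It is not taken from the multicast module or from IPoIB: the function names are hypothetical, and the backoff base and cap are illustrative assumptions. Only the 5000 ms interval and the retry count of 3 come from the module parameters quoted above.

```c
#include <assert.h>

/*
 * Fixed-interval policy (hypothetical helper, not the module's code):
 * wait the same interval before every retry and give up once
 * max_retries retries have been made. Returns the delay in ms,
 * or -1 when the request should be failed back to the caller.
 */
static int fixed_delay_ms(int retries_done, int interval_ms, int max_retries)
{
    assert(interval_ms > 0);
    if (retries_done >= max_retries)
        return -1;
    return interval_ms;
}

/*
 * Exponential backoff policy (hypothetical helper, not IPoIB's code):
 * double the delay on every retry, starting from base_ms and never
 * exceeding cap_ms. This policy never gives up on its own.
 */
static int backoff_delay_ms(int retries_done, int base_ms, int cap_ms)
{
    int delay = base_ms;
    int i;

    assert(base_ms > 0 && cap_ms >= base_ms);
    for (i = 0; i < retries_done; i++) {
        /* Stop doubling once another doubling would pass the cap. */
        if (delay > cap_ms / 2)
            return cap_ms;
        delay *= 2;
    }
    return delay;
}
```

With the quoted defaults, fixed_delay_ms(n, 5000, 3) returns 5000 for n = 0, 1, 2 and -1 from n = 3 onward. With an assumed 1000 ms base and 16000 ms cap, backoff_delay_ms yields 1000, 2000, 4000, 8000, 16000 and then stays at 16000. If a consumer that backs off indefinitely sits on top of a layer that fails after a fixed number of retries, each upper-layer attempt can trigger a full round of lower-layer retries, which is the kind of interaction asked about above.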