[openib-general] Failed multicast join with new multicast module

Hal Rosenstock halr at voltaire.com
Mon May 29 05:33:09 PDT 2006


On Sat, 2006-05-27 at 12:44, Sean Hefty wrote:
> >I forget exactly what the strategy for this was before the multicast
> >module was introduced: whether it was exponential backoff up to some
> >limit, or whether it was linear up to some retry count.
> >
> >Also, in looking at the new multicast code, I see the following:
> >
> >static int retry_timer = 5000; /* 5 sec */
> >module_param(retry_timer, int, 0444);
> >MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests.");
> >
> >static int retries = 3;
> >module_param(retries, int, 0444);
> >MODULE_PARM_DESC(retries, "Number of times to retry a request.");
> >
> >so it appears that the multicast module has its own retry strategy. Is
> >that true? If so, does this interact with IPoIB's for rerequesting or
> >has that changed?
> 
> The multicast module uses its own retry strategy, basically just passing the
> request down to the MAD layer.  It should fail the join request to the user if
> the retries are exceeded.  I should have a userspace multicast test module by
> the end of this coming week which will let me stress the multicast code more.
> 
> Ipoib uses its own retry strategy, and I believe re-issues the request.  Ipoib
> uses an exponential backoff strategy, so it sounds like there's an issue with
> the ipoib changes.  Looking at the code, I need to understand how send-only
> joins are retried.

Send-only joins are another case. These are full member joins (JoinState
1) to groups which have not yet been created, so they fail.
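
To make the two retry policies discussed above concrete, here is a rough
sketch rather than the actual module or ipoib code. The fixed-interval
values (5000 ms, 3 retries) come from the module parameters quoted earlier;
the backoff start and cap, and both function names, are hypothetical and
chosen only for illustration.

```c
#include <assert.h>

/* Values taken from the module parameters quoted in this thread. */
#define RETRY_TIMER_MS   5000
#define RETRIES          3

/* Hypothetical backoff bounds; NOT taken from the ipoib source. */
#define BACKOFF_START_MS 1000
#define BACKOFF_MAX_MS   16000

/*
 * Fixed-interval policy: every retry waits the same time.
 * Returns the delay before retry number 'attempt' (1-based),
 * or -1 once the retry budget is exhausted and the request fails.
 */
int fixed_delay_ms(int attempt)
{
	assert(attempt >= 1);
	if (attempt > RETRIES)
		return -1;
	return RETRY_TIMER_MS;
}

/*
 * Exponential backoff policy: the delay doubles on each retry and is
 * capped at BACKOFF_MAX_MS; the request is re-issued indefinitely.
 */
int backoff_delay_ms(int attempt)
{
	int delay = BACKOFF_START_MS;
	int i;

	assert(attempt >= 1);
	for (i = 1; i < attempt; i++) {
		delay *= 2;
		if (delay >= BACKOFF_MAX_MS)
			return BACKOFF_MAX_MS;
	}
	return delay;
}
```

Under these assumptions the fixed policy gives up after the third retry,
while the backoff policy keeps re-issuing the join at a capped interval.
That difference matters for a join which fails only because the group has
not been created yet.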

-- Hal

> - Sean



