[openib-general] Failed multicast join with new multicast module

Sean Hefty sean.hefty at intel.com
Sat May 27 09:44:52 PDT 2006


>I forget exactly what the strategy for this was before the multicast
>module was introduced: whether it was exponential backoff up to some
>limit, or whether it was linear up to some retry count.
>
>Also, in looking at the new multicast code, I see the following:
>
>static int retry_timer = 5000; /* 5 sec */
>module_param(retry_timer, int, 0444);
>MODULE_PARM_DESC(retry_timer, "Time in ms between retried requests.");
>
>static int retries = 3;
>module_param(retries, int, 0444);
>MODULE_PARM_DESC(retries, "Number of times to retry a request.");
>
>so it appears that the multicast module has its own retry strategy. Is
>that true? If so, does this interact with IPoIB's re-request logic, or
>has that changed?

The multicast module uses its own retry strategy, which basically just passes
the request down to the MAD layer.  It should report the join as failed to the
user once the retries are exhausted.  I should have a userspace multicast test
module by the end of this coming week, which will let me stress the multicast
code more.
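
As a rough illustration of what those two parameters imply, here is a minimal
sketch of the worst-case time before a join would be reported as failed.  It
ASSUMES the initial send and each retry wait one full retry_timer for a
response; the function names are hypothetical and not part of the module.

```c
/* Defaults copied from the quoted module parameters. */
int retry_timer = 5000; /* ms between retried requests */
int retries = 3;        /* number of times to retry a request */

/*
 * Worst-case time in ms before the request is failed back to the caller.
 * Assumption: one initial send plus nretries resends, each of which waits
 * a full timer_ms without getting a response.
 */
long worst_case_fail_ms(int timer_ms, int nretries)
{
    return (long)timer_ms * (nretries + 1);
}

/* Same calculation using the module defaults above. */
long default_worst_case_fail_ms(void)
{
    return worst_case_fail_ms(retry_timer, retries);
}
```

Under that assumption the interval stays fixed, so the total grows linearly
with the retry count.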

IPoIB uses its own retry strategy, and I believe it re-issues the request
after a failure.  IPoIB uses exponential backoff, so it sounds like there's an
issue with the IPoIB changes.  Looking at the code, I still need to understand
how send-only joins are retried.
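
For reference, exponential backoff with an upper limit can be sketched as
below.  This is only an illustration of the general technique, not the actual
IPoIB code; the function name and the 16-second cap are assumptions chosen for
the example.

```c
/* Illustrative cap on the delay; the value is an assumption, not from IPoIB. */
#define MAX_BACKOFF_SECONDS 16

/*
 * Return the delay to use before the next retry, given the delay that was
 * used before the attempt that just failed.  The delay doubles on each
 * failure until it reaches the cap, then stays there.
 */
unsigned int next_backoff(unsigned int backoff)
{
    backoff *= 2;
    if (backoff > MAX_BACKOFF_SECONDS)
        backoff = MAX_BACKOFF_SECONDS;
    return backoff;
}
```

Starting from a 1 second delay, successive failures would wait 2, 4, 8, and
then 16 seconds, and every later retry would also wait 16 seconds.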

- Sean


