[ofa-general] [PATCH] perftest Add rdma_cm retries
Steve Wise
swise at opengridcomputing.com
Thu Jul 23 19:50:21 PDT 2009
David McMillen wrote:
>
>
> On Thu, Jul 23, 2009 at 8:29 PM, Steve Wise
> <swise at opengridcomputing.com> wrote:
>
> > Can't you just up the value passed into rdma_resolve_addr()?
> > Currently this code passes in 2000 (ms). Did you try changing
> > this to say 20000?
>
>
> I didn't try that. Timeouts on rdma_resolve_addr are much rarer
> than on rdma_resolve_route, so test cases are harder to come by. I
> did want to offer a solution that seemed to work.
>
> I have not looked at every code path for every possible subsystem that
> rdma_cm will use. I don't even have a good reason to know that any
> particular timeout value is appropriate. It would be nice if there
> was some way to get that information for a particular instance of an
> rdma_cm_id. The same goes for the retry mechanism - is it worthwhile
> to retry, and how many times is enough? The values in this patch
> happen to work for the InfiniBand fabrics I use, but my experience is
> limited.
>
> Are you saying that one rdma_resolve_addr with a 20,000 ms timeout is
> as good as (or maybe even better than) 10 repeats of failed calls using
> 2,000 ms timeouts? If that is true, and always will be for any fabric
> rdma_cm uses, then it seems obvious that we should just change the
> timeout and not do the retry.
I think so. But if you test it on your setup, that would be best...
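
For concreteness, the two options being compared can be sketched roughly
as below. This is only an illustration, not the patch and not perftest
code: resolve_fn, resolve_once, resolve_with_retries and the fake_fabric
stand-in are all hypothetical names. In real code the callback would wrap
rdma_resolve_addr() (or rdma_resolve_route()) plus the wait for the
matching CM event. The stand-in assumes that every attempt starts from
scratch, which may or may not hold on a given fabric; that is exactly the
part worth testing.

```c
#include <assert.h>

/*
 * Hypothetical callback: run one resolution attempt with the given
 * timeout and block until it completes. Returns 0 on success and -1 on
 * timeout or error. In real code this would wrap rdma_resolve_addr()
 * or rdma_resolve_route() and the wait for the resulting CM event.
 */
typedef int (*resolve_fn)(void *ctx, int timeout_ms);

/* Option A: a single attempt with one long timeout (e.g. 20000 ms). */
int resolve_once(resolve_fn fn, void *ctx, int timeout_ms)
{
    return fn(ctx, timeout_ms);
}

/* Option B: up to max_tries attempts, each with a short timeout. */
int resolve_with_retries(resolve_fn fn, void *ctx, int timeout_ms,
                         int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (fn(ctx, timeout_ms) == 0)
            return 0;   /* resolved on attempt i + 1 */
    }
    return -1;          /* every attempt timed out */
}

/*
 * Stand-in for a fabric, used only to exercise the sketch. ASSUMPTION:
 * an attempt succeeds only if its own timeout is at least need_ms, and
 * no progress carries over from one attempt to the next.
 */
struct fake_fabric {
    int need_ms;  /* time one attempt needs to succeed */
    int calls;    /* number of attempts made so far */
};

int fake_resolve(void *ctx, int timeout_ms)
{
    struct fake_fabric *f = ctx;

    f->calls++;
    return timeout_ms >= f->need_ms ? 0 : -1;
}
```

Under that assumption, a fabric that needs 5000 ms per attempt never
resolves with 10 tries of 2000 ms, but resolves on the first try with a
single 20000 ms timeout. If a failed attempt leaves useful state behind
(cached ARP or path record results, say), the retry loop would behave
better than this model suggests.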
Stevo