[ofa-general] [PATCH] perftest Add rdma_cm retries

Steve Wise swise at opengridcomputing.com
Thu Jul 23 19:50:21 PDT 2009


David McMillen wrote:
>
>
> On Thu, Jul 23, 2009 at 8:29 PM, Steve Wise 
> <swise at opengridcomputing.com> wrote:
>
>     Can't you just up the value passed into rdma_resolve_addr()?
>      Currently this code passes in 2000 (ms).  Did you try changing
>     this to say 20000?
>
>
> I didn't try that.  Timeouts on rdma_resolve_addr are much rarer
> than on rdma_resolve_route, so test cases are harder to come by.  I
> did want to offer a solution that seemed to work.
>
> I have not looked at every code path for every possible subsystem that 
> rdma_cm will use.  I don't even have a good reason to know that any 
> particular timeout value is appropriate.  It would be nice if there 
> was some way to get that information for a particular instance of an 
> rdma_cm_id.  The same goes for the retry mechanism - is it worthwhile 
> to retry, and how many times is enough?  The values in this patch 
> happen to work for the InfiniBand fabrics I use, but my experience is
> limited.
>
> Are you saying that one rdma_resolve_addr with a 20,000 ms timeout is
> as good as (or maybe even better than) 10 repeats of failed calls using
> 2,000 ms timeouts?  If that is true, and always will be for any fabric
> rdma_cm uses, then it seems obvious that we should just change the 
> timeout and not do the retry.

I think so.  But if you test it on your setup, that would be best...
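
To make the comparison concrete, here is a rough toy model in C, not perftest or librdmacm code. It assumes (and this is only an assumption, which is really the open question above) that each retry starts resolution over from scratch. The names resolve_fn, resolve_with_retries, fake_fabric and fake_resolve are all hypothetical stand-ins; a real test would call rdma_resolve_addr() and wait on the event channel instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one blocking "resolve with timeout" attempt.
 * Returns 0 on success, -1 on timeout.  Not a librdmacm function. */
typedef int (*resolve_fn)(int timeout_ms, void *ctx);

/* Try up to max_tries times with the same timeout on every attempt.
 * Returns 0 on the first success, -1 if every attempt times out.
 * *waited_ms is set to the time spent in failed attempts only. */
static int resolve_with_retries(resolve_fn resolve, void *ctx,
                                int timeout_ms, int max_tries,
                                int *waited_ms)
{
    assert(resolve != NULL && waited_ms != NULL);
    *waited_ms = 0;
    for (int i = 0; i < max_tries; i++) {
        if (resolve(timeout_ms, ctx) == 0)
            return 0;
        *waited_ms += timeout_ms;
    }
    return -1;
}

/* Toy fabric: resolution needs needed_ms of uninterrupted waiting.
 * ASSUMPTION: a retry restarts from zero, so earlier attempts do not help. */
struct fake_fabric {
    int needed_ms;
};

static int fake_resolve(int timeout_ms, void *ctx)
{
    const struct fake_fabric *f = ctx;
    return f->needed_ms <= timeout_ms ? 0 : -1;
}
```

With a fake_fabric that needs 5000 ms, calling resolve_with_retries(fake_resolve, &f, 2000, 10, &waited) fails after 20000 ms of waiting, while a single attempt with a 20000 ms timeout succeeds. If the real failures come from lost requests rather than slow responses, the model does not apply and retries could still win, which is why testing on the actual setup matters.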

Stevo


