[ofa-general] [PATCH] perftest Add rdma_cm retries

David McMillen davem at systemfabricworks.com
Mon Jul 27 11:01:49 PDT 2009


On Mon, Jul 27, 2009 at 9:01 AM, Or Gerlitz <ogerlitz at voltaire.com> wrote:

> David McMillen wrote:
>
>>
>> I am pleased to see the discussions this has raised about possible changes
>> to the underlying kernel services.   However, this user-mode patch does
>> address a real problem in the stack as it stands, and I'd like to see
>> agreement that it should be applied.
>>
>
> Generally speaking, since you only changed a synthetic program, I don't
> have any issue with your patch, except that it may confuse people who
> use this code as a reference for their apps;
> see more below.


I would be happy to rework this patch again to make it a proper example.
Please let me know what to change.  I read through the threads referenced
below and did not see anything that relates to this patch.

In all versions of rdma_cm, including the one in the about-to-be-released OFED
1.5, and assuming a properly functioning fabric, both rdma_resolve_addr and
rdma_resolve_route can fail due to ETIMEDOUT, and both can be retried with
success.  Is there an example that I missed somewhere that shows how I
should be doing things?
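
To make the retry pattern concrete, here is a minimal sketch, not the code
from the patch. The names resolve_step_fn, resolve_with_retries, fake_fabric
and fake_resolve are hypothetical. The callback is assumed to wrap one call to
rdma_resolve_addr or rdma_resolve_route plus the wait for its CM event, and to
return 0 on success or a negative errno such as -ETIMEDOUT (with librdmacm the
failure would normally show up as an RDMA_CM_EVENT_ADDR_ERROR or
RDMA_CM_EVENT_ROUTE_ERROR event carrying a status). A stand-in callback is
used so the sketch compiles without librdmacm.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback: performs ONE resolve attempt (for example a call to
 * rdma_resolve_addr() or rdma_resolve_route() followed by waiting for the
 * matching CM event) and returns 0 on success or a negative errno. */
typedef int (*resolve_step_fn)(void *ctx, int timeout_ms);

/* Retry a resolve step only while it fails with -ETIMEDOUT.
 * Makes at most 1 + max_retries attempts.
 * Returns 0 on success, otherwise the last negative errno. */
static int resolve_with_retries(resolve_step_fn step, void *ctx,
                                int timeout_ms, int max_retries)
{
    int ret;
    int attempt = 0;

    assert(step != NULL);
    do {
        ret = step(ctx, timeout_ms);
    } while (ret == -ETIMEDOUT && attempt++ < max_retries);

    return ret;
}

/* Stand-in for a real resolve step: times out a fixed number of times,
 * then succeeds. Used only to exercise the retry loop. */
struct fake_fabric {
    int timeouts_left;
    int calls;
};

static int fake_resolve(void *ctx, int timeout_ms)
{
    struct fake_fabric *f = ctx;

    (void)timeout_ms; /* unused by the stand-in */
    f->calls++;
    if (f->timeouts_left > 0) {
        f->timeouts_left--;
        return -ETIMEDOUT;
    }
    return 0;
}
```

Any error other than -ETIMEDOUT is returned immediately, since only timeouts
are claimed above to be worth retrying. The timeout and retry count are left
as parameters because the sketch has no basis for choosing them.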

Thanks,
   Dave


>
>
>> I don't know enough about the kernel level code to make comments on the
>> best way to improve it, but I would like to make an observation from the
>> user level.  There is no way for the user level to make an informed decision
>> about the proper value for timeouts or retry counts ...
>>
> The scalability issue has been in the air for a couple of years now. Somehow
> many people have become sure the problem is SA scalability, whereas
> personally, I am not sure this is the case. Also, in the past I have tried to
> set up time for low-level technical discussion of the matter at OFA meetings,
> but it was almost always washed away or given a very tiny time slot, since
> there were more important issues such as why ofed is the greatest thing on
> earth, what was the content of its last version, what will be the content of
> its next version, etc. So what happens is that from time to time this or
> that related issue comes to the list, we have a thread on it, and things
> are left in the air. Recently Sean commented that he is working on something:
> http://lists.openfabrics.org/pipermail/ewg/2009-July/013618.html. Also, in
> May we had a related thread, "How to establish IB communcation more
> effectively?", at
> http://lists.openfabrics.org/pipermail/general/2009-May/thread.html#59574 etc.
>
> Or.
>