[ofa-general] rdma_resolve_route() returning -EINVAL

Talpey, Thomas Thomas.Talpey at netapp.com
Thu Oct 2 18:07:31 PDT 2008


At 06:29 PM 10/2/2008, Hal Rosenstock wrote:
>Tom,
>
>On Thu, Oct 2, 2008 at 1:39 PM, Talpey, Thomas 
><Thomas.Talpey at netapp.com> wrote:
>> I'm debugging a reconnect problem in the NFS/RDMA client and
>> am seeing something rather odd. The context is that if a client
>> mount point goes idle for 5 minutes, the Linux RPC layer closes
>> the associated connection. When a new request needs to be
>> sent, the RPC layer then performs a reconnect.
>>
>> At this point, the NFS/RDMA client code will call rdma_create_id()
>> to create a new rdma_cm_id, then rdma_resolve_addr() and
>> finally rdma_resolve_route(). In the reconnect scenario, that
>> last step however returns -EINVAL.
>>
>> Looking at the code, I think the only reasons for this return are
>> 1) calling rdma_resolve_route() in the wrong state (which I'm not),
>> and 2) way down in the ib_post_send_mad() function, if there is
>> a timeout passed-in (which there is) and there's no receive handler
>> registered for the MAD (no clue but it worked the first time).
>
>Are you saying you suspect reason 2 above? FWIW, my read
>relative to ib_post_send_mad is that CM does register a receive

Hi Hal, thanks for looking at it. I've since determined that it's
actually 1) above, but for a new reason.

The CM has a new upcall event, RDMA_CM_EVENT_TIMEWAIT_EXIT, which is
emitted shortly after any disconnect. This upcall arrives either before
or during my
connection recovery and signals a completion in my code that
causes the re-binding to skip a step.
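
The failure mode described above can be sketched in a small user-space
model. This is only an illustration under stated assumptions, not the
NFS/RDMA client code or the kernel CM: every sim_* name is hypothetical,
the `done` flag stands in for the completion the reconnect path waits on,
and the state check in sim_resolve_route() is modeled on reason 1) above
(calling rdma_resolve_route() in the wrong state). With ignore_timewait
set to 0, a late timewait-exit upcall marks the wait as finished before
the address is resolved, so the next step fails with -EINVAL. With it set
to 1, the upcall is dropped and the sequence proceeds.

```c
#include <errno.h>

/* Hypothetical stand-ins for the RDMA_CM_EVENT_* upcalls discussed above. */
enum sim_event {
    SIM_ADDR_RESOLVED,
    SIM_ROUTE_RESOLVED,
    SIM_TIMEWAIT_EXIT
};

/* Hypothetical connection-setup states for this model only. */
enum sim_state {
    SIM_ADDR_QUERY,   /* address resolution in flight */
    SIM_ADDR_DONE,    /* address resolved, route not yet requested */
    SIM_ROUTE_QUERY,  /* route resolution in flight */
    SIM_ROUTE_DONE
};

struct sim_conn {
    enum sim_state state;
    int done;             /* stands in for the completion being waited on */
    int ignore_timewait;  /* 1 = drop timewait-exit upcalls, 0 = signal on them */
};

/* Begin a (re)connect: address resolution is now pending. */
void sim_start(struct sim_conn *c, int ignore_timewait)
{
    c->state = SIM_ADDR_QUERY;
    c->done = 0;
    c->ignore_timewait = ignore_timewait;
}

/* Event upcall: advance the state for expected events, then wake the waiter. */
void sim_upcall(struct sim_conn *c, enum sim_event ev)
{
    switch (ev) {
    case SIM_ADDR_RESOLVED:
        c->state = SIM_ADDR_DONE;
        break;
    case SIM_ROUTE_RESOLVED:
        c->state = SIM_ROUTE_DONE;
        break;
    case SIM_TIMEWAIT_EXIT:
        if (c->ignore_timewait)
            return;       /* unrelated to setup: do not wake the waiter */
        break;            /* otherwise falls through to the wakeup below */
    }
    c->done = 1;
}

/* Route resolution is only valid once the address has been resolved. */
int sim_resolve_route(struct sim_conn *c)
{
    if (c->state != SIM_ADDR_DONE)
        return -EINVAL;
    c->state = SIM_ROUTE_QUERY;
    c->done = 0;
    return 0;
}
```

In this model the fix is simply to return from the upcall without
signalling when the event is not one the setup path is waiting for.
Whether that is the right handling in a real consumer depends on what
the event is meant to convey, which is the question below.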

Do you know what the purpose of this new upcall is? I don't see
anything that uses it.

Tom.

>handler so I don't think -EINVAL comes from there. Are you actually
>seeing the lack of a receive handler or is it from reviewing the code
>to see where -EINVAL could possibly come from?
>
>-- Hal
>
>> This is using the ib_mthca driver, and 2.6.27-rc7 btw. Any clues to
>> help figure out what might be wrong?
>>
>> Thanks,
>> Tom.



