[openib-general] RE: [Dapl-devel] queue pair destruction on dat_ep_disconnect
Sears, Steven
Steven.Sears at netapp.com
Fri Mar 18 09:45:09 PST 2005
May I point out that, taking your analysis to its logical conclusion, you could only ever have a single EP on any machine if each EP has a unique QP. This is obviously false; I think you're looking in the wrong place.
dat_ep_disconnect() is not supposed to destroy a QP, just transition it to a not-connected state (IB state ERROR). An EP, and by extension a QP, can have several different attributes, so it wouldn't be efficient or intuitive to destroy the underlying QP just because you are disconnecting. The QP remains attached to the EP until you explicitly free it in dat_ep_free(); this is intentional and by design.
If you look at the state diagram in the DAT spec, you will notice that you should dat_ep_reset() the EP before you try to use it again. This will transition the underlying QP from the ERROR state to INIT. But I don't think you're trying to reuse the EP, so I don't know why it's a problem.
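To make the lifecycle above concrete, here is a minimal sketch in C. It is a toy model only: every toy_* name and the QP_* enum are made up for illustration and are not part of the DAT API or the reference implementation. It just encodes the three points above: disconnect leaves the QP attached in ERROR, reset moves it from ERROR back to INIT, and only free releases it.

```c
#include <assert.h>

/* Toy model only. Every name below is hypothetical and is NOT part of the
 * DAT API; the comments note which DAT call each helper stands in for. */
typedef enum { QP_NONE, QP_INIT, QP_CONNECTED, QP_ERROR } toy_qp_state;

typedef struct {
    int has_qp;        /* 1 while a QP is attached to the EP */
    toy_qp_state qp;   /* state of the attached QP, QP_NONE if none */
} toy_ep;

/* Stands in for dat_ep_create(): attach a fresh QP in INIT. */
static void toy_ep_create(toy_ep *ep)
{
    ep->has_qp = 1;
    ep->qp = QP_INIT;
}

/* Stands in for connecting the EP: only allowed from INIT. 0 on success. */
static int toy_ep_connect(toy_ep *ep)
{
    if (!ep->has_qp || ep->qp != QP_INIT)
        return -1;
    ep->qp = QP_CONNECTED;
    return 0;
}

/* Stands in for dat_ep_disconnect(): the QP goes to ERROR but stays attached. */
static void toy_ep_disconnect(toy_ep *ep)
{
    if (ep->has_qp)
        ep->qp = QP_ERROR;
}

/* Stands in for dat_ep_reset(): ERROR -> INIT so the EP can be reused. */
static int toy_ep_reset(toy_ep *ep)
{
    if (!ep->has_qp || ep->qp != QP_ERROR)
        return -1;
    ep->qp = QP_INIT;
    return 0;
}

/* Stands in for dat_ep_free(): the only step that releases the QP. */
static void toy_ep_free(toy_ep *ep)
{
    ep->has_qp = 0;
    ep->qp = QP_NONE;
}

/* Walk the sequence described above; returns 1 if every step behaves. */
static int toy_lifecycle_demo(void)
{
    toy_ep ep;

    toy_ep_create(&ep);
    assert(toy_ep_connect(&ep) == 0);

    toy_ep_disconnect(&ep);
    assert(ep.has_qp && ep.qp == QP_ERROR);  /* QP survives the disconnect */
    assert(toy_ep_connect(&ep) == -1);       /* reuse without reset is refused */

    assert(toy_ep_reset(&ep) == 0);
    assert(toy_ep_connect(&ep) == 0);        /* reuse after reset works */

    toy_ep_disconnect(&ep);
    toy_ep_free(&ep);
    assert(!ep.has_qp);                      /* only free releases the QP */
    return 1;
}
```

Calling toy_lifecycle_demo() from a main() walks the whole sequence. In real code the corresponding steps would be the DAT calls named in the comments, with their return values checked.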
Getting back to your real problem, I'm not sure why you can't create a new EP on a different IA; they should be completely separate. If dat_ep_create() fails, something is hosed. I don't know about the destroy_cbk field, as it isn't in the reference implementation, so I can't help you there.
-Steve
> -----Original Message-----
> From: mark kowalski [mailto:mkowalski01 at gmail.com]
> Sent: Friday, March 18, 2005 12:25 PM
> To: openib-general at openib.org; dapl-devel at lists.sourceforge.net
> Subject: [Dapl-devel] queue pair destruction on dat_ep_disconnect
>
>
> I've been doing some experimentation to see if a client using udapl
> can recover from a hard port failure if the second port on the hca
> offers a path to the same destination. I've noticed a problem.
> When on the client I see a timeout waiting on a response from
> the server or I get a transport error from the evd_wait on the
> dat_ep_post_send, I will eventually dat_ep_disconnect the endpoint in
> preparation of recreating the endpoint and trying to connect to a new
> IA obtained from dat_ia_open on a name associated with the other port
> on the hca. What I've noticed is that the ep_disconnect does not seem
> to destroy the underlying queue pair and even though I issue a new
> dat_ep_create, access to the new end point fails with "resource busy"
> because the destroy_cbk field is still filled in. If I issue the
> dat_ep_free after the dat_ep_disconnect and then start the process of
> creating and connecting to a new end point then it works fine.
> I've noticed in the dapls_ib_disconnect (not the openib one) that
> the call to VAPI_destroy_qp is ifdef'd out. In the openib
> dapls_ib_disconnect there is no call at all to VAPI_destroy_qp. Is
> this intentional? It seems that the dat_ep_disconnect should clean up
> the underlying queue pair and a dat_ep_free shouldn't be required.
>
> Thanks in advance for any help you can provide,
> Mark Kowalski