[ofa-general] RE: CM goes to timewait state without waiting for disconnect reply

Amir Vadai amirv at mellanox.co.il
Thu Apr 17 11:25:56 PDT 2008


When the client closes the connection, it calls ib_destroy_cm_id(), which calls cm_destroy_id().
In my scenario this happens while the CM is in state "Established". In this state ib_send_cm_dreq() is called.
This function sends a DREQ and changes the state to "DREQ sent".
After that the function returns and the switch is evaluated again, this time in state "DREQ sent".
There the state is changed to "TimeWait".

This means that when ib_destroy_cm_id() is called, the CM sends a DREQ and goes immediately to state "TimeWait" without waiting for a DREP.
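
To make the path concrete, here is a rough model of the flow as I read it. This is a simplified sketch and not the kernel source: the type and function names (model_state, model_conn, model_send_dreq, model_destroy) are made up for illustration, and locking, MAD handling and the other states are left out.

```c
#include <assert.h>

/* Hypothetical model states; illustrative names, not the kernel's enum. */
enum model_state { ST_ESTABLISHED, ST_DREQ_SENT, ST_TIMEWAIT, ST_IDLE };

struct model_conn {
    enum model_state state;
    int dreq_sent;      /* number of DREQs sent so far */
    int drep_received;  /* number of DREPs seen so far */
};

/* Models the described effect of sending a DREQ: "Established" -> "DREQ sent". */
static void model_send_dreq(struct model_conn *c)
{
    assert(c->state == ST_ESTABLISHED);
    c->dreq_sent++;
    c->state = ST_DREQ_SENT;
}

/*
 * Models the destroy path described above: the switch on the state is
 * evaluated again after the DREQ is sent, so the second pass sees
 * "DREQ sent" and moves straight to "TimeWait".
 */
static void model_destroy(struct model_conn *c)
{
    for (;;) {
        switch (c->state) {
        case ST_ESTABLISHED:
            model_send_dreq(c);
            continue;               /* re-evaluate the switch */
        case ST_DREQ_SENT:
            c->state = ST_TIMEWAIT; /* no wait for a DREP here */
            return;
        default:
            return;
        }
    }
}
```

Starting from "Established", one call to model_destroy() ends in "TimeWait" with one DREQ sent and no DREP received, which matches what I see in my tests.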

This looks like the usual case rather than a special one.

I'm looking at the code from the head of ofed git in openfabrics.

- Amir

-----Original Message-----
From: Sean Hefty [mailto:sean.hefty at intel.com] 
Sent: Thursday, April 17, 2008 19:14
To: Amir Vadai
Cc: general at lists.openfabrics.org
Subject: RE: CM goes to timewait state without waiting for disconnect reply

> In the spec, a normal flow to close a connection is at the client 
> side: State "Established" ---- send DREQ ---> State "DREQ sent" --- 
> receive DREP ---> State "TimeWait"  ---> State "Idle"

Yes - the CM kernel code follows this state machine. 

> According to the code and tests I did, it seems that ib_cm doesn't 
> wait for DREP and goes directly from "DREQ sent" into "TimeWait".

This can happen in specific situations, such as errors, if the user destroys the cm_id without waiting for the DREP (treated as a DREQ timeout), or if both sides initiate a DREQ.

> I think that this is a bug, am I right?

I don't see that the code follows the behavior that you're describing.

In ib_send_cm_dreq(), the cm_id state changes to DREQ_SENT.

In cm_drep_handler() (called when a DREP is received), the cm_id state is verified to be DREQ_SENT, then transitioned to TIMEWAIT.
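
For comparison, the normal path described in the two paragraphs above can be sketched roughly as follows. This is an illustrative model only, not the ib_cm code: the names (sketch_state, sketch_send_dreq, sketch_handle_drep) are hypothetical, and timeouts, locking and error handling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for this sketch; not the kernel's enum. */
enum sketch_state { SK_ESTABLISHED, SK_DREQ_SENT, SK_TIMEWAIT };

/* Sending a DREQ moves an established connection to "DREQ sent". */
static bool sketch_send_dreq(enum sketch_state *s)
{
    assert(s != NULL);
    if (*s != SK_ESTABLISHED)
        return false;       /* DREQ only makes sense when established */
    *s = SK_DREQ_SENT;
    return true;
}

/*
 * Receiving a DREP: verify that the state is "DREQ sent", then move
 * to "TimeWait". A DREP arriving in any other state is ignored.
 */
static bool sketch_handle_drep(enum sketch_state *s)
{
    assert(s != NULL);
    if (*s != SK_DREQ_SENT)
        return false;
    *s = SK_TIMEWAIT;
    return true;
}
```

In this model the transition to "TimeWait" happens only inside the DREP handler, so a connection that reaches "TimeWait" without a DREP must have taken some other path, such as a timeout or a destroy.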

If you can describe the test details more, I can try to find the most likely code path that's being hit.  It's possible that you're hitting one of the situations mentioned above.

- Sean



