[ofa-general] Questions about IPOIB handle last WQE event

Shirley Ma mashirle at us.ibm.com
Fri Jul 25 18:07:57 PDT 2008


Hello Roland,

>  > Do the post send work request whenever receiving DREQ, which meant the
>  > remote TX QP has already teared down, there is no new post recv
>  > completions any more after DREQ.
> 
> After a DREQ is received, then the local QP is transitioned to the error
> state.  However, we don't know when all the receives queued up have
> completed (with flush error status).  Also, we may want to clean up a QP
> when we didn't receive a DREQ (remote side crashed, or we just have an
> idle connection).

There are three scenarios in which we need to destroy QPs:
1. During connection establishment
2. Received DREQ
3. Idle connection, or remote side crashed, shutdown

Currently, IPoIB-CM uses the same approach to destroy the QP in all these
cases: it puts the QP into the error state and waits for the LAST WQE
async event, then destroys the QP from the context of the last posted
send WR.
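
A minimal sketch of that flow as a small state machine, in plain C
rather than driver code. All names here (rx_conn, rx_begin_teardown and
so on) are hypothetical and only model the sequence described above; the
modify_ok flag is an assumption standing in for the result of the
modify-QP call.

```c
#include <stdbool.h>

/* Hypothetical model of the current teardown flow; not IPoIB driver code. */
enum rx_state {
    RX_LIVE,      /* QP in normal use */
    RX_ERROR,     /* QP moved to the error state, waiting for the event */
    RX_DRAINING,  /* LAST WQE event seen, last send WR posted */
    RX_DESTROYED  /* last send WR completed, QP destroyed */
};

struct rx_conn {
    enum rx_state state;
};

/*
 * Step 1: try to move the QP to the error state.
 * modify_ok models whether the modify-QP call succeeded.
 * Returns 0 on success, -1 on failure (state is left unchanged).
 */
int rx_begin_teardown(struct rx_conn *c, bool modify_ok)
{
    if (!modify_ok)
        return -1;
    c->state = RX_ERROR;
    return 0;
}

/* Step 2: the LAST WQE async event arrives; post the last send WR. */
void rx_on_last_wqe_event(struct rx_conn *c)
{
    if (c->state == RX_ERROR)
        c->state = RX_DRAINING;
}

/* Step 3: the last send WR completes; the QP can now be destroyed. */
void rx_on_last_send_completion(struct rx_conn *c)
{
    if (c->state == RX_DRAINING)
        c->state = RX_DESTROYED;
}
```

A caller would drive this as rx_begin_teardown(), then
rx_on_last_wqe_event(), then rx_on_last_send_completion(). In this
model, if the first step fails the connection stays in RX_LIVE and the
later handlers do nothing.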

Putting the QP into the error state can itself return an error, in which
case the async event might never be generated. Here is what I propose:

1. During connection establishment (e.g. REJ or REP failure), we should
destroy the QP immediately without putting it into the error state,
since the QP has not reached RTU.

2. Received DREQ: we should destroy the QP in the context of the last
posted send WR; we don't need to rely on the async event.

3. Idle connection, or remote side crashed, shutdown: we should put the
QP into the error state, then destroy it later. And if we run out of
connections, we can GC the least recently used connection to make room
for the new one.
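
The three cases above, plus the GC step, can be sketched in plain C as
follows. This is only an illustration of the proposed policy, not the
patches themselves: every name (teardown_reason, choose_teardown_action,
conn_slot, pick_lru_victim) is hypothetical, and last_used is assumed to
be a monotonically increasing timestamp with wraparound ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; none of these exist in the IPoIB driver. */
enum teardown_reason {
    TEARDOWN_SETUP_FAILED,      /* case 1: REJ or REP failure */
    TEARDOWN_DREQ_RECEIVED,     /* case 2 */
    TEARDOWN_IDLE_OR_PEER_GONE  /* case 3: idle, remote crashed, shutdown */
};

enum teardown_action {
    ACTION_DESTROY_NOW,             /* no transition to error first */
    ACTION_DESTROY_AFTER_LAST_SEND, /* destroy in last posted send WR context */
    ACTION_ERROR_THEN_DESTROY_LATER /* move to error, destroy later */
};

/* Map each teardown reason to the action proposed for it. */
enum teardown_action choose_teardown_action(enum teardown_reason reason)
{
    switch (reason) {
    case TEARDOWN_SETUP_FAILED:
        return ACTION_DESTROY_NOW;
    case TEARDOWN_DREQ_RECEIVED:
        return ACTION_DESTROY_AFTER_LAST_SEND;
    case TEARDOWN_IDLE_OR_PEER_GONE:
        return ACTION_ERROR_THEN_DESTROY_LATER;
    }
    /* Unknown reason: fall back to the case 3 path (an assumption). */
    return ACTION_ERROR_THEN_DESTROY_LATER;
}

/* One entry in a hypothetical fixed-size connection table. */
struct conn_slot {
    bool in_use;
    unsigned long last_used; /* assumed monotonic, larger means more recent */
};

/*
 * Return the index of the least recently used in-use connection,
 * or -1 if no slot is in use.
 */
int pick_lru_victim(const struct conn_slot *slots, size_t n)
{
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (!slots[i].in_use)
            continue;
        if (victim < 0 || slots[i].last_used < slots[victim].last_used)
            victim = (int)i;
    }
    return victim;
}
```

When a new connection arrives and the table is full, the slot returned
by pick_lru_victim() would be torn down through the case 3 path before
the slot is reused.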

Then we have a common approach for both non-SRQ and SRQ, and we can
remove the async event handler. I have tested patches for the above, but
I would like to hear your thoughts before submitting them.

Thanks
Shirley



