[openib-general] [PATCH] IB/cma: add rdma_establish
Or Gerlitz
ogerlitz at voltaire.com
Tue Sep 12 01:33:22 PDT 2006
Sean Hefty wrote:
> Michael S. Tsirkin wrote:
>> Sean, did we decide what to do for upstream yet?
>> I would say we need something like the below for 2.6.19 too
>> (probably just need to update node type check).
>> And, I like it that this approach leaves all matters of policy
>> to users (such as whether move QP to RTS after asynchronous event
>> or after completion event).
> I will go with a patch similar to this one. It seems the most flexible.
Just to make sure: do you mean that you would merge this patch instead
of the one that had the CM track local QP numbers and install a
callback for the consumer QP to catch the async event, etc.?
Also, I'd like to make sure I follow what would happen:
T1) the consumer gets an RX completion on a QP associated with a
non-established CMA ID
[also, at some point in time the async handler is called with a
COMM_EST async event for this QP]
T2) the consumer calls rdma_establish()
T3) the consumer's CMA callback is called with the ESTABLISHED event and
the consumer is now able to post sends to the QP
Indeed the **patch** itself is somewhat simpler, but the consumer
must get the ESTABLISHED event before posting sends to the QP, so they
need to either queue RXes or modify the QP to RTS before sending the REP.
As I said before, this is fine with our iSER target, as we queue the sole
possible RX (login request) until we get the ESTABLISHED event.
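
To make the T1..T3 flow and the "queue RXes" option concrete, here is a
minimal user-space sketch of the consumer-side bookkeeping. It is only
an illustration under assumptions: struct conn, the state names and all
helper functions are hypothetical, and request_establish() merely stands
in for the point where a real consumer would call rdma_establish(); no
verbs or CMA calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer-side states; these names are not from the patch. */
enum conn_state { CONN_REP_SENT, CONN_ESTABLISH_REQUESTED, CONN_ESTABLISHED };

#define MAX_PENDING_RX 4

struct conn {
    enum conn_state state;
    int pending_rx[MAX_PENDING_RX]; /* RX work-request ids held back */
    size_t n_pending;
    size_t n_processed;             /* RX completions actually handled */
    size_t n_establish_calls;       /* counts stand-in rdma_establish() calls */
};

static void conn_init(struct conn *c)
{
    c->state = CONN_REP_SENT;
    c->n_pending = 0;
    c->n_processed = 0;
    c->n_establish_calls = 0;
}

/* Stand-in for the place where the consumer would call rdma_establish(). */
static void request_establish(struct conn *c)
{
    c->n_establish_calls++;
    c->state = CONN_ESTABLISH_REQUESTED;
}

/* Placeholder for real RX handling (e.g. parsing a login request). */
static void process_rx(struct conn *c, int wr_id)
{
    (void)wr_id;
    c->n_processed++;
}

/*
 * T1/T2: an RX completion arrives.
 * Returns 1 if processed immediately, 0 if queued, -1 if the queue is full.
 */
static int on_rx_completion(struct conn *c, int wr_id)
{
    if (c->state == CONN_ESTABLISHED) {
        process_rx(c, wr_id);
        return 1;
    }
    if (c->n_pending == MAX_PENDING_RX)
        return -1;
    c->pending_rx[c->n_pending++] = wr_id;
    /* Only the first early RX triggers the establish request. */
    if (c->state == CONN_REP_SENT)
        request_establish(c);
    return 0;
}

/* T3: the ESTABLISHED event arrives; drain whatever was held back. */
static void on_established_event(struct conn *c)
{
    size_t i;

    assert(c->n_pending <= MAX_PENDING_RX);
    c->state = CONN_ESTABLISHED;
    for (i = 0; i < c->n_pending; i++)
        process_rx(c, c->pending_rx[i]);
    c->n_pending = 0;
}
```

For a target like ours, where the only possible early RX is the login
request, a queue depth of one would be enough; the depth of four here is
arbitrary.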
Is rdma_establish() --> cm_establish() callable from a non-interruptible
context? Our target does a context jump once the CQ handler is called, so
it does the actual processing at thread level, but there may be other
consumers attempting to call rdma_establish() from the hard-IRQ CQ
callback context.
Also, does the patch ensure that only one ESTABLISHED event is delivered
for the ID, no matter whether rdma_establish() and an RTU reception
happen in parallel?
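
The property being asked about can be sketched as a single atomic state
transition: whichever path moves the ID out of the connecting state
first delivers the event, and the other path becomes a no-op. This is a
user-space illustration using C11 atomics, not a description of how the
patch or ib_cm actually serializes the two paths; struct cm_id_sketch
and every function name below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { ID_CONNECTING = 0, ID_ESTABLISHED = 1 };

/* Hypothetical stand-in for the per-connection ID state. */
struct cm_id_sketch {
    atomic_int state;
    atomic_int events_delivered; /* number of ESTABLISHED events reported */
};

static void id_init(struct cm_id_sketch *id)
{
    atomic_init(&id->state, ID_CONNECTING);
    atomic_init(&id->events_delivered, 0);
}

/*
 * Try to move CONNECTING -> ESTABLISHED. Exactly one caller can win the
 * compare-and-swap, so the event is counted at most once.
 */
static bool try_establish(struct cm_id_sketch *id)
{
    int expected = ID_CONNECTING;

    if (!atomic_compare_exchange_strong(&id->state, &expected,
                                        ID_ESTABLISHED))
        return false; /* the other path already delivered the event */
    atomic_fetch_add(&id->events_delivered, 1);
    return true;
}

/* Path 1: the consumer asked for the transition (rdma_establish() analogue). */
static bool on_user_establish(struct cm_id_sketch *id)
{
    return try_establish(id);
}

/* Path 2: an RTU arrived from the remote side. */
static bool on_rtu_received(struct cm_id_sketch *id)
{
    return try_establish(id);
}
```

With this shape the order of the two calls does not matter: the first
returns true and reports the event, the second returns false.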
>> As a side note, reasons for frequent loss of RTU must be investigated.
> A lost RTU shouldn't be any more likely than a lost REQ or REP. Is the RTU
> never showing up? I will look into the ib_cm and see if there's an issue that
> would cause an RTU not to be retried.
Indeed, my initial suspicion was that heavy CPU load on the server node
prevents the MAD/CM threads from being scheduled, but as REQ messages do
appear, I also thought we should check whether a "retried" REP causes a
resend of the RTU.