[openib-general] new IB CM reject reason

Thu Feb 1 10:39:22 PST 2007

> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] new IB CM reject reason
> 
> > No, I don't think "application crashed" makes sense as an element of wire protocol.
> > I think an optional logging of errors in kernel CM would be a much better
> > solution. I know I had to add some printks each time I was debugging SDP.
> 
> The "application crashed" scenario is what highlighted the issue.  The problem 
> is that the CM must provide a reject reason.  Which reject reason do you use? 
> My suggestion was for a reject reason of other/unknown/none given (pick one).

I'm actually happy with what existing code does (consumer reject).
I would like to highlight that the current CM code has no way to send
CM errors to the system log, a weakness which hinders debugging.

Would you be interested in a patch that makes it possible to enable logging
of CM errors and/or all CM events?
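
To make the logging idea concrete, here is a minimal userspace sketch of the
kind of line such a patch might emit for a REJ. It is not the kernel CM code:
the enum and both helper names are hypothetical, only a handful of reason
codes are listed, and a real patch would go through printk rather than
snprintf.

```c
#include <stdio.h>

/* A few REJ reason codes as numbered in the IB spec; not exhaustive. */
enum rej_reason {
    REJ_NO_QP           = 1,
    REJ_NO_EEC          = 2,
    REJ_NO_RESOURCES    = 3,
    REJ_TIMEOUT         = 4,
    REJ_CONSUMER_REJECT = 28,
};

/* Hypothetical helper: map a reason code to a short name for log output. */
static const char *rej_reason_name(unsigned int reason)
{
    switch (reason) {
    case REJ_NO_QP:           return "no QP available";
    case REJ_NO_EEC:          return "no EEC available";
    case REJ_NO_RESOURCES:    return "no resources available";
    case REJ_TIMEOUT:         return "timeout";
    case REJ_CONSUMER_REJECT: return "consumer reject";
    default:                  return "unknown";
    }
}

/*
 * Hypothetical helper: format one log line describing a REJ.
 * Returns whatever snprintf returns (the untruncated length).
 */
static int format_rej_log(char *buf, size_t len,
                          unsigned int reason, unsigned int pdata_len)
{
    return snprintf(buf, len,
                    "cm: REJ reason %u (%s), private data %u bytes",
                    reason, rej_reason_name(reason), pdata_len);
}
```

With something like this enabled, a REJ issued on behalf of a crashed
application would show up in the log as reason 28 with zero bytes of private
data, which is usually enough to tell it apart from a reject the application
issued deliberately.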

> > 2. Another objection is that this feature seems to invite misuse where applications
> > will use REJ reason as a hint on whether remote side crashed. But REJ could be
> > lost. Wouldn't this confuse the remote side?
> 
> Currently, the CM issues the reject using "consumer defined", since nothing else 
> maps any better under this condition.  But the reject isn't consumer defined... 
> By doing this, an application that expects specific private data in the reject 
> message won't find it, which is just as likely to confuse the remote side.  This 
> is why I think an unknown/unspecified reject reason is needed.
> 
> How an application interprets a reject with 'unknown' reason is up to the 
> application, but I do think this is better than the application trying to guess 
> whether 'consumer defined' really does mean consumer defined.

Are we talking about code 28? My spec lists it as "consumer reject".
The meaning of *private data* is consumer defined.

                   The consumer decided to reject the communication
                   or EE context setup establishment attempt for
                   reasons other than those listed in the other REJ
                   codes. Typically this happens based upon
                   information being conveyed in the PrivateData field
                   of a message. It can also happen because the
                   Consumer decided for reasons unrelated to any CM
                   message it received to terminate the communication
                   or EE context setup establishment attempt.
                   This would therefore be the appropriate Reason
                   code to use if the Consumer decided to destroy
                   the QP or EEC in the midst of the communication
                   or EE context setup establishment attempt.

So this really *does* seem to be what the spec intended for exactly our case.

Now, I do not really object to inventing new rejection reasons: for example,
maybe we can invent one that lets us stick the errno value in private data
somehow - but it's not like there's no solution inside the spec,
and inventing a whole new reject reason just for userspace consumers
seems like a narrow approach to me.

-- 
MST