[ofa-general] Re: IPOIB CM (NOSRQ)[PATCH V4] patch for review

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Fri May 11 12:19:46 PDT 2007


Michael S. Tsirkin wrote:
>> Quoting Pradeep Satyanarayana <pradeeps at linux.vnet.ibm.com>:
>> Subject: Re: IPOIB CM (NOSRQ)[PATCH V4] patch for review
>>
>> If there are no other issues than the small restructure suggestion that
>> Michael had, can this patch be merged into the for-2.6.22 tree?
> 
> I'm not sure.
> 
> I haven't the time, at the moment, to go over the patch again in depth.
> Have the issues from this message been addressed?
> 
> http://www.mail-archive.com/general@lists.openfabrics.org/msg02056.html
> 
> Just a quick review, it seems that two most important issues have
> apparently not been addressed yet:
> 
> 1. Testing device SRQ capability twice on each RX packet is just too ugly,
>    and it *should* be possible to structure the code
>    by separating common functionality in separate
>    functions instead of scattering if (srq) tests around.

I have restructured the code as suggested. In the latest code, there are
only two places where SRQ capability is tested upon receipt of a packet:
a) ipoib_cm_handle_rx_wc()
b)ipoib_cm_post_receive()

Instead of the suggested change to ipoib_cm_handle_rx_packet() it is
possible to change ipoib_cm_post_receive() and call the  srq and nosrq
versions directly, without mangling the code. However, I do not believe 
that this should be stopping us from the code being merged. This can 
handled as a separate patch.


> 
> 2. Once the number of created connections exceeds
>    the constant that you allow, all attempts to communicate
>    with this node over IP over IB will fail.
>    A way needs to be designed to switch to the datagram mode,
>    and to retry going back to connected after some time.
>    [We actually have this theoretical issue in SRQ
>     as well - it is just much more severe in the nonSRQ case].

Firstly, this has now been changed to send a REJ message to the remote
side indicating that there no more free QPs. It is up to the application
to handle the situation. Previously, this was flagged as an error that
appeared in /var/log/messages.

However, here are a few other things we need to consider. Lets us
compute the amount of memory consumed when we run into this situation:

In CM mode we use 64K packets. Assuming, the rx_ring has 256 entries and
the current limitation of 1024 QPs, NOSRQ only will consume 16GB of 
memory. All else remaining the same if we change the rx_ring size to 
1024, NOSRQ will consume 64GB of memory.

This is huge and my guess is that on most systems, the application will 
run out of memory before it runs out of RC QPs (with NOSRQ).

Aside from this I would like to understand how do we switch just the 
"current" QP to datagram mode; we would not want to switch all the
existing QPs to datagram mode -that would be unacceptable. Also, we
should not prevent subsequent connections using RC QPs. Is there 
anything in the IB spec about this?

I think solving this is a fairly big issue and not just specific to
NOSRQ. NOSRQ is just exacerbating the situation. This can be dealt with
all at once with SRQ and NOSRQ, if need be.

Hence, I do not see these as impediments to the merge.

Pradeep




More information about the general mailing list