[ofa-general] What should a ULP pass as ib_create_cq(..., comp_vector) ?

Wed Jul 11 13:48:49 PDT 2007

general-bounces at lists.openfabrics.org wrote:
> At 12:57 PM 7/11/2007, Roland Dreier wrote:
>> However on another level your question gets to the reason why we
>> haven't implemented support for multiple completion event vectors.
>> Namely, it's not clear how consumers, kernel or userspace, can make a
>> good choice of which vector to assign a given CQ to.
> 
> Got it, thanks. But aren't the vectors shared across all
> consumers on an HCA? As such, it seems problematic to expect
> consumers to make optimal choices, since they have no way of
> knowing what other consumers are doing.
> 
> In any case, all NFS/RDMA does is to check the completion
> status, queue the event and schedule a tasklet, so there is
> little or no parallelism to be gained in the upcall. I'd
> prefer to not have to wait for other ULPs on the same vector, of
> course. 
> 

What a single Consumer could do is to clump as many of their CQs
as possible into a single "bag" where serialization of notifications
for these CQs would have little detrimental impact on the application.
As you point out, for most applications this is all of their CQs.

This would presume that when the Consumer supplied too many bags, the
lower layers would simply say "tough" and combine some of them
(achieving less than optimal results, but better than having the OS
assign notification queues on a totally arbitrary basis).
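
As a rough illustration of that "combine some of them" behaviour, here
is a minimal sketch in C. It assumes the Consumer hands down a small
"bag" index as a hint and the lower layer folds it onto however many
completion vectors the device really has. The name bag_to_vector and
its parameters are hypothetical and not part of any existing verbs
interface.

```c
/*
 * Hypothetical helper, not an existing ib_verbs call: fold a
 * Consumer-chosen "bag" index onto the completion vectors the
 * device actually provides.
 *
 * When the Consumer asks for more bags than there are vectors,
 * several bags land on the same vector, so their notifications
 * are serialized together.
 */
static unsigned int bag_to_vector(unsigned int bag, unsigned int num_vectors)
{
    /* Assumption: a device reporting no vectors is treated as having one. */
    if (num_vectors == 0)
        return 0;

    /* CQs in the same bag always map to the same vector. */
    return bag % num_vectors;
}
```

Under this sketch the result is what would be passed as comp_vector to
ib_create_cq(). The modulo fold is only one possible policy; a lower
layer that knew the load on each vector could combine bags differently.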

To use the actual number of vectors implies that it would be meaningful
for *each* application to divide its CQs over that set, without any
mechanism to balance the applications themselves. That would seem to
imply that a typical Consumer would have a large number of CQs, when
I've never understood the need for more than one per core per
application.

At a minimum, if the actual number were published by the device, would
the kernel consumers actually be able to distribute their CQs over the
set?

Tom, I definitely agree that userland consumers have absolutely no way
to do that reasonably, but do you think it is plausible for the kernel
to do so for kernel-resident consumers? If not, what would be needed to
bridge that gap? Or is the need for parallelism so small amongst kernel
completion handlers that the kernel does not need this feature?