[openib-general] [PATCH] [CM] add private data comparison to match REQs with listens

Caitlin Bestler caitlinb at broadcom.com
Fri Dec 2 15:01:47 PST 2005


Rimmer, Todd wrote:
>> -----Original Message-----
>> From: Tillier, Fabian
>> Sent: Friday, December 02, 2005 5:21 PM
>> To: 'Caitlin Bestler'; 'Sean Hefty'
>> Cc: openib-general at openib.org
>> Subject: RE: [openib-general] [PATCH] [CM] add private data
>> comparison to match REQs with listens
>> 
>> 
>>> From: Caitlin Bestler [mailto:caitlinb at broadcom.com]
>>> Sent: Friday, December 02, 2005 12:13 PM
>>> 
>>> Sean Hefty wrote:
>>>> Fab Tillier wrote:
>>>>>> Just listen on the Service ID / Port and let the ULP sort them
>>>>>> out by destination IP address.
>>>>> 
>>>>> That only works if there is a single kernel module providing the
>>>>> extra checks. Multiple user-mode ULPs cannot do the checking in
>>>>> user-mode - the checking must be done in the kernel to figure out
>>>>> which user-mode client to hand the request to.
>>>>> 
>>>>> I think putting in restrictions to the comparisons possible is
>>>>> fine, as the functionality of having the CM facilitate some sort
>>>>> of filtering is useful.
>>>> 
>>>> My concern with pushing this to the ULP is that it requires the
>>>> ULP to track service IDs for reference counting purposes and adds
>>>> additional synchronization to the ULP that could have been handled
>>>> by the CM. 
>>>> 
>>>> I'm looking at what the full effect of implementing this in the ULP
>>>> would be.
>>> 
>>> I'm still missing something.
>>> 
>>> I don't see how filtering in the CM is of benefit in either case.
>>> The work either belongs in the Hypervisor or in the Daemon, not the
>>> CM. 
>> 
>> Your focus is strictly on TCP socket semantics, but we're talking
>> about IB CM functionality - the IB CM does more than just provide
>> TCP socket semantics. 
>> 
>> Imagine a user-mode IB application (not virtualization mind you, but
>> just an app) that wants to listen on a given SID (because the SID
>> defines the application), but wants to discriminate incoming
>> requests based on some content in the private data.  Multiple
>> instances of that application can only work properly if the CM
>> performs the private data comparison to properly dispatch the
>> incoming requests to the right user-mode process. 
>> 
>> If the CM doesn't provide the private data compare functionality,
>> then the app developer needs to create a kernel agent to perform this
>> functionality for the app.  The functionality is simple enough, and
>> has potential value to multiple clients, that it makes sense to have
>> the IB CM provide it. 
>> 
>> - Fab
> 
> I agree. To give you a good practical example, MPI needs to
> listen for incoming connections.
> 
> It is wasteful to have MPI create separate SIDs for each rank
> (especially when there can be thousands of ranks across many jobs
> all running in the same cluster, some of them on the same
> node) and then listen on 1000s of SIDs in each process.
> 
> Instead it makes sense to use a single SID for the entire job
> (possibly using the global Job ID as part of the SID), and
> have the private data of the REQ indicate the destination
> rank of the request.  Then each rank in the MPI job can
> listen for the combination of the global Job ID's SID and
> private data where the destination rank matches itself (using
> 1 listening CEP per process) and let the CM filter by both
> criteria and deliver the REQs to the appropriate processes.
> 
> The above scheme works very well and minimizes CM resource
> use for large MPI jobs.
> 
> I'm sure other interesting and useful examples can be found as well.
> 
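
For reference, the filtering scheme described in the quoted text can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the interface from the patch: Listen, dispatch, and
rank_listen are hypothetical names, and the layout (a 4-byte big-endian
destination rank at the start of the REQ private data, compared under a
byte mask) is assumed for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Listen:
    """One listening endpoint (hypothetical model, not the ib_cm API)."""
    owner: str          # label for the process that owns this listen
    service_id: int     # SID shared by the whole job
    data: bytes = b""   # expected private data
    mask: bytes = b""   # bits of the private data that take part in the compare

    def __post_init__(self) -> None:
        if len(self.data) != len(self.mask):
            raise ValueError("data and mask must have the same length")

    def matches(self, service_id: int, private_data: bytes) -> bool:
        if service_id != self.service_id:
            return False
        if len(private_data) < len(self.mask):
            return False
        # Compare only the masked bits; trailing private data is ignored.
        return all((p & m) == (d & m)
                   for p, d, m in zip(private_data, self.data, self.mask))


def dispatch(listens: Iterable[Listen], service_id: int,
             private_data: bytes) -> Optional[str]:
    """Return the owner of the first listen matching the REQ, or None."""
    for listen in listens:
        if listen.matches(service_id, private_data):
            return listen.owner
    return None


def rank_listen(job_sid: int, rank: int) -> Listen:
    """Listen on the job's SID, filtering on an assumed 4-byte rank prefix."""
    return Listen(owner=f"rank{rank}", service_id=job_sid,
                  data=rank.to_bytes(4, "big"), mask=b"\xff" * 4)
```

With one such listen per rank, a REQ carrying the job's SID and a
destination rank in its private data is delivered to exactly one
process, and a REQ for an unknown rank or SID matches nothing.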

MPI works over plain TCP right now, and yet there is no such
feature in INETD or in current socket listens. Nor do they
allocate a TCP port to listen for each connection. Rather, the
same listen accepts each connection and either creates
the process or passes the handle to a process.
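
To make the handle-passing model concrete, here is a minimal sketch,
not taken from INETD or any MPI implementation, of one party handing
an already-accepted connection to another over a Unix-domain control
socket. It assumes Python 3.9 or later on a Unix-like system (for
socket.send_fds and socket.recv_fds); hand_off, receive, and demo are
hypothetical names, and a socketpair stands in for a real TCP listener.
It covers plain sockets only and makes no claim about what the IB CM
can do with a connection request.

```python
import socket


def hand_off(control: socket.socket, conn: socket.socket, rank: int) -> None:
    """Send conn's descriptor, tagged with the destination rank, over control."""
    socket.send_fds(control, [rank.to_bytes(4, "big")], [conn.fileno()])
    conn.close()  # the descriptor in flight keeps the connection alive


def receive(control: socket.socket):
    """Receive a (rank, connection) pair sent by hand_off."""
    msg, fds, _flags, _addr = socket.recv_fds(control, 4, 1)
    return int.from_bytes(msg, "big"), socket.socket(fileno=fds[0])


def demo():
    # Control channel between the dispatcher and a worker. In practice the
    # worker would be a separate process; one process is used here so the
    # example stays self-contained.
    dispatcher_ctl, worker_ctl = socket.socketpair()
    # Stand-in for a client and the connection the single listener accepted.
    client, accepted = socket.socketpair()
    client.sendall(b"hello")

    hand_off(dispatcher_ctl, accepted, rank=3)
    rank, conn = receive(worker_ctl)
    data = conn.recv(5)

    for s in (conn, client, dispatcher_ctl, worker_ctl):
        s.close()
    return rank, data
```

The listener never needs a port per destination: it accepts on one
endpoint and routes each connection after the fact.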


There are many reasons why an established RDMA connection
cannot be passed between processes, but I know of no
reason why a Connection Request cannot be passed to a child
or third process where it can be accepted.

Why not emulate the existing solution rather than creating
a new interface that is transport specific?


Or conversely, if you truly think this is of general utility,
why not implement it in INETD as well?



