[ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

Steve Wise swise at opengridcomputing.com
Wed Aug 15 07:42:41 PDT 2007



David Miller wrote:
> From: Sean Hefty <mshefty at ichips.intel.com>
> Date: Thu, 09 Aug 2007 14:40:16 -0700
> 
>> Steve Wise wrote:
>>> Any more comments?
>> Does anyone have ideas on how to reserve the port space without using a 
>> struct socket?
> 
> How about we just remove the RDMA stack altogether?  I am not at all
> kidding.  If you guys can't stay in your sand box and need to cause
> problems for the normal network stack, it's unacceptable.  We were
> told all along that if RDMA went into the tree none of this kind of
> stuff would be an issue.

I think removing the RDMA stack is the wrong thing to do, and you 
shouldn't just threaten to yank entire subsystems because you don't like 
the technology.  Let's keep this constructive, can we?  RDMA should get 
the respect of any other technology in Linux.  Maybe it's a niche in your 
opinion, but come on, there are more RDMA users than, say, users of the 
sparc64 port.  Eh?

> 
> These are exactly the kinds of problems for which people like myself
> were dreading.  These subsystems have no business using the TCP port
> space of the Linux software stack, absolutely none.
> 

OK, although IMO it's the correct solution.  But I'll propose other 
solutions below.  I ask for your feedback (and everyone's!) on these 
alternate solutions.

> After TCP port reservation, what's next?  It seems an at least
> bi-monthly event that the RDMA folks need to put their fingers
> into something else in the normal networking stack.  No more.
>

The only other change requested and committed, if I recall correctly, was 
for netevents, and that enabled both InfiniBand and iWARP to integrate 
with the neighbour subsystem.  I think that was a useful and needed 
change.  Prior to that, these subsystems were snooping ARP replies to 
trigger events.  That was back in 2.6.18 or 2.6.19 I think...

> I will NACK any patch that opens up sockets to eat up ports or
> anything stupid like that.

Got it.

Here are alternate solutions that avoid the need to share the port space:

Solution 1)

1) admins must set up an alias interface on the iwarp device for use with 
rdma.  This interface will have to be on a separate subnet from the "TCP 
used" interface, and have a canonical name that indicates it's "for rdma 
only", like eth2:iw or eth2:rdma.  There can be many of these per device.

2) admins make sure their sockets/tcp services don't use the interface 
configured in #1, and their rdma services do use said interface.

3) iwarp providers must translate binds to ipaddr 0.0.0.0 into binds to 
the associated "for rdma only" ip addresses.  They can do this by 
searching for all aliases with the canonical name that are aliases of the 
TCP interface for their nic device.  Or: somehow not handle incoming 
connections to any address but the "for rdma use" addresses and instead 
pass them up and not offload them.

This will avoid the collisions as long as the above steps are followed.
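
The translation in step 3 could be sketched roughly as follows.  This is a user-space Python illustration of the address mapping only, not provider code.  The interface table, the ":iw"/":rdma" labels (taken from the naming idea in step 1), and the function names rdma_aliases() and translate_bind() are all assumptions made up for the example.

```python
# Hypothetical sketch: map a wildcard bind onto the "for rdma only" aliases.
# The label prefixes below are an assumed convention, following step 1.
RDMA_LABELS = ("iw", "rdma")

# Assumed example table: interface name -> IPv4 address.
EXAMPLE_INTERFACES = {
    "eth2": "192.168.1.10",       # the "TCP used" interface
    "eth2:iw": "192.168.2.10",    # rdma-only alias
    "eth2:rdma": "192.168.3.10",  # another rdma-only alias
    "eth3:iw": "10.0.0.1",        # alias of a different device
}


def rdma_aliases(interfaces, base):
    """Return the addresses of aliases of `base` whose label marks them rdma-only."""
    addrs = []
    for name, addr in interfaces.items():
        dev, sep, label = name.partition(":")
        if dev == base and sep and label.startswith(RDMA_LABELS):
            addrs.append(addr)
    return sorted(addrs)


def translate_bind(interfaces, base, bind_addr):
    """Return the addresses the provider would offload for this bind.

    A bind to 0.0.0.0 expands to every rdma-only alias of `base`.
    A bind to any other address is offloaded only if it is one of those
    aliases; an empty list means "pass it up, do not offload".
    """
    rdma = rdma_aliases(interfaces, base)
    if bind_addr == "0.0.0.0":
        return rdma
    return [bind_addr] if bind_addr in rdma else []
```

With the example table, a wildcard bind on eth2 expands to the two eth2 alias addresses, while a bind to the plain eth2 address yields an empty list, which corresponds to the "pass them up and not offload them" alternative.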


Solution 2)

Another possibility would be for the driver to create two net devices 
(and hence two interface names) like "eth2" and "iw2", and artificially 
separate the RDMA stuff that way.

These two solutions are similar in that they create an "rdma only" interface.

Pros:
- is not intrusive into the core networking code
- very minimal changes needed, and only in the iwarp providers' code 
(they are the ones with this problem)
- makes it clear which subnets are RDMA only

Cons:
- relies on system admin to set it up correctly.
- native stack can still "use" this rdma-only interface and the same 
port space issue will exist.


For the record, here are possible port-sharing solutions Dave sez he'll NAK:

Solution NAK-1)

The rdma-cma just allocates a socket and binds it to reserve TCP ports.

Pros:
- minimal changes needed to implement (always a plus in my mind :)
- simple, clean, and it works (KISS)
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:
- wastes memory
- puts a TCP socket in the "CLOSED" state in the pcb tables.
- Dave will NAK it :)
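
For readers unfamiliar with the trick, here is a user-space analogue of reserving a port by binding a socket that never listens or connects.  It is a minimal sketch, not the rdma-cma patch: the in-kernel version would use a kernel socket rather than Python's socket module, and the helper names reserve_tcp_port() and port_is_free() are invented for the example.

```python
import socket


def reserve_tcp_port(addr="127.0.0.1", port=0):
    """Bind (but never listen on) a TCP socket so the host stack sees the port as taken.

    port=0 asks the stack to pick a free ephemeral port.
    Returns the holding socket and the port number that was reserved.
    """
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        holder.bind((addr, port))
    except OSError:
        holder.close()
        raise
    return holder, holder.getsockname()[1]


def port_is_free(addr, port):
    """Probe whether a plain bind to (addr, port) would currently succeed."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((addr, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()
```

While the holding socket stays open, a second plain bind to the same address and port fails, which is the reservation effect.  The cost is the one listed under Cons: a full socket exists only to occupy a table entry.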

Solution NAK-2)

Create a low-level sockets-agnostic port allocation service that is 
shared by both TCP and RDMA.  This way, the rdma-cm can reserve ports in 
an efficient manner instead of doing it via kernel_bind() using a sock 
struct.

Pros:
- probably the correct solution (my opinion :) if we went down the path 
of sharing port space
- if no RDMA is in use, there is no impact on the native stack
- no need for a separate RDMA interface

Cons:

- very intrusive change because the port allocation code is tightly 
bound to the host stack and sock struct, etc.
- Dave will NAK it :)
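
To make the idea concrete, here is a toy model of a port table that knows nothing about sockets and is shared by two consumers.  It is purely illustrative and says nothing about how the kernel's port code is actually structured.  The PortSpace class, its methods, the owner labels, and the default range are all hypothetical.

```python
class PortSpace:
    """Hypothetical shared port table; owners are plain labels such as "tcp" or "rdma"."""

    def __init__(self, low=32768, high=60999):
        # Range used when the caller asks for "any" port (illustrative values).
        self.low = low
        self.high = high
        self.owners = {}  # port number -> owner label

    def reserve(self, owner, port=0):
        """Reserve `port` for `owner`, or the lowest free port in range if port == 0."""
        if port == 0:
            for candidate in range(self.low, self.high + 1):
                if candidate not in self.owners:
                    port = candidate
                    break
            else:
                raise OSError("port space exhausted")
        elif port in self.owners:
            raise OSError(f"port {port} already held by {self.owners[port]}")
        self.owners[port] = owner
        return port

    def release(self, owner, port):
        """Give a port back; only the owner that reserved it may release it."""
        if self.owners.get(port) != owner:
            raise ValueError(f"port {port} is not held by {owner}")
        del self.owners[port]
```

Because both consumers go through the same table, a port reserved by "tcp" cannot be handed to "rdma" and vice versa, and no socket is allocated just to hold the entry.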


Steve.


