[ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

Steve Wise swise at opengridcomputing.com
Thu Aug 2 09:26:43 PDT 2007


Sean Hefty wrote:
>> In the RFC patch I posted, the socket is _just_ to allow binding to a 
>> port/addr.  It's not used for anything else.  From the native stack's 
>> perspective, it's a TCP socket in the CLOSED state (but bound) I guess.
> 
> For RDMA, I think we're somewhere in between binding to an address, 
> versus mapping the address.  We map the address to an RDMA device, but 
> also use that address in connections.  So, we do a little more than 
> simply map the address to a device, but if the address migrates to 
> another device, we don't follow it.
> 
> I can't really think of any issues that might be caused by this, but I'm 
> not sure.  If an app is listening on an address that goes away, would a 
> new wildcard listen work?

No:

[root@vic10 ~]# ifconfig eth1 192.168.69.135 up
[root@vic10 ~]# netserver -L 192.168.69.135 -p 2222 -4
Starting netserver at port 2222
Starting netserver at hostname 192.168.69.135 port 2222 and family AF_INET
[root@vic10 ~]# ifconfig eth1 0.0.0.0 down
[root@vic10 ~]# netserver -L 0.0.0.0 -p 2222 -4
Starting netserver at port 2222
set_up_server could not establish a listen endpoint for  port 2222 with family AF_INET
[root@vic10 ~]#
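
The same conflict can be sketched in userspace without touching any
interface.  This is only an illustration, not a reproduction of the
transcript: it assumes the wildcard listen above failed because the first
netserver's listening socket still held port 2222, it uses 127.0.0.1 as a
stand-in for the specific address, and the function name is made up for
the example.

```python
import errno
import socket


def wildcard_bind_errno(specific_addr="127.0.0.1"):
    """Try a wildcard bind while a listener holds the same port on a
    specific address.  Returns the errno of the failed bind, or None if
    the bind unexpectedly succeeds.  (Hypothetical helper.)"""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Listen on a specific address; port 0 lets the kernel pick a port.
        listener.bind((specific_addr, 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        try:
            # A wildcard bind on the same port overlaps the listener.
            wildcard.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        wildcard.close()
        listener.close()


if __name__ == "__main__":
    err = wildcard_bind_errno()
    print(errno.errorcode.get(err, err))
```

Neither socket sets SO_REUSEADDR here, so the host stack treats the
wildcard address as overlapping the specific one.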


> 
>> By active, do you mean in the ESTABLISHED state?
> 
> Yes

Well, excluding the political issues (patents, etc.) and the Linux 
community's general dislike of offload/TOE, it is technically doable, 
but not trivial.  The host stack would have to keep track of which 
connections are offloaded and which ones aren't.  For instance, it 
should handle (and fail) an app trying to send() on a socket that's in 
RDMA mode.  Also, the transition logic for pushing an active connection 
to an RDMA device would be very messy.  In part, this is because 
there's no way to freeze the connection while you're offloading it, so 
the host stack has to deal with incoming or outgoing data during the 
transition and pass it to the RDMA driver after the offload is 
complete.  It's messy, but doable.  MS Windows supports this exact 
design.  Note that the iWARP HW must support this as well.  Ammasso 
doesn't.  Chelsio does.  I _think_ the other up-and-coming iWARP devices 
also support it because Windows supports it.

But I don't think we should consider this approach.  I think we should 
minimally sync up with the native stack, as we already do in a number 
of ways:

- using the host routing table to determine the local interface to be used.
- the association of a netdev device with an RDMA device via the hwaddr.
- netevents that allow iWARP devices to track neighbour/next-hop changes.

The port space is one more piece of host state that needs to be handled 
for iWARP.
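
As a rough userspace analogy for what the RFC patch does with its socket
(bind only, never listen or connect), the sketch below reserves a port in
the host TCP port space and checks that nobody else can bind to it.  It is
not the kernel code from the patch; the helper names and the use of
127.0.0.1 are assumptions made for the example.

```python
import errno
import socket


def reserve_tcp_port(addr="127.0.0.1", port=0):
    """Bind a TCP socket without listening, so the port stays reserved for
    as long as the returned socket is open.  (Hypothetical helper.)"""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]


def port_is_free(addr, port):
    """Return True if a fresh TCP socket can bind to addr:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((addr, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()


if __name__ == "__main__":
    holder, port = reserve_tcp_port()
    print(port_is_free("127.0.0.1", port))  # port is held by `holder`
    holder.close()
    print(port_is_free("127.0.0.1", port))  # released once holder closes
```

The holder socket never carries traffic; its only job is to make the host
stack refuse the same port to anyone else.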


Steve.


