[ofa-general] [PATCH] rds: use separate ports for TCP and IB

Talpey, Thomas Thomas.Talpey at netapp.com
Tue Jun 3 06:21:35 PDT 2008


At 04:25 AM 6/2/2008, Olaf Kirch wrote:
>On Friday 30 May 2008 18:29:01 Talpey, Thomas wrote:
>> Yes, at the moment TCP to the iWARP NIC is like talking to a different
>> host. But, RDMA-aware versions of a given protocol still need a second
>> port, unless there is explicit upper layer support for initiating the MPA
>> exchange. We have the same issue with NFSv3/RDMA, and we have
>> applied for a second port (the application is still pending within IANA).
>
>This may also become a problem for RDS, but in a different sense.
>If you bind an RDS socket to a specific IP address, this also selects
>a transport. If you bind to an IP address owned by ib0, you select
>the IB transport. If you bind to an IP address owned by say eth0, you
>select the TCP transport.

I think this is a bit of an overloading, and will lead to issues down the road.
IP addresses really aren't about interfaces, they're about hosts. For example,
on TCP (etc), hosts will respond to any of their IP addresses on any of their
interfaces (IP routing often depends on this). I suspect it would be better to
make the transport selection explicit. IPv4/IPv6 selection may fall into the
same issue later.
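
A minimal sketch of the difference, in Python for brevity. The address table, the device names, and both function names are hypothetical and are not part of RDS. The point is only that with an explicit argument the caller decides the transport, rather than whichever interface happens to own the address.

```python
from enum import Enum


class Transport(Enum):
    TCP = "tcp"
    IB = "ib"
    IWARP = "iwarp"


# Hypothetical table: local address -> (owning device, transports it can offer).
LOCAL_ADDRESSES = {
    "10.0.0.1": ("eth0", (Transport.TCP,)),
    "10.0.1.1": ("ib0", (Transport.IB,)),
    "10.0.2.1": ("eth1", (Transport.IWARP, Transport.TCP)),  # an iWARP NIC
}


def transport_from_address(addr):
    """Implicit selection: the bind address alone picks the transport."""
    _device, offered = LOCAL_ADDRESSES[addr]
    # Ambiguous when the device offers more than one transport;
    # this sketch just takes the first one listed.
    return offered[0]


def transport_explicit(addr, wanted):
    """Explicit selection: the caller names the transport.

    The address is only used to check that the transport is available.
    """
    _device, offered = LOCAL_ADDRESSES[addr]
    if wanted not in offered:
        raise ValueError(f"{wanted.value} is not available on {addr}")
    return wanted
```

With the explicit form, an application bound to the iWARP NIC's address can still ask for plain TCP, which the address-only form cannot express.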

>Now if the one and only NIC in the system is an iWARP NIC, which transport
>should we select? Both iWARP and TCP can be legitimate choices, depending
>on the hosts we're talking to (and how do we find out whether the remote
>node supports iWARP or not?)
>
>I think for now it's okay to default to iWARP and punt if the remote
>doesn't support it. But in the long run this is something that needs
>to be addressed - either in RDS, or in the way ofed handles iWARP NICs.

Here I think you've made an even bigger assumption - not only that your
interface supports iWARP, but that the peer's does too! But
because the kernel doesn't plumb iWARP/TCP into the stack I guess it's
moot. Still...

Is RDS able to use MPA negotiation to mutually agree to enable RDMA?
If so, you might consider doing so, and simply use TCP unless you detect
otherwise. The NFSv4.1 protocol session negotiation supports this, for
example. For the existing NFS protocols, we rely on explicit instructions
from the mount command (-o proto=rdma), and make the connection in
RDMA mode.
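
The "use TCP unless you detect otherwise" rule could look roughly like the sketch below. It is in Python for brevity, the function name and arguments are hypothetical, and it says nothing about the actual MPA wire exchange. It only shows the decision: stay on TCP unless both ends have agreed to RDMA.

```python
def negotiate_mode(local_rdma, peer_rdma):
    """Return "rdma" only if both ends advertise it; otherwise stay on "tcp".

    local_rdma: True if the local interface can do RDMA.
    peer_rdma:  True/False as reported by the peer, or None if the peer
                never answered the (hypothetical) RDMA offer.
    """
    if local_rdma and peer_rdma is True:
        return "rdma"
    # Default: no agreement detected, so keep talking plain TCP.
    return "tcp"
```

A peer that does not understand the offer at all is treated the same as one that declines it, so the connection degrades to TCP rather than failing.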

>In fact it may not be such a bad idea to treat an iWARP NIC as two
>devices, and register one ethX device (owned exclusively by the normal
>stack) and one ibX device (owned mostly by the ofed stack, maybe with
>the exception of ARP and such). Would that work with protocols like
>NFS/RDMA negotiation - i.e. can you specify a secondary port
>_and_ address?

NFS/RDMA does, in fact, specify a secondary port. The reason is that
NFSv4 (like v2 and v3) doesn't support negotiation, and v4 also does not
use the RPC portmapper. When listening on an iWARP NIC, therefore, it
MUST use a different port from the default 2049, or else MPA would start
seeing NFSv4 requests immediately on connect! So, a second well-known
port is specified.

I'm still waiting for the official assignment of this number from IANA. BTW,
for simplicity, we use this port on all NFS/RDMA listens, even though
it's technically not required on IB because the RDMA listening port space is
distinct from TCP's.
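
A sketch of the port rule, in Python for brevity. The function name and the transport strings are hypothetical, and rdma_port is a placeholder passed in by the caller, since the real number is not assigned yet.

```python
NFS_DEFAULT_PORT = 2049  # the default NFS port


def nfs_listen_port(transport, rdma_port):
    """Pick the listen port for a transport: "tcp", "iwarp", or "ib".

    rdma_port is a caller-supplied placeholder for the second well-known
    port; its actual value is pending assignment.
    """
    if transport == "tcp":
        return NFS_DEFAULT_PORT
    if transport not in ("iwarp", "ib"):
        raise ValueError(f"unknown transport: {transport}")
    if rdma_port == NFS_DEFAULT_PORT:
        # Strictly needed only on iWARP, where MPA would otherwise see
        # NFS requests on connect; applied to IB too for uniformity.
        raise ValueError("RDMA listens need a port distinct from 2049")
    return rdma_port
```

Using one rule for both iWARP and IB costs nothing on IB and keeps the listener setup identical across RDMA transports.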

Tom.



