[openib-general] [Bug 325] New: RDMA_CM and address translation broken on sles9sp3

Caitlin Bestler caitlinb at broadcom.com
Fri Jan 26 09:15:55 PST 2007


openib-general-bounces at openib.org wrote:
> https://bugs.openfabrics.org/show_bug.cgi?id=325
> 
>            Summary: RDMA_CM and address translation broken on sles9sp3
>            Product: OpenFabrics Linux
>            Version: 1.2
>           Platform: X86-64
>         OS/Version: SLES 9
>             Status: NEW
>           Severity: critical
>           Priority: P2
>          Component: RDMA CM
>         AssignedTo: bugzilla at openib.org
>         ReportedBy: swise at opengridcomputing.com
> 
> 
> rdma_translate_ip() and friends use
> ip_dev_find(local_ip_addr) to obtain a net_device pointer.
> Then the device type is used to determine if the rdma
> address is iwarp or infiniband.
> 
> On sles9sp3, ip_dev_find(local_ip_addr) is returning the
> loopback device.  This causes rdma_copy_addr() to fail.

I suspect that this is the most obvious case of a more
general problem. Specifically, there is a higher-priority
route to the destination IP address that is not RDMA
capable.

Essentially the selected route needs to be considered
"down" for RDMA traffic, so the less preferred route
can be taken.
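
As a rough illustration of that idea, here is a minimal user-space
sketch. It is not kernel code; struct route_candidate, its fields and
pick_rdma_route() are hypothetical names invented for this example.
It assumes that a lower metric means a more preferred route, and it
simply skips any route whose interface is not RDMA capable, as if
that route were down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a candidate route; not a kernel structure. */
struct route_candidate {
    const char *ifname;   /* outgoing interface name */
    int metric;           /* assumption: lower value = more preferred */
    bool rdma_capable;    /* does the interface have an RDMA device? */
};

/*
 * Return the most preferred RDMA-capable route, or NULL if there is
 * none. Routes over non-RDMA interfaces are skipped, which has the
 * same effect as treating them as "down" for RDMA traffic.
 */
static const struct route_candidate *
pick_rdma_route(const struct route_candidate *routes, size_t n)
{
    const struct route_candidate *best = NULL;

    assert(routes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!routes[i].rdma_capable)
            continue;   /* e.g. loopback or an admin-only Ethernet port */
        if (best == NULL || routes[i].metric < best->metric)
            best = &routes[i];
    }
    return best;
}
```

With candidates "lo" (metric 0, not RDMA capable) and "ib0"
(metric 5, RDMA capable), the sketch returns "ib0" even though
loopback is the more preferred route for ordinary IP traffic.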

The more specific problem could be addressed by making
the loopback device support OFA verbs, but since nobody
sells loopback devices there might not be a rush of
volunteers.

The other issue is that there may always be remote IP
addresses that are reachable for non-RDMA traffic but
not for RDMA traffic. All it takes is two network interfaces
that connect to two networks that have no routes between
them where one of them is not RDMA capable. Machines that
have Ethernet ports dedicated to an administrative network
are one obvious example.

Ultimately that suggests that the test should not be
"is the rdma address iwarp or infiniband" but "is it iwarp,
infiniband or not RDMA accessible".
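
A minimal sketch of such a three-way test follows. It is a
user-space model, not the actual rdma_translate_ip() code: the
enums, classify_link() and the has_rnic flag are all hypothetical
names made up for illustration. The assumption is that an Ethernet
interface only counts as iwarp when an RNIC is actually bound to it,
and that anything else (loopback included) is reported as not RDMA
accessible rather than as an error in the caller's address.

```c
#include <stdbool.h>

/*
 * Hypothetical link types for this sketch. A kernel version would
 * look at the net_device type instead.
 */
enum link_type {
    LINK_ETHERNET,
    LINK_INFINIBAND,
    LINK_LOOPBACK,
    LINK_OTHER
};

/* Three possible answers instead of two. */
enum rdma_transport {
    RDMA_XPORT_IWARP,
    RDMA_XPORT_IB,
    RDMA_XPORT_NONE     /* reachable by IP, but not by RDMA */
};

static enum rdma_transport
classify_link(enum link_type type, bool has_rnic)
{
    switch (type) {
    case LINK_INFINIBAND:
        return RDMA_XPORT_IB;
    case LINK_ETHERNET:
        /* Plain Ethernet without an RNIC cannot carry iwarp. */
        return has_rnic ? RDMA_XPORT_IWARP : RDMA_XPORT_NONE;
    default:
        /* Loopback and unknown link types. */
        return RDMA_XPORT_NONE;
    }
}
```

A caller that gets RDMA_XPORT_NONE could then fall back to the next
candidate route instead of failing the whole address resolution.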






