[ofa-general] Infiniband back-to-back without OpenSM?
James Lentini
jlentini at netapp.com
Thu May 29 08:37:15 PDT 2008
On Wed, 28 May 2008, Hal Rosenstock wrote:
> On Wed, 2008-05-28 at 09:24 -0400, Talpey, Thomas wrote:
> > At 09:03 AM 5/28/2008, Hal Rosenstock wrote:
> > >On Wed, 2008-05-28 at 08:56 -0400, Talpey, Thomas wrote:
> > >> At 08:39 AM 5/28/2008, Hal Rosenstock wrote:
> > >> >Tom,
> > >> >
> > >> >On Wed, 2008-05-28 at 08:06 -0400, Talpey, Thomas wrote:
> > >> >> Is it possible to manually configure two Infiniband ports to operate
> > >> >> with one another in back-to-back mode, without running OpenSM
> > >> >> on one of them?
> > >> >
> > >> >This is possible but something would need to do at least some subset of
> > >> >what the SM does depending on the precise requirements and the limits
> > >> >placed on the environment supported without a "full blown" SM.
> > >>
> > >> Okay ... but IMO the only thing we need is a LID. Or at least, in my
> > >> experience all I've needed is a LID.
> > >
> > >The port also needs to be walked from init to active which takes
> > >coordination at both ends of the b2b link.
> >
> > Yep. But, it has all it needs with a LID, right? No messages need to be
> > exchanged, for instance.
>
> It's more than a LID and messages do need to be exchanged (mini SM ->
> SMA) to walk the port from INIT to ACTIVE. This needs to be coordinated
> on both sides of the link so they move in rough concert.
>
> > >> In a previous effort, we simply stole the low octet of an IP address, so we'd
> > >> "ifconfig ib0 1.2.3.X" and it would jam lid=X into the interface.
> > >> Worked great.
> > >> If necessary, we would set a manual arp entry (using iproute) to avoid having
> > >> to broadcast.
> > >
> > >That could be done if that is what is desired and can be relied upon
> > >(that ib0 is configured and we only care about the first port).
> > >
> > >Is it just ARP support that is needed ?
> >
> > Well, ARP is the precursor to establishing an IP send and a TCP connection,
> > which we need to do also.
>
> I was just asking about other broadcast/multicast needs. Sounds like
> this is not the case.
>
> > But, if the resulting ipaddr-hwaddr mapping is
> > installed, then ARP is unnecessary and the IP layer can send without using it.
> >
> > When we did this before, we'd install a "permanent" ARP entry, in a two-line
> > shell script. Roughly, for peers configuring lids X and Y, it would do
> >
> > peer X:
> > ifconfig ib0 1.2.3.X
> > ip neigh add 1.2.3.Y nud permanent lladdr a.b.c.d.e.f....Y (i.e. Y's guid)
> >
> > peer Y:
> > ifconfig ib0 1.2.3.Y
> > ip neigh add 1.2.3.X nud permanent lladdr a.b.c.d.e.f....X
> >
> > And we'd be up and running for both IP and RDMA connections. We fixed a
> > bug in the old iproute2 command to allow the long IB link addresses.
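For reference, the two-line script described above can be sketched as a dry-run helper that only prints the commands for one side. The 1.2.3.X addressing and ib0 come from the example; the function name b2b_commands, its arguments, and the added "dev ib0" (which iproute2 expects on "ip neigh add") are assumptions for illustration. The peer's link-layer address is left as a parameter because the example elides it. As discussed elsewhere in this thread, ifconfig on the current stack does not set the LID, so the port still has to get a LID and reach ACTIVE by some other means.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands for one side of a
# back-to-back link. b2b_commands is a made-up name for this example.

b2b_commands() {
    local_octet=$1   # low octet of the local IP address (X in the example)
    peer_octet=$2    # low octet of the peer's IP address (Y in the example)
    peer_lladdr=$3   # peer's IPoIB link-layer address, supplied by the caller

    # Reject anything that is not a usable host octet.
    for octet in "$local_octet" "$peer_octet"; do
        case $octet in
            ''|*[!0-9]*) echo "octet must be a number: $octet" >&2; return 1 ;;
        esac
        if [ "$octet" -lt 1 ] || [ "$octet" -gt 254 ]; then
            echo "octet out of range 1-254: $octet" >&2; return 1
        fi
    done

    if [ -z "$peer_lladdr" ]; then
        echo "peer link-layer address is required" >&2; return 1
    fi

    echo "ifconfig ib0 1.2.3.$local_octet"
    echo "ip neigh add 1.2.3.$peer_octet nud permanent lladdr $peer_lladdr dev ib0"
}
```

On peer X one would review the output of b2b_commands X Y "$PEER_LLADDR" and then run it by hand; peer Y swaps the two octets and passes X's address.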
> >
> > I'm thinking that using IPOIB to drive this kind of manual setup is one way
> > to approach it. It certainly would be simple, and worked for us before there
> > was an OFA stack.
>
> This would still work.
>
> > Maybe I'm getting ahead of myself though, still wondering if there's a way
> > to do it with what we have.
>
> The closest thing is OpenSM run once mode but I think you've been
> describing a b2b mini SM command which wouldn't be hard to implement.
Unrelated to NFS/RDMA, I wrote a small kernel module that used MADs
to assign a lid, and then transitioned the port to ARMED and ACTIVE.
This worked for enabling IB communication, but not IPoIB. In
retrospect, I probably could have implemented the same functionality
in userspace.
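
Whatever ends up assigning the LID and walking the port, the result can be checked from the shell. The sketch below is only a check, not the mini SM itself. It assumes the usual sysfs layout, /sys/class/infiniband/<device>/ports/<port>/ with "lid" and "state" files, where state reads like "4: ACTIVE" and lid like "0x0" until one is assigned. The function name port_ready and its directory argument are made up for this example.

```shell
#!/bin/sh
# Sketch: report whether an IB port has a LID and has reached ACTIVE.
# port_ready is a hypothetical helper; it only reads sysfs, it changes nothing.

port_ready() {
    port_dir=$1   # e.g. /sys/class/infiniband/<device>/ports/<port>

    lid=$(cat "$port_dir/lid") || return 2
    state=$(cat "$port_dir/state") || return 2

    # Anything other than ACTIVE (DOWN, INIT, ARMED, ...) is not usable yet.
    case $state in
        "4: ACTIVE") ;;
        *) echo "port not active: $state"; return 1 ;;
    esac

    # A LID of zero means nothing has assigned one.
    if [ $(($lid)) -eq 0 ]; then
        echo "port has no LID: $lid"; return 1
    fi

    echo "port ready: lid=$lid state=$state"
}
```

Running port_ready on both ends of the link shows whether the two sides reached ACTIVE together.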
> -- Hal
>
> > Tom.
> >
> > >
> > >> >> We have done this on other IB implementations by manually assigning
> > >> >> LIDs, but I discover that the "lid" entry below
> > >> >> /sys/class/infiniband/<device> is not writable, at least for mthca.
> > >> >
> > >> >This can be done via MADs so user_mad kernel module would be needed to
> > >> >do this.
> > >>
> > >> Okay, all kernel modules can be assumed to be in place. How do we tell it
> > >> to manage the LID, with a shell command?
> > >
> > >A new "command" would be needed.
> > >
> > >-- Hal
> > >
> > >> >> Also, I expect that the ipoib driver will
> > >> >> be unable to join the broadcast group, so will be unwilling to
> > >> >> come up fully.
> > >> >
> > >> >Is IPoIB a requirement ?
> > >>
> > >> I think so, for two reasons. One, principle of least surprise - the user will
> > >> expect to be able to ping, telnet etc if it has connectivity. Two, for NFS/RDMA
> > >> we require TCP and UDP connections in order to perform the mount and do
> > >> locking and recovery. We could do those over a parallel ethernet connection,
> > >> but that's kind of not the point.
> > >>
> > >> >
> > >> >> With ethernet, and maybe iWARP, just a simple ifconfig can do this. So why
> > >> >> not IB?
> > >> >
> > >> >The simple answer is that it is the nature of IB management (being
> > >> >different than ethernet).
> > >>
> > >> Which, IMO, we need to boil down to simplest-possible, for at least some
> > >> workable configuration.
> > >>
> > >> Thanks for the ideas!
> > >>
> > >> Tom.
> > >>
> > >> >
> > >> >-- Hal
> > >> >
> > >> >> If you're wondering, my goal is to give NFS/RDMA users a way to avoid having
> > >> >> to install the many userspace modules needed to do this, including libibverbs,
> > >> >> opensm, etc. There's a lot to get wrong, and things go missing. Seeking an
> > >> >> "easy" way to get started with just the kernel and some shell commands.
> > >> >>
> > >> >> Tom.
> > >> >>
> > >> >> _______________________________________________
> > >> >> general mailing list
> > >> >> general at lists.openfabrics.org
> > >> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > >> >>
> > >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > >>
> > >
> >
>