[ofa-general] Infiniband back-to-back without OpenSM?

Hal Rosenstock hrosenstock at xsigo.com
Wed May 28 06:34:10 PDT 2008


On Wed, 2008-05-28 at 09:24 -0400, Talpey, Thomas wrote:
> At 09:03 AM 5/28/2008, Hal Rosenstock wrote:
> >On Wed, 2008-05-28 at 08:56 -0400, Talpey, Thomas wrote:
> >> At 08:39 AM 5/28/2008, Hal Rosenstock wrote:
> >> >Tom,
> >> >
> >> >On Wed, 2008-05-28 at 08:06 -0400, Talpey, Thomas wrote:
> >> >> Is it possible to manually configure two Infiniband ports to operate
> >> >> with one another in back-to-back mode, without running OpenSM
> >> >> on one of them?
> >> >
> >> >This is possible but something would need to do at least some subset of
> >> >what the SM does depending on the precise requirements and the limits
> >> >placed on the environment supported without a "full blown" SM.
> >> 
> >> Okay ... but IMO the only thing we need is a LID. Or at least, in my
> >> experience all I've needed is a LID.
> >
> >The port also needs to be walked from init to active which takes
> >coordination at both ends of the b2b link.
> 
> Yep. But, it has all it needs with a LID, right? No messages need to be
> exchanged, for instance.

It's more than a LID and messages do need to be exchanged (mini SM ->
SMA) to walk the port from INIT to ACTIVE. This needs to be coordinated
on both sides of the link so they move in rough concert.
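
As a rough illustration of the state check each side would need, here is
a minimal sketch that reads the port state from sysfs. It assumes the
usual /sys/class/infiniband/<device>/ports/<port>/state file, which
prints the state as a number and a name (for example "4: ACTIVE"). The
device name mthca0, port 1 and the helper name port_is_active are
placeholders for illustration only. It only observes the state; it does
not send the MADs that actually move the port.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the sysfs state file given as $1
# reports the ACTIVE port state (sysfs prints e.g. "4: ACTIVE").
port_is_active() {
    state_line=$(cat "$1" 2>/dev/null) || return 1
    case $state_line in
        4:*) return 0 ;;   # 4 is the ACTIVE port state
        *)   return 1 ;;   # DOWN, INIT, ARMED or unreadable
    esac
}

# Assumed device and port; adjust for the HCA actually installed.
state_file=/sys/class/infiniband/mthca0/ports/1/state

if port_is_active "$state_file"; then
    echo "port active"
else
    echo "port not active yet"
fi
```

Without something playing the SM role on the link, I would expect this
to keep reporting "port not active yet" on both ends.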

> >> In a previous effort, we simply stole the low octet of an IP address, so we'd
> >> "ifconfig ib0 1.2.3.X" and it would jam lid=X into the interface.
> >> Worked great.
> >> If necessary, we would set a manual arp entry (using iproute) to avoid having
> >> to broadcast.
> >
> >That could be done if that is what is desired and can be relied upon
> >(that ib0 is configured and we only care about the first port).
> >
> >Is it just ARP support that is needed ?
> 
> Well, ARP is the precursor to establishing an IP send and a TCP connection,
> which we need to do also.

I was just asking about other broadcast/multicast needs. Sounds like
this is not the case.

>  But, if the resulting ipaddr-hwaddr mapping is
> installed, then ARP is unnecessary and the IP layer can send without using it.
> 
> When we did this before, we'd install a "permanent" ARP entry, in a two-line
> shell script. Roughly, for peers configuring lids X and Y, it would do
> 
> peer X:
> 	ifconfig ib0 1.2.3.X
> 	ip neigh add 1.2.3.Y nud permanent lladdr a.b.c.d.e.f....Y (i.e. Y's guid)
> 
> peer Y:
> 	ifconfig ib0 1.2.3.Y
> 	ip neigh add 1.2.3.X nud permanent lladdr a.b.c.d.e.f....X
> 
> And we'd be up and running for both IP and RDMA connections. We fixed a
> bug in the old iproute2 command to allow the long IB link addresses.
> 
> I'm thinking that using IPOIB to drive this kind of manual setup is one way
> to approach it. It certainly would be simple, and worked for us before there
> was an OFA stack.

This would still work.
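
For reference, a sketch of how such a script could assemble the long
link address. It assumes the usual 20-byte IPoIB hardware address (one
flags byte, a 24-bit QPN, then the 16-byte port GID) and the default
link-local prefix fe80::/64 in front of the port GUID. The function
names, the QPN and the GUID are made-up examples; in practice the peer's
real address can be read from /sys/class/net/ib0/address on the peer.
The commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: build a 20-byte IPoIB link-layer address from a
# 24-bit QPN ($1, six hex digits) and a port GUID ($2, sixteen hex
# digits), assuming a zero flags byte and the fe80::/64 subnet prefix.
ipoib_lladdr() {
    hex="00${1}fe80000000000000${2}"
    # Put a colon after every byte, then drop the trailing colon.
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}

# Hypothetical helper: print (dry run) the two setup commands for one
# peer. $1 = local IP, $2 = peer IP, $3 = peer link-layer address.
peer_setup() {
    echo "ifconfig ib0 $1"
    echo "ip neigh add $2 lladdr $3 nud permanent dev ib0"
}

# Example values only: peer Y has QPN 0x000404 and an invented GUID.
peer_setup 1.2.3.1 1.2.3.2 "$(ipoib_lladdr 000404 0002c90200001234)"
```

Note that the LID does not appear in the link address at all, so the
permanent neighbour entry only removes the need for ARP; the port still
has to get its LID and reach ACTIVE some other way.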

> Maybe I'm getting ahead of myself though, still wondering if there's a way
> to do it with what we have.

The closest thing is OpenSM's run-once mode, but I think you've been
describing a b2b mini SM command, which wouldn't be hard to implement.

-- Hal

> Tom.
> 
> >
> >> >> We have done this on other IB implementations by manually assigning
> >> >> LIDs, but I discover that the "lid" entry below
> >> >> /sys/class/infiniband/<device> is not writable, at least for mthca.
> >> >
> >> >This can be done via MADs so user_mad kernel module would be needed to
> >> >do this.
> >> 
> >> Okay, all kernel modules can be assumed to be in place. How do we tell it
> >> to manage the LID, with a shell command?
> >
> >A new "command" would be needed.
> >
> >-- Hal
> >
> >> >> Also, I expect that the ipoib driver will
> >> >> be unable to join the broadcast group, so will be unwilling to come up fully.
> >> >
> >> >Is IPoIB a requirement ?
> >> 
> >> I think so, for two reasons. One, principle of least surprise - the user will
> >> expect to be able to ping, telnet etc if it has connectivity. Two, for NFS/RDMA
> >> we require TCP and UDP connections in order to perform the mount and do
> >> locking and recovery. We could do those over a parallel ethernet connection,
> >> but that's kind of not the point.
> >> 
> >> >
> >> >> With ethernet, and maybe iWARP, just a simple ifconfig can do this. So why
> >> >> not IB?
> >> >
> >> >The simple answer is that it is the nature of IB management (being
> >> >different than ethernet).
> >> 
> >> Which, IMO, we need to boil down to simplest-possible, for at least some
> >> workable configuration.
> >> 
> >> Thanks for the ideas!
> >> 
> >> Tom.
> >> 
> >> >
> >> >-- Hal
> >> >
> >> >> If you're wondering, my goal is to give NFS/RDMA users a way to avoid having
> >> >> to install the many userspace modules needed to do this, including libibverbs,
> >> >> opensm, etc. There's a lot to get wrong, and things go missing. Seeking an
> >> >> "easy" way to get started with just the kernel and some shell commands.
> >> >> 
> >> >> Tom.
> >> >> 
> >> >> _______________________________________________
> >> >> general mailing list
> >> >> general at lists.openfabrics.org
> >> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >> >> 
> >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >> 
> >
> 
