<html>
<body>
<font size=3>At 11:17 PM 2/28/2005, Eric W. Biederman wrote:<br>
<blockquote type=cite class=cite cite="">"Yaron Haviv"
<yaronh@voltaire.com> writes:<br><br>
> > -----Original Message-----<br>
> > From: openib-general-bounces@openib.org
[mailto:openib-general-bounces@openib.org] On Behalf Of Roland Dreier<br>
> > Sent: Monday, February 28, 2005 7:13 PM<br>
> > To: shaharf<br>
> > Cc: openib-general@openib.org<br>
> > Subject: Re: [openib-general] IB Address Translation
service<br>
> > <br>
> > This API seems overly complex and at the same time too
inflexible to<br>
> > me. However, rather than getting bogged down nitpicking
about APIs, I<br>
> > think we have to take a few steps back.<br>
> <br>
> I believe the API is very flexible, but we are pretty open to hear
what<br>
> you think is needed in addition.<br>
> <br>
> > First, let's understand the problem we're trying to
solve. Who are<br>
> > the consumers of this address translation service?<br>
> <br>
> The first problem is that most ULPs use valid IP addresses for<br>
> simplicity (DAPL, iSER, NFS/RDMA, SDP, MPI, etc.) and someone needs
to<br>
> resolve them to an IB address and device in order to use IB. This should take
into<br>
> account cases where there is more than one HCA in the system.<br>
> Preferably/optionally, the ULP would like to know which partition to
use<br>
> if there is more than one, and leverage the IP subnetting done
by<br>
> IPoIB.<br><br>
I am confused. In any sane network the translation is:<br>
Hostname -> address.<br><br>
IP because it spans multiple networks does:<br>
Hostname -> IP address -> hw address.<br><br>
IB because it can span multiple IB networks does:<br>
GUID+QPN -> LID + QPN.<br><br>
So what is wrong with simply doing:<br>
Hostname -> GUID<br>
???<br><br>
Then all the kernel needs to be passed GUID + QPN.<br><br>
I am certain MPI does not care about IP addresses. It is the
job<br>
of the mpi launcher to resolve where all of the pieces are.
Generally<br>
mpirun is done over IP and it just needs to collect the native
network<br>
addresses before it leaves.</font></blockquote><br>
That still does not eliminate the need to resolve some form of
address.<br><br>
<br>
<blockquote type=cite class=cite cite=""><font size=3>It would be brain
damaged for DAPL to require IP addresses. Not that<br>
DAPL hasn't shown some brain damage already.</font></blockquote><br>
I don't believe the IT API requires ATS. It is a bit more flexible
and matches better with applications I think.<br><br>
<br>
<blockquote type=cite class=cite cite=""><font size=3>Please, please
remember that IP addresses <br><br>
> It is possible to replicate the same code you have in SDP (which is
also<br>
> not complete) across all ULPs, but I assume a better way is to provide
it<br>
> in one central place.<br><br>
How about not even worrying about it. It is an extra step that<br>
introduces latency and confusion. <br><br>
You can't do GUID -> IP because there is no requirement for<br>
a 1 to 1 mapping. And in general there is no fixed IP -> GUID
mapping.<br><br>
What are the semantics in the upper levels when the IP -> GUID
mapping<br>
changes? Does your connection properly follow the IP to the new
GUID?</font></blockquote><br>
It should follow a new mapping if done right.<br><br>
<br>
<blockquote type=cite class=cite cite=""><font size=3>I don't see this
making sense anywhere except user space.<br><br>
> There are also two proposed address resolution mechanisms, one is
ARP<br>
> used by SDP, and one is ATS used by some DAPL consumers, and we
believe<br>
> it is better to combine them under the same API.<br><br>
Just FYI IPv6 doesn't use arp.</font></blockquote><br>
Whether it is ND or ARP is less of an issue at this point.<br><br>
<br>
<blockquote type=cite class=cite cite=""><font size=3>> The second
problem relates to mapping of IB GID to one or more Path<br>
> records<br>
> This is also something needed for ALL ULPs. Today each ULP provides
the<br>
> minimal subset of path resolution functionality without taking
into<br>
> account topics such as partitioning, QoS, source routing and<br>
> multi-pathing.<br>
> Some of these require using special SA queries (such as SA
Multipath<br>
> Record query and QoSPath Query).<br>
> I don't think it makes sense to put all this functionality into each
ULP<br>
> as well.<br><br>
That part is reasonable. Although the fact it is easy to knock<br>
OpenSM down concerns me. However that looks to be a separate<br>
problem.<br><br>
> Then we can also discuss: does it make sense to have each path<br>
> resolution call lead us to the SA, or does it make more sense to
cache<br>
> those paths?<br>
> And if we cache, doesn't it make more sense to cache/invalidate
the<br>
> routes for all ULPs rather than implementing/having it in each ULP?<br>
> Also, I am not sure how a 1000 node cluster functions without the
caching.<br>
> <br>
> And the last problem is related to reverse resolution from IB to
IP<br>
> addresses that is needed for DAPL, as well as for different
management<br>
> and diagnostic tools that want to know which node/port is
really<br>
> behind that GID address.<br>
> <br>
> So how would you suggest going about it?<br>
> Duplicate all of that in each ULP?<br>
> Refrain from implementing advanced routing, partitioning, QoS (we
can't<br>
> really maintain all that advanced code for each ULP)?<br><br>
One small step at a time. Where each step is obviously
correct.<br><br>
One giant leap only works well for internal use. Not for
things<br>
that are heavily used.<br><br>
> Our idea is to provide those few helper functions that enable people
to<br>
> make full use of IB and its features without reading the entire IB
spec<br>
> or needing a PhD.<br>
> If you clear all the remarks from the library, you will see it is
very<br>
> slim, and to my understanding it includes all the relevant input
and<br>
> output parameters for each of the 3 functions I mentioned.<br><br>
But an interface like that is usually provided by glibc not by the
kernel.<br>
And the mixing of levels in that proposed API is absolutely
horrible.<br><br>
<br>
Eric<br>
</font></blockquote></body>
</html>