<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<div>

<div id="x_compose-container" itemscope="" itemtype="https://schema.org/EmailMessage" style="direction:ltr">

<span itemprop="creator" itemscope="" itemtype="https://schema.org/Organization"><span itemprop="name"></span></span>

<div>

<div style="direction:ltr">My bad, I did not mean the device; I meant the low-level messaging stack, such as PSM. </div>

<div><br>

</div>

<div style="direction:ltr">Anyway, my suggestion was to do this, if possible, at the OFI level so that all middle layers reap the benefit. </div>

<div><br>

</div>

<div class="x_acompli_signature">--<br>

Daniel Faraj<br>

HPE Performance Engineering<br>

651.683.7605 Office<br>

<a dir="ltr" href="mailto:daniel.faraj@hpe.com">daniel.faraj@hpe.com</a></div>

</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Hefty, Sean &lt;sean.hefty@intel.com&gt;<br>

<b>Sent:</b> Tuesday, June 6, 2017 6:36:15 PM<br>

<b>To:</b> Faraj, Daniel; Jeff Hammond<br>

<b>Cc:</b> ofiwg@lists.openfabrics.org; libfabric-users@lists.openfabrics.org; Baron, John<br>

<b>Subject:</b> RE: [libfabric-users] feature requests</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:10pt;">

<div class="PlainText">> If MPI or other middle layer is to implement the multirail, why bother<br>

> even with OFI: implement directly on the device and no need for extra<br>

> OFI overhead.<br>

<br>

Well, if MPI wants to write to device registers, I say go for it.  Maybe write it in assembly too, to avoid the extra C overhead.  :)<br>

<br>

More seriously, I'd like to start by analyzing what it would take to add multi-rail support over reliable-datagram endpoints, with the assumption that this would provide similar performance to what MPI could do.  This has the added benefit that it could work with completely different networks, though I'm not sure if that's a requirement.<br>

<br>

I guess the first thing to figure out is how addressing works when multi-rail is in use.  Would we need some sort of super-address that's a union of the underlying fabric addresses?<br>

</div>

</span></font>

</body>

</html>