[ofw][patch][ND provider] Improving latency of ms-mpi

Tzachi Dar tzachid at mellanox.co.il
Wed Aug 12 04:40:23 PDT 2009


I guess that we all have good points here, but I believe we should be
moving forward:

Since an important goal of this project is reaching low latency, I don't
see how we can not do inline send at all.

I agree that changing the api might break existing applications but:
        1) I don't think there is really a big number of applications
currently.
        2) The "break" actually means that inline will be used on a
place where it is not supposed to. This might mean "sub-optimal"
performance, but I believe that all applications will continue to work.
(We need to make sure that if the number used is too big, we will use a
good default (maybe 0).
        3) Giving applications control over the inline size (that is
using the parameter as in/out) will be the best technical solution going
forward.

Our suggestion is this: add a new environment variable. If that variable
doesn't exist the parameter is out only.
If it is set, the parameter is in/out. This will promise that older
applications still work as well. People who write according to the new
api will set this environment variable. Fab, can you make sure that the
new nd version will allow this parameter to be in/out?

Controlling inline or not will still be done using an environment
variable (as Leonid sent).

Does this makes sense?

Thanks
Tzachi





> -----Original Message-----
> From: Sean Hefty [mailto:sean.hefty at intel.com] 
> Sent: Wednesday, August 12, 2009 2:03 AM
> To: Leonid Keller; Fab Tillier; Tzachi Dar
> Cc: ofw at lists.openfabrics.org
> Subject: RE: [ofw][patch][ND provider] Improving latency of ms-mpi
> 
> My point was that NetworkDirect is a published API whose 
> definition is owned entirely by Microsoft.  All 
> implementations MUST adhere to the published ND API 
> specification, or they are not compliant.  It's not about 
> legality, who maintains a specific ND implementation, or 
> whether a specific change is considered an improvement over 
> what's there.
> 
> We are free to change IBAL, WinVerbs, or other APIs because 
> we own them.  The only cost of doing so is breaking existing 
> applications.  But for ND, our choice is to be compliant or not.
> 
> The way I've gone about requesting changes to ND is to send 
> comments to MS using the links at the bottom of the ND 
> documentation.  I expect that these messages get routed 
> directly to Fab, who rolls his eyes before hitting the delete key.
> :)
> 
> 
> >This change adds a new facility.
> >Tzachi and me have brought the reasons.
> >So it's good from engineer point view.
> >I don't know whether we may do it from legal one.
> >But we used to change Ibal API and ND sits in the same Open 
> Source Tree 
> >with IBAL.
> >Who is he maintainer of ND provider code ?
> >I thought it's Fab.
> >Fab, do you approve this API change ?
> >
> >
> >> -----Original Message-----
> >> From: Sean Hefty [mailto:sean.hefty at intel.com]
> >> Sent: Monday, August 10, 2009 9:05 PM
> >> To: Leonid Keller; Fab Tillier; Tzachi Dar
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: RE: [ofw][patch][ND provider] Improving latency of ms-mpi
> >>
> >> >This latter parameter should be IN OUT, because the 
> driver takes its 
> >> >value as a hint.
> >> >It really re-calculates it, trying to maximize in the 
> limits of WQE 
> >> >size.
> >>
> >> Isn't this a MS defined API?  I don't believe we can change the 
> >> NDEndpoint APIs at all.
> >>
> >> - Sean
> >>
> >>
> 
> 



More information about the ofw mailing list