[ofiwg] DS/DA Runtime Model Discussion

Smith, Stan stan.smith at intel.com
Fri Feb 12 08:52:38 PST 2016


Hi Doug,
  I may have misled you into believing that clients of libfabric and/or kFabric are responsible for transport locking issues; they are 'not'.

Libfabric/kFabric providers 'are' responsible for serializing access to the hardware.

s.

-----Original Message-----
From: ofiwg-bounces at lists.openfabrics.org [mailto:ofiwg-bounces at lists.openfabrics.org] On Behalf Of Oucharek, Doug S
Sent: Wednesday, February 10, 2016 3:37 PM
To: Paul Grun <grun at cray.com>
Cc: ofiwg at lists.openfabrics.org
Subject: [ofiwg] DS/DA Runtime Model Discussion

This email is a follow-up to my comment in a previous DS/DA call about the runtime model being an important part of the DS/DA definition.

MPI seems to be the dominant user of fabrics in HPC.  As such, MPI developers have a huge impact on the design of the runtime model being followed by fabric developers and corresponding middleware (what I consider OFED/verbs, libfabric, and DS/DA).  Currently, they seem to be pushing for bare-metal access from the providers, leaving the work of serialization/locking to the middleware or the applications themselves.

If DS/DA follows libfabric in its development, I am concerned that the bare-metal mindset will dominate here as well, and that will leave “application anarchy” with regard to how serialization/locking is done.  Mitigating the strategy of fabric users is something I would expect from the providers (the one common access point regardless of middleware).  The MPI push was to get this common point to back off and leave serialization/locking to the upper layers, but as a result we no longer have a common point to coordinate competing access to the fabric.

Should it not be part of the middleware's job (libfabric and DS/DA) to, at the very least, put demands upon the providers so that a common strategy for serialization/locking can be enforced for a specific fabric, and apps like Lustre don’t have to make significant code changes to get reasonable performance out of the fabric?  If we have to make significant changes for each new fabric released, the value of the middleware (be it OFED, libfabric, or DS/DA) is severely diminished and we might as well just access the fabric drivers directly.

Discussion?  

Doug
_______________________________________________
ofiwg mailing list
ofiwg at lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/ofiwg

