[Ofmfwg] Sunfish Redfish 2023 demonstration

Herrell, Russ W (Senior System Architect) russ.herrell at hpe.com
Tue Sep 26 08:59:21 PDT 2023


Russ Herrell
Distinguished Technologist | Hewlett Packard Labs | System Architecture Lab
Hewlett Packard Enterprise | russ.herrell at hpe.com | Mobile: +1 970 420 1707
"The betterment of our society is not a job to be left to a few; it is a responsibility to be shared by all."  -- Dave Packard

From: CHRISTIAN PINTO <Christian.Pinto at ibm.com>
Sent: Tuesday, September 26, 2023 2:02 AM
To: Herrell, Russ W (Senior System Architect) <russ.herrell at hpe.com>; ofmfwg at lists.openfabrics.org
Cc: Aguilar, Michael J. <mjaguil at sandia.gov>; Cayton, Phil <phil.cayton at intel.com>; Doug Ledford <dledford at redhat.com>; Ahlvers, Richelle <richelle.ahlvers at intel.com>
Subject: Re: Sunfish Redfish 2023 demonstration

Thanks for the comments, Russ.

Regarding the agent mockups, we do have the ones Michele built for the SDC demonstrator. They load mockup CXL and NVMe-oF fabrics and can accept POST requests. So when Sunfish identifies the agent responsible for a request, it will redirect the POST to that agent and get back an OK, unless we do something silly, such as re-creating an object with an existing ID.
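To make the expected agent behaviour concrete, here is a minimal in-memory sketch of a mockup agent accepting a POST and refusing to re-create an existing ID. It is not Michele's SDC mockup code: the `MockAgent` class, its `post` method, and the choice of 409 Conflict for a duplicate ID are assumptions for illustration.

```python
from http import HTTPStatus


class MockAgent:
    """Hypothetical in-memory stand-in for a mockup fabric agent.

    Resources are stored by @odata.id. Illustration only,
    not the SDC demonstrator's code.
    """

    def __init__(self, name):
        self.name = name
        self.resources = {}

    def post(self, collection_uri, payload):
        """Create a member of collection_uri; return (status, body)."""
        member_id = payload.get("Id")
        if not member_id:
            return HTTPStatus.BAD_REQUEST, {"error": "missing Id"}
        uri = f"{collection_uri.rstrip('/')}/{member_id}"
        if uri in self.resources:
            # Re-creating an object with an existing Id is refused, not overwritten.
            return HTTPStatus.CONFLICT, {"error": f"{uri} already exists"}
        body = {**payload, "@odata.id": uri}
        self.resources[uri] = body
        return HTTPStatus.CREATED, body


agent = MockAgent("cxl-agent")
for _ in range(2):
    status, _body = agent.post("/redfish/v1/Fabrics/CXL/Connections", {"Id": "conn1"})
    print(int(status), status.phrase)
# 201 Created
# 409 Conflict
```

The second POST with the same `Id` is exactly the "re-creating an object with an existing ID" case; everything else gets an OK-class response.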
[>RWH:] That's very good news. We need to modify the Sunfish autogen module so that every /api_emulator/redfish/*_api.py file can check three things when processing a client request for the object it handles:
a) Is there an agent AggregationSource associated with this specific object or, if the object doesn't yet exist, with the objects of its Redfish Collection? (This latter test can get fairly involved; we need to architect Sunfish's role in sorting it out.)
b) Does the client request require consulting the associated agent as part of processing it?
c) Does this request affect other objects, such that Sunfish must translate it into additional requests for those objects? (How responsible is Sunfish for tracking and modifying 'linked' objects?)
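The three checks (a) to (c) could be factored into one routing step that each generated *_api.py handler calls. The sketch below is hypothetical: `route_request`, the longest-prefix lookup of AggregationSources for (a), the "forward only state-changing methods" rule for (b), and the "re-read every linked object" rule for (c) are illustrative policies, not existing Sunfish code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Methods that change state and so must reach the owning agent (assumed policy).
MUTATING = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass
class RoutingDecision:
    source: Optional[str]  # (a) AggregationSource owning the target, if any
    consult_agent: bool    # (b) must the request be forwarded to that agent?
    followups: List[Tuple[str, str]] = field(default_factory=list)  # (c)


def find_source(uri: str, sources: Dict[str, str]) -> Optional[str]:
    """(a) Longest-prefix match, so a not-yet-existing object inherits the
    AggregationSource registered for its collection or fabric subtree."""
    best_prefix, best_source = "", None
    for prefix, source in sources.items():
        base = prefix.rstrip("/")
        if (uri == base or uri.startswith(base + "/")) and len(base) > len(best_prefix):
            best_prefix, best_source = base, source
    return best_source


def linked_uris(node) -> List[str]:
    """Collect every @odata.id nested anywhere under node."""
    if isinstance(node, dict):
        found = [node["@odata.id"]] if "@odata.id" in node else []
        for key, value in node.items():
            if key != "@odata.id":
                found.extend(linked_uris(value))
        return found
    if isinstance(node, list):
        return [uri for item in node for uri in linked_uris(item)]
    return []


def route_request(method: str, uri: str, payload: Optional[dict],
                  sources: Dict[str, str]) -> RoutingDecision:
    source = find_source(uri, sources)
    # (b) Assumed policy: reads come from Sunfish's cached inventory,
    # anything that changes state goes to the owning agent.
    consult = source is not None and method.upper() in MUTATING
    # (c) Assumed policy: after a change, re-read every object the payload
    # links to so Sunfish's copies of those objects stay current.
    followups = []
    if consult and payload:
        followups = [("GET", u) for u in linked_uris(payload.get("Links", {}))]
    return RoutingDecision(source, consult, followups)


sources = {
    "/redfish/v1/Fabrics/CXL": "/redfish/v1/AggregationService/AggregationSources/cxl",
    "/redfish/v1/Fabrics/NVMe": "/redfish/v1/AggregationService/AggregationSources/nvme",
}
body = {"Links": {"InitiatorEndpoints": [{"@odata.id": "/redfish/v1/Fabrics/CXL/Endpoints/host1"}]}}
print(route_request("POST", "/redfish/v1/Fabrics/CXL/Connections", body, sources))
```

Check (c) here is deliberately the simplest possible answer to "how responsible is Sunfish for linked objects"; a real design might instead issue PATCHes to the linked objects or rely on agent events.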

On a single system being part of both (multiple) fabrics: you touch on a good point that we have already discussed on our side. I see two possible directions, but each still leaves an issue unsolved.

  1.  We have an agent owning the system, say the CXL one. That system also contains a NIC whose endpoint is connected to the switch serving the NVMe fabric. The open question is how the CXL agent would express a system with a component connected to something outside its view (i.e., the port in the NVMe fabric to which the NIC endpoint is connected).
[>RWH:] This is the case I am suggesting for the mockups in the demo: we would only know of host systems because the CXL agent and the NVMe fabric agent would list the FabricAdapters as subordinates of a 'placeholder system'. I am proposing that both Agents submit their inventory using the same Redfish IDs for the (host) system to which both the CXL switch and the NVMe fabric switch are attached. I.e., both the CXL agent and the NVMe fabric agent will declare that their FabricAdapters hang off the SAME (/Systems/systemID_X) system instance. This merges those host systems, which then bridge the two fabrics. We would just hardwire these common Redfish IDs in the mockups rather than architect and write the Sunfish code to do this by SC '23.

  2.  Have a third hardware agent that only takes care of the systems. But then we have the above problem with both fabrics: the system would be connected to two fabrics that do not exist prior to registration with Sunfish.
[>RWH:] I agree; for the demo we don't need a third agent aggregating the (host) systems, as we don't need those details to do our demo.

I believe that for this demo we can just assume (1) and "manually" resolve the missing link at registration time.
[>RWH:] I agree, but we don't need to resolve any missing links if we just use the same Redfish system names in the mockups of both Agents. This would be the case if both Agents were working from the same a priori 'grand plan'. (Eventually, we need Sunfish to validate that objects reported to it by multiple agents really are the same objects. But we don't want to tackle this by SC '23.)
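Hardwiring a shared system ID means a plain merge keyed by @odata.id is enough to hang both agents' FabricAdapters off the same system. A minimal sketch, assuming each agent's inventory arrives as a dict from @odata.id to resource body; `merge_inventories`, `fabric_adapters` and the adapter IDs `CXL0`/`NIC0` are hypothetical. The `conflicts` list only hints at the later validation step: it flags URIs reported with different bodies, it does not prove two objects are the same.

```python
from typing import Dict, List, Tuple

Inventory = Dict[str, dict]  # @odata.id -> resource body


def merge_inventories(agents: Dict[str, Inventory]) -> Tuple[Inventory, Dict[str, List[str]], List[str]]:
    """Merge per-agent inventories on @odata.id.

    Returns (merged, reporters, conflicts): reporters lists the agents that
    reported each URI; conflicts lists URIs reported with different bodies.
    The first agent's body is kept on conflict.
    """
    merged: Inventory = {}
    reporters: Dict[str, List[str]] = {}
    conflicts: List[str] = []
    for agent, inventory in agents.items():
        for uri, body in inventory.items():
            if uri in merged and merged[uri] != body and uri not in conflicts:
                conflicts.append(uri)
            merged.setdefault(uri, body)
            reporters.setdefault(uri, []).append(agent)
    return merged, reporters, conflicts


def fabric_adapters(system_uri: str, merged: Inventory) -> List[str]:
    """URIs of the FabricAdapters directly under system_uri."""
    prefix = system_uri.rstrip("/") + "/FabricAdapters/"
    return sorted(u for u in merged
                  if u.startswith(prefix) and "/" not in u[len(prefix):])


SYS = "/redfish/v1/Systems/systemID_X"
cxl_agent = {SYS: {"Id": "systemID_X"}, SYS + "/FabricAdapters/CXL0": {"Id": "CXL0"}}
nvme_agent = {SYS: {"Id": "systemID_X"}, SYS + "/FabricAdapters/NIC0": {"Id": "NIC0"}}
merged, reporters, conflicts = merge_inventories({"cxl": cxl_agent, "nvme": nvme_agent})
print(fabric_adapters(SYS, merged))
```

Because both mockups use the same `systemID_X`, the merged view shows one system with a CXL adapter and an NVMe-oF NIC, which is the bridge between the two fabrics.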

Russ


We also have some example code and objects, which I prepared for SC22, that create a MemoryChunk out of a CXL device and attach it to a system.
Let me also update the document to cover the creation and binding of the MemoryChunk.

I have created a private repository where we can collect all the demo material (including the updated document): https://github.com/OFMFWG/SC23-Material
All OFMFWG users have access to it. From now on, I will post everything there instead of attaching it to emails.


Christian

Christian Pinto, Ph.D.
Research Scientist
IBM Research Europe - Ireland


From: Herrell, Russ W (Senior System Architect) <russ.herrell at hpe.com>
Date: Monday, 25 September 2023 at 19:07
To: CHRISTIAN PINTO <Christian.Pinto at ibm.com>, ofmfwg at lists.openfabrics.org
Cc: Aguilar, Michael J. <mjaguil at sandia.gov>, Cayton, Phil <phil.cayton at intel.com>, Doug Ledford <dledford at redhat.com>, Ahlvers, Richelle <richelle.ahlvers at intel.com>
Subject: [EXTERNAL] RE: Sunfish Redfish 2023 demonstration

I agree with what I think is the purpose of the demo as described in the outline, which is to show:

  1.  Sunfish aggregation of multiple Agent inventories (A CXL fabric Agent, and a Swordfish NVMe JBOD? Agent)
  2.  The ability to query Sunfish to locate composable resources (systems, storage and CXL FAM)
  3.  The ability to allocate MemoryChunks and storage volumes from the composable resources using the Sunfish API (Redfish/Swordfish calls)
  4.  The ability to bind hosts (systems) to MemoryChunks and storage volumes via Redfish Connections

To do the above, we need to split step 3 into two steps, as I don't propose starting the demo with predefined MemoryChunks and Volumes:
               3.1)  Retrieve the MemoryDomains from the CXL fabric tree, create one or two MemoryChunks out of these 'memory pools', and do the same with a storage pool
               3.2)  Retrieve the list of Systems from the CXL fabric tree and create a Connection between one of them and a new MemoryChunk or storage volume
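Steps 3.1 and 3.2 boil down to two POST bodies: one creating a MemoryChunk in a MemoryDomain's MemoryChunks collection, and one creating a Connection in the fabric's Connections collection. The builders below are a sketch. The property names (`MemoryChunkSizeMiB`, `AddressRangeType`, `ConnectionType`, `MemoryChunkInfo`, `VolumeInfo`, `Links.InitiatorEndpoints`) are taken from the DMTF MemoryChunks and Connection schemas as understood here and should be checked against the schema versions the mockups use. All IDs and URIs are hypothetical, as is whether the client or the agent assigns the `Id`.

```python
import json
from typing import List, Optional

ACCESS = ("Read", "Write")  # assumed: read/write access for the demo host


def memory_chunk_payload(chunk_id: str, size_mib: int) -> dict:
    """Step 3.1: body for a POST to a MemoryDomain's MemoryChunks collection."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return {
        "Id": chunk_id,
        "MemoryChunkSizeMiB": size_mib,
        "AddressRangeType": "Volatile",  # assumption: FAM used as volatile memory
    }


def connection_payload(conn_id: str, initiator_endpoint: str,
                       memory_chunks: Optional[List[str]] = None,
                       volumes: Optional[List[str]] = None) -> dict:
    """Step 3.2: body for a POST to a fabric's Connections collection.

    Exactly one of memory_chunks or volumes must be given (by @odata.id).
    """
    if bool(memory_chunks) == bool(volumes):
        raise ValueError("pass either memory_chunks or volumes")
    body = {
        "Id": conn_id,
        "Links": {"InitiatorEndpoints": [{"@odata.id": initiator_endpoint}]},
    }
    if memory_chunks:
        body["ConnectionType"] = "Memory"
        body["MemoryChunkInfo"] = [
            {"MemoryChunk": {"@odata.id": uri}, "AccessCapabilities": list(ACCESS)}
            for uri in memory_chunks
        ]
    else:
        body["ConnectionType"] = "Storage"
        body["VolumeInfo"] = [
            {"Volume": {"@odata.id": uri}, "AccessCapabilities": list(ACCESS)}
            for uri in volumes
        ]
    return body


# Hypothetical URIs; the real ones come from the demo mockups.
chunk = memory_chunk_payload("chunk1", 8192)
conn = connection_payload(
    "conn1",
    "/redfish/v1/Fabrics/CXL/Endpoints/host1",
    memory_chunks=["/redfish/v1/Chassis/FAM1/MemoryDomains/MD1/MemoryChunks/chunk1"],
)
print(json.dumps(conn, indent=2))
```

Keeping storage volumes and MemoryChunks in the same Connection builder mirrors step 3.2, where one host is bound to either kind of resource through the same Redfish Connections mechanism.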

If we wish to demonstrate binding a single host to both a new storage volume and a MemoryChunk, we are missing one more 'ability' in the Sunfish reference code: Sunfish must be able to recognize that systems of the NVMe-oF fabric and systems of the CXL fabric are the SAME systems. I propose we hide the need to resolve multiple names for the same host by simply making the names identical in both Agents' mockups. (If anyone asks, we acknowledge that reconciling multiple IDs is required functionality that is not yet ready for demonstration.)
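The reconciliation itself is deferred past the demo, but one possible approach is to match ComputerSystem resources across agents on their `UUID` property. This is a hypothetical rule, not existing Sunfish behaviour; `group_by_uuid`, `alias_map` and the UUID value are illustrative, and the sketch assumes every agent reports the host's real UUID. Systems without one stay unmatched.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_uuid(reports: Dict[str, List[dict]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group ComputerSystem bodies reported by several agents by UUID.

    reports maps agent name -> list of ComputerSystem bodies. A system with
    no UUID cannot be matched, so it is keyed by its own @odata.id.
    """
    groups = defaultdict(list)
    for agent, systems in reports.items():
        for system in systems:
            key = (system.get("UUID") or "").lower() or system["@odata.id"]
            groups[key].append((agent, system["@odata.id"]))
    return dict(groups)


def alias_map(groups: Dict[str, List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map every reported system URI to one canonical URI (first reporter wins)."""
    aliases = {}
    for members in groups.values():
        canonical = members[0][1]
        for _agent, uri in members:
            aliases[uri] = canonical
    return aliases


HOST_UUID = "38947555-7742-3448-3784-823347823834"  # hypothetical value
reports = {
    "cxl": [{"@odata.id": "/redfish/v1/Systems/cxl-host1", "UUID": HOST_UUID}],
    "nvme": [
        {"@odata.id": "/redfish/v1/Systems/nvme-hostA", "UUID": HOST_UUID.upper()},
        {"@odata.id": "/redfish/v1/Systems/nvme-hostB"},
    ],
}
print(alias_map(group_by_uuid(reports)))
```

UUIDs are lower-cased before comparison because two agents may format the same identifier differently; anything stronger (serial numbers, physical location, operator confirmation) would be a design decision for the working group.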

So, for step 3.1 we are missing the discovery of resource pools, the creation of explicit subsets of them, and the accompanying functionality in the code stacks.
We also do not have the correct mockups for the two Agents, which is another item that needs to be added to the 'missing' list.  Everything else looks good.

I suggest we work through the demo topology this Friday, and then create specific mockups that would be the proper models for the demo resources.  Once we have the demo topology fixed, we can talk through how the GUI can most easily display this inventory and enable the GUI user to manipulate the components to demo the capabilities we want to show off.


Thoughts?

Russ

From: CHRISTIAN PINTO <Christian.Pinto at ibm.com>
Sent: Monday, September 25, 2023 8:15 AM
To: ofmfwg at lists.openfabrics.org
Cc: Aguilar, Michael J. <mjaguil at sandia.gov>; Cayton, Phil <phil.cayton at intel.com>; Herrell, Russ W (Senior System Architect) <russ.herrell at hpe.com>; Doug Ledford <dledford at redhat.com>; Ahlvers, Richelle <richelle.ahlvers at intel.com>
Subject: Sunfish Redfish 2023 demonstration

Hi All,

I have started working on a "script" for our demonstrator, mostly to identify what we have and what is missing. What I have so far is attached to this email.
It appears the two main pieces we are missing are a GUI and a rudimentary composition service. On Friday we should discuss who does what, to make sure we arrive at SC with a demo.

Any comments or additions to the document are more than welcome.

Christian

Christian Pinto, Ph.D.
Research Scientist
IBM Research Europe - Ireland
