[openib-general] IBM eHCA testing..
IBMEHCA DD
IBMEHCAD at de.ibm.com
Mon Oct 10 00:23:59 PDT 2005
This is caused by a complex interaction of ib_mad, hcad_mod and pSeries
firmware.
As you might already have noticed, an eHCA doesn't show up as a "port" but
as a switch in the fabric.
The reason for this is partition support and virtualisation in InfiniBand.
If you want to give each partition in a system its "own" IB adapter, that
adapter has to have its "own" LID(s) and therefore its own GUIDs.
The IB standard currently allows only one way to accomplish this: you need
a switch with multiple adapters behind it.
So that's exactly how the eHCA shows up in the fabric. In our case system
firmware handles the SMA traffic for that "switch" and for all "adapters"
(running an SMA or SM on QP0 is currently not supported).
This brings up another problem: you definitely won't want to allocate LIDs
for all "potentially possible" operating system partitions (not to be
confused with IB partitioning), otherwise you could come close to the
roughly 48000 LIDs/subnet limit pretty quickly. So you need some kind of
signal from the operating system to system firmware, which in the eHCA
case is the H_DEFINE_AQP1 call triggered by ib_create_qp with the
IB_QPT_GSI parameter.
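
To put rough numbers on the LID argument, here is a minimal sketch in C.
It only does the arithmetic; it is not how the SM or system firmware
assigns LIDs. The helper names and the example counts (200 systems, 254
potential partitions, 4 active partitions) are made up for illustration.
The one fixed fact is the IB unicast LID range, 0x0001 to 0xBFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IB unicast LID range: 0x0001..0xBFFF (0x0000 is reserved). */
#define UNICAST_LID_MIN 0x0001u
#define UNICAST_LID_MAX 0xBFFFu

/* Number of unicast LIDs one subnet can hand out. */
static uint32_t unicast_lid_capacity(void)
{
    return UNICAST_LID_MAX - UNICAST_LID_MIN + 1u;
}

/*
 * Hypothetical helper: would the subnet have enough unicast LIDs if every
 * one of `systems` machines consumed `lids_per_system` LIDs?
 * 64-bit math avoids overflow for large inputs.
 */
static bool fits_in_subnet(uint32_t systems, uint32_t lids_per_system)
{
    assert(lids_per_system > 0);
    uint64_t needed = (uint64_t)systems * lids_per_system;
    return needed <= unicast_lid_capacity();
}
```

With these made-up counts, fits_in_subnet(200, 254) is false (LIDs reserved
for every potential partition), while fits_in_subnet(200, 4) is true (LIDs
only for partitions that actually asked for a GSI QP).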
AFTER that call, handshaking between system firmware and the SM will start:
here's a new adapter active on a switch port... what's your GUID? here's
your LID, P_Key, SM LID....
...and only after all that is it possible to send and receive packets from
the fabric.
The openib stack expects that a port is fully functional after this
create_qp returns, and starts to do all sorts of modify QP and post send.
So the only choice we have there is to delay create_qp until the complete
handshaking between system firmware and the SM has finished (until we see
an IB_PORT_ACTIVE in hcad_mod). If we don't see that before
EHCA_PORT_ACTIVE_TIMEOUT expires, we have to return an error code to
openib, otherwise we're seriously in trouble (tried that).
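
The wait-then-fail logic described above could be sketched roughly like
this. This is a userspace illustration, not the hcad_mod code: the enum,
the callback type, wait_for_port_active(), create_gsi_qp_sketch() and the
fake_port helper are all hypothetical names, and a poll count stands in
for EHCA_PORT_ACTIVE_TIMEOUT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical port states; not the real driver or verbs definitions. */
enum port_state { PORT_DOWN, PORT_INIT, PORT_ACTIVE };

/* Callback reporting the current port state (a driver would ask firmware). */
typedef enum port_state (*port_query_fn)(void *ctx);

/*
 * Poll until the port reports PORT_ACTIVE or max_polls attempts are used up.
 * Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int wait_for_port_active(port_query_fn query, void *ctx,
                                unsigned int max_polls)
{
    assert(query != NULL);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (query(ctx) == PORT_ACTIVE)
            return 0;
        /* A real driver would sleep between polls instead of spinning. */
    }
    return -ETIMEDOUT;
}

/*
 * Sketch of GSI QP creation: signal firmware (omitted here), then refuse to
 * return success until the firmware/SM handshake has made the port active.
 */
static int create_gsi_qp_sketch(port_query_fn query, void *ctx,
                                unsigned int max_polls)
{
    /* The H_DEFINE_AQP1 call would be issued at this point. */
    int rc = wait_for_port_active(query, ctx, max_polls);
    if (rc)
        return rc;   /* caller sees an error instead of a half-ready port */
    return 0;
}

/* Fake port for experimenting: becomes active after a number of polls. */
struct fake_port { unsigned int polls_until_active; };

static enum port_state fake_query(void *ctx)
{
    struct fake_port *p = ctx;
    if (p->polls_until_active == 0)
        return PORT_ACTIVE;
    p->polls_until_active--;
    return PORT_INIT;
}
```

With a fake port that needs 3 polls and a budget of 10, the sketch returns
0; with a port that needs 50 polls it returns -ETIMEDOUT, which is the
"return an error code to openib" case.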
Shirley already pointed out on the mailing list that ib_mad and others
have different recovery depending on the success of
ib_create_qp(IB_QPT_GSI); in particular, ib_mad decides the best thing is
to kill the complete adapter if that call fails on a single port.
So that's the full explanation of ehca_nr_ports, and hopefully it answers
your question.
Troy Benjegerdes <hozer at hozed.org>
08.10.2005 04:03
To: Shirley Ma <xma at us.ibm.com>
cc: Pradeep Satyanarayana <pradeep at us.ibm.com>, Troy Benjegerdes
<hozer at hozed.org>, IBMEHCA DD/Germany/IBM at IBMDE,
openib-general at openib.org, openib-general-bounces at openib.org
Subject: Re: [openib-general] IBM eHCA testing..
On Fri, Oct 07, 2005 at 09:33:27AM -0700, Shirley Ma wrote:
> Hi, Troy,
>
> There is INSTALL file in the EHCA driver package.
> In OpenPower 720 port 1 is at the top, port 2 is at the bottom.
> In P570, port1 is at the bottom, port2 is at the top.
Okay, I guess I should read more carefully ;)
What is the issue with needing to use port 1? Can that be fixed in the
driver, or does that need a firmware update?