[openib-general] IBM eHCA testing..

IBMEHCA DD IBMEHCAD at de.ibm.com
Mon Oct 10 00:23:59 PDT 2005


This is caused by a complex interaction of ib_mad, hcad_mod and pSeries 
firmware.

As you might already have noticed, an eHCA doesn't show up as a "port" but 
as a switch in the fabric.
The reason for this is partition support and virtualisation in InfiniBand.

If you want to give each partition in a system its "own" IB adapter, it has 
to have its "own" LID(s) and therefore its own GUIDs.
The IB standard currently allows only one way to accomplish this: you need 
a switch with multiple adapters behind it.
So that's exactly how the eHCA shows up in the fabric. In our case system 
firmware handles the SMA traffic for that "switch" and for all "adapters" 
(running an SMA or SM on QP0 is currently not supported).

This brings up another problem: you definitely won't want to allocate LIDs 
for all "potentially possible" operating system partitions (not to be 
confused with IB partitioning), otherwise you could come close to the 48000 
LIDs/subnet limit pretty quickly. So you need some kind of signal from the 
operating system to system firmware, which in the eHCA case is 
H_DEFINE_AQP1, triggered by ib_create_qp with the IB_QPT_GSI parameter.
AFTER that call, handshaking between system firmware and the SM will start: 
here's a new adapter active on a switch port... what's your GUID? here's 
your LID, p_key, SM LID....
...and only after all that is it possible to send packets to and receive 
packets from the fabric.
The openib stack expects a port to be fully functional once this 
create_qp returns, and starts to do all sorts of modify QP and post send 
operations.
So the only choice we have is to delay create_qp until the complete 
handshaking between system firmware and the SM has finished (until we see 
an IB_PORT_ACTIVE in hcad_mod). If we don't see that within 
EHCA_PORT_ACTIVE_TIMEOUT, we have to return an error code to openib; 
otherwise we're in serious trouble (tried that).
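
To make the "delay create_qp until the port is active, or fail after a 
timeout" behaviour concrete, here is a minimal user-space sketch in C. It 
is not the eHCA driver code: every sim_* name is hypothetical, the 
firmware/SM handshake is replaced by a simple poll counter, and the timeout 
is counted in polls rather than in whatever unit EHCA_PORT_ACTIVE_TIMEOUT 
really uses.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical port states; the real driver waits for IB_PORT_ACTIVE. */
enum sim_port_state { SIM_PORT_DOWN, SIM_PORT_ACTIVE };

/*
 * Hypothetical stand-in for the firmware/SM handshake: the port turns
 * active once it has been polled polls_until_active times.
 */
struct sim_port {
    int polls_until_active;
    int polls_done;
};

/* Poll the simulated port once and report its current state. */
static enum sim_port_state sim_query_port(struct sim_port *p)
{
    p->polls_done++;
    return p->polls_done >= p->polls_until_active ? SIM_PORT_ACTIVE
                                                  : SIM_PORT_DOWN;
}

/*
 * Simulated "create GSI QP": do not return success before the port is
 * active. Returns 0 on success, or -ETIMEDOUT if the port is still down
 * after max_polls polls (assumed stand-in for EHCA_PORT_ACTIVE_TIMEOUT).
 */
static int sim_create_gsi_qp(struct sim_port *p, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (sim_query_port(p) == SIM_PORT_ACTIVE)
            return 0;
        /* A real driver would sleep here between checks. */
    }
    return -ETIMEDOUT;
}
```

With a port that becomes active on the third poll, sim_create_gsi_qp(&port, 
10) returns 0; with a port that never comes up within the limit, it returns 
-ETIMEDOUT, which corresponds to the error code handed back to openib.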

Shirley already pointed out on the mailing list that ib_mad and others 
recover differently depending on the success of 
ib_create_qp(IB_QPT_GSI); in particular, ib_mad decides the best thing is 
to kill the complete adapter if that call fails on a single port.
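
The all-or-nothing behaviour can be illustrated with another small 
user-space sketch in C. All sim_* names are hypothetical and this is not 
ib_mad's code. It also assumes that ehca_nr_ports limits how many ports the 
driver brings up, which this message implies but does not state outright.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_PORTS 2

/* Hypothetical adapter model: a port only goes active if it is cabled. */
struct sim_adapter {
    bool port_cabled[SIM_MAX_PORTS];
    bool registered;
};

/*
 * Try to bring up the first nr_ports ports. Mirrors the behaviour
 * described above: one failing port takes the whole adapter down.
 */
static bool sim_register_adapter(struct sim_adapter *a, int nr_ports)
{
    assert(nr_ports >= 1 && nr_ports <= SIM_MAX_PORTS);
    for (int port = 0; port < nr_ports; port++) {
        if (!a->port_cabled[port]) {
            /* GSI QP creation on this port would time out. */
            a->registered = false;
            return false;
        }
    }
    a->registered = true;
    return true;
}
```

With only port 1 cabled, sim_register_adapter(&a, 2) fails for the whole 
adapter, while sim_register_adapter(&a, 1) succeeds, which is the situation 
a port-count limit is meant to handle.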

So that's the full explanation of ehca_nr_ports, and it hopefully answers 
your question.





Troy Benjegerdes <hozer at hozed.org> 
08.10.2005 04:03

To
Shirley Ma <xma at us.ibm.com>
cc
Pradeep Satyanarayana <pradeep at us.ibm.com>, Troy Benjegerdes 
<hozer at hozed.org>, IBMEHCA DD/Germany/IBM at IBMDE, 
openib-general at openib.org, openib-general-bounces at openib.org
Subject
Re: [openib-general] IBM eHCA testing..






On Fri, Oct 07, 2005 at 09:33:27AM -0700, Shirley Ma wrote:
> Hi, Troy,
>
> There is INSTALL file in the EHCA driver package.
> In OpenPower 720 port 1 is at the top, port 2 is at the bottom.
> In P570, port1 is at the bottom, port2 is at the top.

Okay, I guess I should read more carefully ;)

What is the issue with needing to use port 1? Can that be fixed in the
driver, or does that need a firmware update?

