[openib-general] IBM eHCA testing..
Heiko J Schick
SCHICKHJ at de.ibm.com
Tue Oct 11 05:43:42 PDT 2005
Hello Troy,
This morning I looked in detail into the problem you reported on Oct
10 on the OpenIB mailing list [1]. It seems that the kernel panic is an
IPoIB issue.
[1]: http://openib.org/pipermail/openib-general/2005-October/012353.html
The following things happen:
1. modprobe hcad_mod ehca_nr_ports=1
The eHCA InfiniBand Device Driver is loaded.
2. modprobe ib_mad
The ib_mad stack creates an AQP1. This starts the port activation process.
By my count it takes more than 110 to 120 seconds to activate a port.
Our device driver gets a timeout, which means that the port is NOT active,
and ib_modify_qp will not work (for any QP, regardless of whether it was
created in the ib_mad stack or in the ib_ipoib stack).
3. modprobe ib_ipoib
All resources for IPoIB are allocated (CQ, QPs, MR, etc.)
4. A user runs ifconfig ib0 xxx.xxx.xxx.xxx which executes the
following functions:
ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create.
The user should see the following error message:
l2:/home/schickhj/ibt/linstack/ehca2/ehca2 # ifconfig ib0 192.168.8.8
SIOCSIFFLAGS: Invalid argument
5. The function ipoib_qp_create modifies the QP from Reset to Init to
RTR to RTS.
If one of these three ib_modify_qp calls fails, the IPoIB QP (priv->qp)
is destroyed (by the ipoib_qp_create error path, out_fail) and priv->qp
will be NULL.
--> see /src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c
function ipoib_qp_create
6. A user runs ifconfig ib0 xxx.xxx.xxx.xxx again, which executes the
same functions again:
ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create
7. ipoib_qp_create tries to modify the IPoIB QP (priv->qp), which is
NULL because the QP was destroyed earlier by the error handling routine
in ipoib_qp_create (see 5.). Using the NULL pointer leads to the kernel
panic.
I think this error could also show up on Mellanox-based IB cards when
ib_modify_qp fails in ipoib_qp_create.
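To make steps 5 to 7 easier to follow, here is a small user-space sketch of the sequence. It is not the IPoIB code: mock_qp, mock_priv, mock_modify_qp and mock_qp_create are hypothetical stand-ins, the port_active flag only simulates the inactive port, and the NULL check with -ENODEV is just one possible guard, chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures (not the real types). */
struct mock_qp {
	int state;		/* 0 = Reset, 1 = Init, 2 = RTR, 3 = RTS */
};

struct mock_priv {
	struct mock_qp *qp;	/* plays the role of priv->qp */
};

/* Simulated ib_modify_qp: fails with -EINVAL while the port is inactive. */
static int mock_modify_qp(struct mock_qp *qp, int port_active)
{
	if (!port_active)
		return -EINVAL;
	qp->state++;
	return 0;
}

/*
 * Simulated ipoib_qp_create: three transitions, and on any failure the
 * QP is destroyed and the pointer is set to NULL (as in step 5).
 */
static int mock_qp_create(struct mock_priv *priv, int port_active)
{
	int i, ret;

	assert(priv != NULL);

	/*
	 * Assumed guard: without this check, the second call (steps 6 and 7)
	 * would pass a NULL QP to mock_modify_qp and dereference it.
	 */
	if (priv->qp == NULL)
		return -ENODEV;

	for (i = 0; i < 3; i++) {	/* Reset->Init, Init->RTR, RTR->RTS */
		ret = mock_modify_qp(priv->qp, port_active);
		if (ret)
			goto out_fail;
	}
	return 0;

out_fail:
	free(priv->qp);
	priv->qp = NULL;
	return ret;
}
```

Calling mock_qp_create twice with port_active set to 0 returns -EINVAL the first time and -ENODEV the second time, instead of touching the destroyed QP. Whether the real driver should return an error at that point or re-create the QP is a design decision that this sketch does not answer.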
In dmesg you should see:
(see 1.)
eHCA Infiniband Device Driver (Rel.: )
xics_enable_irq: irq=9029: ibm_int_on returned fffffffd
eHCA Infiniband Device Driver (Rel.: )
(see 2.)
PU0000 000b0078:ehca_define_sqp HCAD_ERROR Port 1 is not active.
PU0000 000b0387:ehca_create_qp HCAD_ERROR ehca_define_sqp() failed
rc=ffffffffffffffff
PU0000 000b03ae:ehca_create_qp <<< failed ret=ffffffea
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
PU0001 00060103:ehca_parse_ec EHCA port 1 is available.
PU0000 000b00bd:plpar_hcall_7arg_7ret HCAD_ERROR HCALL77_IN r3=168
r4=1001000503000004 r5=200100000000002c r6=8a40000000000000 3ed48000 r8=0
r9=0 r10=0
PU0000 000b00c4:plpar_hcall_7arg_7ret HCAD_ERROR HCALL77_OUT
r3=ffffffffffffffd3 r4=0 r5=0 r6=0 r7=4 r8=0 r9=800000000005aa18 r10=0
(see 4.)
PU0000 000b0564:internal_modify_qp HCAD_ERROR hipz_h_modify_qp() failed
rc=ffffffffffffffd3 ehca_qp=c000000003ba4e00 qp_num=2c
ib0: failed to modify QP to init, ret = -22
ib0: ipoib_qp_create returned -22
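The hexadecimal return codes in the log are signed values printed as unsigned. As a reading aid, the following sketch converts them back; decode_ret32, decode_ret64 and check_log_values are hypothetical helper names, not driver functions. It shows that ret=ffffffea is the same -22 (EINVAL, "Invalid argument") that IPoIB and ifconfig report.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a 32-bit value from the log as a two's complement signed int. */
static int32_t decode_ret32(uint32_t raw)
{
	if (raw <= (uint32_t)INT32_MAX)
		return (int32_t)raw;
	return -(int32_t)(UINT32_MAX - raw) - 1;
}

/* Same conversion for the 64-bit values (rc=..., r3=...). */
static int64_t decode_ret64(uint64_t raw)
{
	if (raw <= (uint64_t)INT64_MAX)
		return (int64_t)raw;
	return -(int64_t)(UINT64_MAX - raw) - 1;
}

/* Values taken from the dmesg output above. */
static void check_log_values(void)
{
	assert(decode_ret32(0xffffffeaU) == -22);		/* ehca_create_qp ret */
	assert(decode_ret32(0xfffffffdU) == -3);		/* ibm_int_on */
	assert(decode_ret64(0xffffffffffffffffULL) == -1);	/* ehca_define_sqp rc */
	assert(decode_ret64(0xffffffffffffffd3ULL) == -45);	/* hipz_h_modify_qp rc */
}
```

The sketch only decodes the numbers; what each hypervisor return value means is not stated in the log.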
Kind regards
Heiko Joerg Schick
IBM Deutschland Entwicklung GmbH
I/Ox Microcode Development
Linux Infiniband Device Drivers
Schoenaicher Str. 220
71032 Boeblingen
E-Mail: schickhj at de.ibm.com
External: 49-7031-16-0 x4219, t/l: 120-4219