[openib-general] IBM eHCA testing..

Heiko J Schick SCHICKHJ at de.ibm.com
Tue Oct 11 05:43:42 PDT 2005


Hello Troy,

This morning I looked in detail at the problem you reported on Oct 10 on
the OpenIB mailing list [1]. It seems that the kernel panic is an IPoIB
issue.

[1]:  http://openib.org/pipermail/openib-general/2005-October/012353.html

The following happens:

1.      modprobe hcad_mod ehca_nr_ports=1
        The eHCA InfiniBand Device Driver is loaded.

2.      modprobe ib_mad
        The ib_mad stack creates an AQP1. This starts the port activation
        process. By my count it takes more than 110 to 120 seconds to
        activate a port. Our device driver gets a timeout, which means that
        the port is NOT active and ib_modify_qp will not work (for any QP,
        no matter whether it was created in the ib_mad stack or in the
        ib_ipoib stack).

3.      modprobe ib_ipoib
        All resources for IPoIB are allocated (CQ, QPs, MR, etc.)

4.      A user runs ifconfig ib0 xxx.xxx.xxx.xxx, which executes the
        following functions:
        ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create. The user
        should see the following error message:

        l2:/home/schickhj/ibt/linstack/ehca2/ehca2 # ifconfig ib0 192.168.8.8
        SIOCSIFFLAGS: Invalid argument

5.      The function ipoib_qp_create modifies the QP from Reset -> Init ->
        RTR -> RTS. If one of these three ib_modify_qp calls fails, the
        IPoIB QP (priv->qp) is destroyed (by the ipoib_qp_create error
        routine / out_fail) and priv->qp will be NULL.

        --> see /src/linux-kernel/infiniband/ulp/ipoib/ipoib_verbs.c,
        function ipoib_qp_create

6.      A user runs ifconfig ib0 xxx.xxx.xxx.xxx again, which again
        executes the following functions:
        ipoib_open -> ipoib_ib_dev_open -> ipoib_qp_create

7.      ipoib_qp_create tries to modify the IPoIB QP (priv->qp), which is
        NULL because the QP was destroyed earlier by the error handling
        routine in ipoib_qp_create (see 5.)

I think this error could also show up on Mellanox-based IB cards when
ib_modify_qp fails in ipoib_qp_create.
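
The sequence in steps 5 to 7 can be modelled in user space. The following is only a sketch and not the IPoIB code: the types fake_qp and fake_priv, the functions fake_modify_qp and fake_qp_create, and the -ENODEV return value are all hypothetical names chosen for illustration. It shows how an error path that destroys the QP and leaves priv->qp NULL sets up the second call for a NULL pointer access, and how a simple check at the top of the function would avoid it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real kernel structures. */
enum qp_state { QP_RESET, QP_INIT, QP_RTR, QP_RTS };

struct fake_qp {
    enum qp_state state;
};

struct fake_priv {
    struct fake_qp *qp;
    int port_active;    /* 0 models a port that never became active */
};

/* Models ib_modify_qp: fails with -EINVAL while the port is inactive. */
static int fake_modify_qp(struct fake_priv *priv, enum qp_state next)
{
    if (!priv->port_active)
        return -EINVAL;
    priv->qp->state = next;
    return 0;
}

/* Models the Reset -> Init -> RTR -> RTS sequence with an out_fail path. */
int fake_qp_create(struct fake_priv *priv)
{
    static const enum qp_state steps[] = { QP_INIT, QP_RTR, QP_RTS };
    size_t i;
    int ret;

    /* Guard (assumption): without it, a NULL QP left behind by an
     * earlier failure would reach the modify calls below. */
    if (priv->qp == NULL)
        return -ENODEV;

    for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
        ret = fake_modify_qp(priv, steps[i]);
        if (ret)
            goto out_fail;
    }
    return 0;

out_fail:
    /* Error routine: the QP is destroyed and the pointer is cleared. */
    free(priv->qp);
    priv->qp = NULL;
    return ret;
}
```

With port_active set to 0, the first call fails with -EINVAL and clears priv->qp; a second call then returns -ENODEV instead of touching the NULL pointer.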

In dmesg you should see:

(see 1.)
eHCA Infiniband Device Driver (Rel.: )
xics_enable_irq: irq=9029: ibm_int_on returned fffffffd
eHCA Infiniband Device Driver (Rel.: )

(see 2.)
PU0000 000b0078:ehca_define_sqp HCAD_ERROR  Port 1 is not active.
PU0000 000b0387:ehca_create_qp HCAD_ERROR  ehca_define_sqp() failed rc=ffffffffffffffff
PU0000 000b03ae:ehca_create_qp <<< failed ret=ffffffea
ib_mad: Couldn't create ib_mad QP1
ib_mad: Couldn't open ehca0 port 1
PU0001 00060103:ehca_parse_ec  EHCA port 1 is available.
PU0000 000b00bd:plpar_hcall_7arg_7ret HCAD_ERROR  HCALL77_IN r3=168 r4=1001000503000004 r5=200100000000002c r6=8a40000000000000 3ed48000 r8=0 r9=0 r10=0
PU0000 000b00c4:plpar_hcall_7arg_7ret HCAD_ERROR  HCALL77_OUT r3=ffffffffffffffd3 r4=0 r5=0 r6=0 r7=4 r8=0 r9=800000000005aa18 r10=0

(see 4.)
PU0000 000b0564:internal_modify_qp HCAD_ERROR  hipz_h_modify_qp() failed rc=ffffffffffffffd3 ehca_qp=c000000003ba4e00 qp_num=2c
ib0: failed to modify QP to init, ret = -22
ib0: ipoib_qp_create returned -22
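
The hexadecimal return codes in the log are signed values printed as unsigned. As a small aid for reading them, the sketch below reinterprets the raw bits as signed integers; the helper names decode_rc32 and decode_rc64 are hypothetical. For example, ret=ffffffea decodes to -22, which matches the "ret = -22" lines from IPoIB and corresponds to EINVAL on Linux, the "Invalid argument" that ifconfig reports.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit value from the log as a signed return code. */
int32_t decode_rc32(uint32_t raw)
{
    int32_t rc;

    /* memcpy copies the bit pattern without relying on
     * implementation-defined unsigned-to-signed conversion. */
    memcpy(&rc, &raw, sizeof rc);
    return rc;
}

/* Same for the 64-bit codes, e.g. the rc= values of the hcalls. */
int64_t decode_rc64(uint64_t raw)
{
    int64_t rc;

    memcpy(&rc, &raw, sizeof rc);
    return rc;
}
```

Calling decode_rc64(0xffffffffffffffd3ULL) gives -45 for the failed hipz_h_modify_qp() call, and decode_rc32(0xfffffffdU) gives -3 for the ibm_int_on value.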

Mit freundlichen Gruessen / Kind Regards
Heiko Joerg Schick

IBM Deutschland Entwicklung GmbH
I/Ox Microcode Development
Linux Infiniband Device Drivers

Schoenaicher Str. 220
71032 Boeblingen
E-Mail: schickhj at de.ibm.com
External: 49-7031-16-0 x4219,   t/l: 120-4219



