[openib-general] ehca error message translation request..

Christoph Raisch RAISCH at de.ibm.com
Fri Mar 24 06:25:42 PST 2006


Troy,
what you see here is part of our error capturing, which helps us figure out
what is happening on the hypervisor interface when a call fails.
See more comments below.

There is a table of hypervisor call error codes in ehca_kernel.h (it will be
moved to the base kernel).

Gruss / Regards . . . Christoph Raisch
openib-general-bounces at openib.org wrote on 24.03.2006 02:09:39:

>
> Can someone please translate? babelfish doesn't talk ibmese..
>
> [8270280.043608] eHCA Infiniband Device Driver (Rel.: SVNEHCA_0002)
> [8297399.067840] PU0002 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=168 ret=ffffffffffffffd3 arg1=1000000103000004
> arg2=2000000000000009 arg3=ac0000000000000 arg4=7c46000 arg5=0 arg6=0
> arg7=0 out1=0 out2=0 out3=0 out4=0 out5=0 out6=800000000005aa18 out7=0
> [8297399.067914] PU0002 000b04a7:internal_modify_qp HCAD_ERROR
> hipz_h_modify_qp() failed rc=ffffffffffffffd3 ehca_qp=c0000001dae4ec80
> qp_num=9

Did you see an "ehca0: port 1 is active" message shortly before or after that?
ret=ffffffffffffffd3 is -45 (H_R_STATE); opcode=168 is H_MODIFY_QP.
The queue was in a state where the requested state change was not allowed.
One of the next lines after that trace should say that the port is down.
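
The translation from the raw ret= value to a signed code can be sketched as
follows. This is only an illustration, not part of the driver: the helper
names (to_signed, describe) are made up here, and the lookup table holds just
the two codes named in this mail, not the full table from ehca_kernel.h.

```python
# Hypothetical helper for reading ehca trace output; not driver code.
# Only the two codes quoted in this mail are listed. The full table
# is in ehca_kernel.h and is not reproduced here.
HCALL_NAMES = {
    -45: "H_R_STATE",
    -44: "H_NOT_ENOUGH_RESOURCES",
}


def to_signed(hex_str, bits=64):
    """Interpret a hex string as a two's-complement integer of the given width."""
    value = int(hex_str, 16)
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def describe(hex_str, bits=64):
    """Return (signed code, name), with a placeholder name for unlisted codes."""
    code = to_signed(hex_str, bits)
    return code, HCALL_NAMES.get(code, "unknown (not listed in this mail)")


if __name__ == "__main__":
    for ret in ("ffffffffffffffd3", "ffffffffffffffd4"):
        print(ret, describe(ret))
```

The same helper with bits=32 turns the retcode=ffffffea seen in the
ehca_reg_mr lines into -22, which is the number Linux uses for EINVAL.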


> [8297447.131758] eHCA Infiniband Device Driver (Rel.: SVNEHCA_0002)
> [8297454.299214] PU0002 00060100:parse_ec  ehca0: port 1 is active.
> [8297479.282491] PU0002 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=160 ret=ffffffffffffffd4 arg1=1000000103000004 arg2=5
> arg3=1001dbb0 arg4=1 arg5=c0000000000000 arg6=7be03e0 arg7=0 out1=0
> out2=0 out3=0 out4=0 out5=0 out6=0 out7=0
> [8297479.282531] PU0002 00090443:ehca_reg_mr HCAD_ERROR  hipz_alloc_mr
> failed, rc=ffffffffffffffd4 hca_hndl=1000000103000004 mr_hndl=0
> [8297479.282561] PU0002 00090463:ehca_reg_mr <<< retcode=ffffffea
> shca=c0000003cbcad000 e_mr=c0000001dac7ee80 iova_start=000000001001dbb0
> size=1 acl=3 e_pd=c000000007be03e0 pginfo=c0000001d8287a90 num_pages=1
> [8297479.282595] PU0002 00090173:ehca_reg_user_mr <<<
> rc=ffffffffffffffea pd=c000000007be03e0 region=c0000000071e7aa8
> mr_access_flags=3 udata=c0000001d8287bb0

Here you are trying to register a 1-byte memory region from userspace. By the
way, is this what you intend to do?
ret=ffffffffffffffd4 is -44 (H_NOT_ENOUGH_RESOURCES).
You shouldn't see this in normal operation.
Which firmware version do you have? You can find it either on the HMC or
on the ASM entry screen.

> [8297610.812988] PU0007 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=160 ret=ffffffffffffffd4 arg1=1000000103000004 arg2=5
> arg3=1001b000 arg4=1000 arg5=80000000000000 arg6=b178f420 arg7=0 out1=0
> out2=0 out3=0 out4=0 out5=0 out6=0 out7=0
> [8297610.813031] PU0007 00090443:ehca_reg_mr HCAD_ERROR  hipz_alloc_mr
> failed, rc=ffffffffffffffd4 hca_hndl=1000000103000004 mr_hndl=0
> [8297610.813061] PU0007 00090463:ehca_reg_mr <<< retcode=ffffffea
> shca=c0000003cbcad000 e_mr=c0000003af268080 iova_start=000000001001b000
> size=1000 acl=1 e_pd=c0000003b178f420 pginfo=c0000001db31ba90
> num_pages=1
> [8297610.813097] PU0007 00090173:ehca_reg_user_mr <<<
> rc=ffffffffffffffea pd=c0000003b178f420 region=c0000003cbe08d28
> mr_access_flags=1 udata=c0000001db31bbb0

Another 4 KB MR...
ret=ffffffffffffffd4 is -44 (H_NOT_ENOUGH_RESOURCES).
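
The "4k" reading comes from size=1000 being hexadecimal. A small sketch of
pulling the key=value fields out of such a trace line is shown here. It
assumes every value is printed as hex without a 0x prefix, which matches the
lines quoted in this mail but is not guaranteed for other driver messages.
parse_fields is a made-up name for illustration.

```python
import re

# Matches key=value pairs where the value consists of hex digits only.
FIELD_RE = re.compile(r"(\w+)=([0-9a-fA-F]+)")


def parse_fields(trace):
    """Return a dict mapping each key to its value, parsed as a hex integer."""
    return {key: int(val, 16) for key, val in FIELD_RE.findall(trace)}


# Trace line copied from the quoted log above, joined onto one line.
line = (
    "PU0007 00090463:ehca_reg_mr <<< retcode=ffffffea "
    "shca=c0000003cbcad000 e_mr=c0000003af268080 "
    "iova_start=000000001001b000 size=1000 acl=1 "
    "e_pd=c0000003b178f420 pginfo=c0000001db31ba90 num_pages=1"
)

fields = parse_fields(line)

if __name__ == "__main__":
    # size is in bytes once converted from hex
    print("size in bytes:", fields["size"])
    print("num_pages:", fields["num_pages"])
```

Applied to the earlier trace with size=1, the same parsing gives a region of
one byte.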

> [8297633.828665] PU0007 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=160 ret=ffffffffffffffd4 arg1=1000000103000004 arg2=5
> arg3=1001b000 arg4=1000 arg5=80000000000000 arg6=b178f3a0 arg7=0 out1=0
> out2=0 out3=0 out4=0 out5=0 out6=0 out7=0
> [8297633.828703] PU0007 00090443:ehca_reg_mr HCAD_ERROR  hipz_alloc_mr
> failed, rc=ffffffffffffffd4 hca_hndl=1000000103000004 mr_hndl=0
> [8297633.828733] PU0007 00090463:ehca_reg_mr <<< retcode=ffffffea
> shca=c0000003cbcad000 e_mr=c0000003af268a80 iova_start=000000001001b000
> size=1000 acl=1 e_pd=c0000003b178f3a0 pginfo=c0000001dac77a90
> num_pages=1
> [8297633.828768] PU0007 00090173:ehca_reg_user_mr <<<
> rc=ffffffffffffffea pd=c0000003b178f3a0 region=c0000003b2b38928
> mr_access_flags=1 udata=c0000001dac77bb0

Another 4 KB MR...
ret=ffffffffffffffd4 is -44 (H_NOT_ENOUGH_RESOURCES).

> [8297638.644845] PU0007 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=160 ret=ffffffffffffffd4 arg1=1000000103000004 arg2=5
> arg3=1001b000 arg4=1000 arg5=80000000000000 arg6=b178f3a0 arg7=0 out1=0
> out2=0 out3=0 out4=0 out5=0 out6=0 out7=0
> [8297638.644883] PU0007 00090443:ehca_reg_mr HCAD_ERROR  hipz_alloc_mr
> failed, rc=ffffffffffffffd4 hca_hndl=1000000103000004 mr_hndl=0
> [8297638.644912] PU0007 00090463:ehca_reg_mr <<< retcode=ffffffea
> shca=c0000003cbcad000 e_mr=c0000003af268a80 iova_start=000000001001b000
> size=1000 acl=1 e_pd=c0000003b178f3a0 pginfo=c0000001dac77a90
> num_pages=1
> [8297638.644947] PU0007 00090173:ehca_reg_user_mr <<<
> rc=ffffffffffffffea pd=c0000003b178f3a0 region=c0000003b2b38928
> mr_access_flags=1 udata=c0000001dac77bb0

Another 4 KB MR...
ret=ffffffffffffffd4 is -44 (H_NOT_ENOUGH_RESOURCES).

> [8297641.252159] PU0007 000e0139:ehca_hcall_7arg_7ret HCAD_ERROR
> opcode=160 ret=ffffffffffffffd4 arg1=1000000103000004 arg2=5
> arg3=1001b000 arg4=1000 arg5=80000000000000 arg6=b178f3a0 arg7=0 out1=0
> out2=0 out3=0 out4=0 out5=0 out6=0 out7=0
> [8297641.252197] PU0007 00090443:ehca_reg_mr HCAD_ERROR  hipz_alloc_mr
> failed, rc=ffffffffffffffd4 hca_hndl=1000000103000004 mr_hndl=0
> [8297641.252226] PU0007 00090463:ehca_reg_mr <<< retcode=ffffffea
> shca=c0000003cbcad000 e_mr=c0000003af268a80 iova_start=000000001001b000
> size=1000 acl=1 e_pd=c0000003b178f3a0 pginfo=c0000001dac77a90
> num_pages=1
> [8297641.252263] PU0007 00090173:ehca_reg_user_mr <<<
> rc=ffffffffffffffea pd=c0000003b178f3a0 region=c0000003b2b38928
> mr_access_flags=1 udata=c0000001dac77bb0

Another 4 KB MR...
ret=ffffffffffffffd4 is -44 (H_NOT_ENOUGH_RESOURCES).

