[ofa-general] RE: local QP operation error after long run
Tang, Changqing
changquing.tang at hp.com
Thu Aug 30 07:04:08 PDT 2007
Here is the ibv_devinfo output:
$ ibv_devinfo
hca_id: mthca0
        fw_ver:                 4.7.400
        node_guid:              0017:08ff:ffd0:efc0
        sys_image_guid:         0017:08ff:ffd0:efc3
        vendor_id:              0x1708
        vendor_part_id:         25208
        hw_ver:                 0xA0
        board_id:               HP_0050000001
        phys_port_cnt:          2
                port:   1
                        state:          PORT_DOWN (1)
                        max_mtu:        2048 (4)
                        active_mtu:     512 (2)
                        sm_lid:         0
                        port_lid:       0
                        port_lmc:       0x00
                port:   2
                        state:          PORT_ACTIVE (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         5
                        port_lid:       129
                        port_lmc:       0x00
--CQ
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il]
> Sent: Thursday, August 30, 2007 8:39 AM
> To: Tang, Changqing
> Cc: Roland Dreier; Michael S. Tsirkin; general at lists.openfabrics.org
> Subject: Re: local QP operation error after long run
>
> What hardware/firmware are you using?
>
> Quoting Tang, Changqing <changquing.tang at hp.com>:
> Subject: local QP operation error after long run
>
>
> Hi,
> I have an ISV application that ran for nearly three hours and
> then reported the following error from libibverbs.so:
>
> local QP operation err (QPN 440446, WQE @ 00000103, CQN 10008c, index 236192)
> [ 0] 00440446
> [ 4] 00000000
> [ 8] 00000000
> [ c] 00000000
> [10] 026f0000
> [14] 00000000
> [18] 00000103
> [1c] ff100000
>
> local QP operation err (QPN 440442, WQE @ 00000103, CQN 10008c, index 236193)
> [ 0] 00440442
> [ 4] 00000000
> [ 8] 00000000
> [ c] 00000000
> [10] 026f0000
> [14] 00000000
> [18] 00000103
> [1c] ff100000
>
> Can you indicate what the possible cause is? This is an
> OFED 1.1 system. Could it be memory corruption?
>
> Thanks
> --CQ, HP-MPI
>
>
>
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org
> > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Roland
> > Dreier
> > Sent: Wednesday, August 29, 2007 9:50 PM
> > To: Sasha Khapyorsky
> > Cc: general at lists.openfabrics.org
> > Subject: Re: [ofa-general] ib_umad method mask problems on big-endian 64-bit archs
> >
> > > It looks that using uint32_t for addr in set_bit() function is
> > > sufficient fix. But for ppc64 this means that new OpenSM will break
> > > with old kernels, probably we will need to put some ugly #ifdef in
> > > osm_vendor_ibumad.c...
> >
> > Yes, that's a pain. Another possibility is to declare that the
> > declaration of the registration request should have been
> >
> > long method_mask[16 / sizeof (long)];
> >
> > and just add a compat_ioctl method to the ib_umad module to handle the
> > broken case of 32-bit big endian userspace on a 64-bit kernel.
> > However that breaks 64-bit big endian userspace that followed the old
> > ib_user_mad.h file correctly so overall I'm leaning towards the patch
> > I already posted.
> >
> > What do you think?
> >
> > - R.
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
>
> --
> MST
>