[ofa-general] RE: local QP operation error after long run

Thu Aug 30 07:04:08 PDT 2007

Here is the ibv_devinfo output:

$ ibv_devinfo                                                           
hca_id: mthca0
        fw_ver:                         4.7.400
        node_guid:                      0017:08ff:ffd0:efc0
        sys_image_guid:                 0017:08ff:ffd0:efc3
        vendor_id:                      0x1708
        vendor_part_id:                 25208
        hw_ver:                         0xA0
        board_id:                       HP_0050000001
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 5
                        port_lid:               129
                        port_lmc:               0x00

--CQ

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] 
> Sent: Thursday, August 30, 2007 8:39 AM
> To: Tang, Changqing
> Cc: Roland Dreier; Michael S. Tsirkin; general at lists.openfabrics.org
> Subject: Re: local QP operation error after long run
> 
> What hardware/firmware are you using?
> 
> Quoting Tang, Changqing <changquing.tang at hp.com>:
> Subject: local QP operation error after long run
> 
> 
> Hi,
> 	I have an ISV application that ran for nearly three hours and
> then reported the following error from libibverbs.so:
> 
> local QP operation err (QPN 440446, WQE @ 00000103, CQN 10008c, index 236192)
>   [ 0] 00440446
>   [ 4] 00000000
>   [ 8] 00000000
>   [ c] 00000000
>   [10] 026f0000
>   [14] 00000000
>   [18] 00000103
>   [1c] ff100000
> 
> local QP operation err (QPN 440442, WQE @ 00000103, CQN 10008c, index 236193)
>   [ 0] 00440442
>   [ 4] 00000000
>   [ 8] 00000000
>   [ c] 00000000
>   [10] 026f0000
>   [14] 00000000
>   [18] 00000103
>   [1c] ff100000 
> 
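For anyone reading the dumps above, here is a minimal sketch of how the
eight words could be split into fields. It assumes the dump is a 32-byte
mthca error CQE printed as host-order 32-bit words, with the QPN in word
0x00, syndrome / vendor error / doorbell count packed into word 0x10, the
WQE address in word 0x18, and the opcode in the top byte of word 0x1c. That
layout is my recollection of the libmthca source, not something stated in
this thread, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields pulled out of an 8-word error CQE dump.
 * ASSUMPTION: layout as recalled from libmthca; verify against your source. */
struct err_cqe_fields {
    uint32_t qpn;        /* low 24 bits of word at offset 0x00 */
    uint8_t  syndrome;   /* top byte of word at offset 0x10 */
    uint8_t  vendor_err; /* second byte of word at offset 0x10 */
    uint16_t db_cnt;     /* low 16 bits of word at offset 0x10 */
    uint32_t wqe;        /* word at offset 0x18 */
    uint8_t  opcode;     /* top byte of word at offset 0x1c */
};

/* w[] holds the eight dumped words in the order they were printed. */
static struct err_cqe_fields decode_err_cqe(const uint32_t w[8])
{
    struct err_cqe_fields f;

    f.qpn        = w[0] & 0x00ffffffu;
    f.syndrome   = (uint8_t)(w[4] >> 24);
    f.vendor_err = (uint8_t)(w[4] >> 16);
    f.db_cnt     = (uint16_t)(w[4] & 0xffffu);
    f.wqe        = w[6];
    f.opcode     = (uint8_t)(w[7] >> 24);
    return f;
}
```

Under that assumed layout, the first dump splits into QPN 0x440446,
syndrome 0x02, vendor error 0x6f and WQE 0x00000103. The vendor error
byte is probably the most useful value to pass on to the firmware vendor.
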
> Can you indicate what the possible cause is? This is an OFED 1.1
> system. Could it be memory corruption?
> 
> Thanks
> --CQ, HP-MPI
> 
> 
> 
> > -----Original Message-----
> > From: general-bounces at lists.openfabrics.org
> > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Roland 
> > Dreier
> > Sent: Wednesday, August 29, 2007 9:50 PM
> > To: Sasha Khapyorsky
> > Cc: general at lists.openfabrics.org
> > Subject: Re: [ofa-general] ib_umad method mask problems on big-endian
> > 64-bit archs
> > 
> >  > It looks that using uint32_t for addr in set_bit() function is
> >  > sufficient fix. But for ppc64 this means that new OpenSM will break
> >  > with old kernels, probably we will need to put some ugly #ifdef in
> >  > osm_vendor_ibumad.c...
> > 
> > Yes, that's a pain.  Another possibility is to declare that the 
> > declaration of the registration request should have been
> > 
> > 	long	method_mask[16 / sizeof (long)];
> > 
> > and just add a compat_ioctl method to the ib_umad module to handle the
> > broken case of 32-bit big endian userspace on a 64-bit kernel.
> > However that breaks 64-bit big endian userspace that followed the old
> > ib_user_mad.h file correctly so overall I'm leaning towards the patch
> > I already posted.
> > 
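The layout mismatch discussed here can be illustrated with a small sketch.
It does not use the kernel's set_bit(); it only simulates where a given bit
number lands in a 16-byte mask when the mask is treated as an array of
32-bit or 64-bit words stored in big-endian or little-endian byte order.
The function name and parameters are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MASK_BYTES 16 /* size of the method mask, as in 16 / sizeof (long) */

/*
 * Set bit `nr` in a zeroed 16-byte mask, treating the mask as an array of
 * `word_bits`-wide words (32 or 64) stored with the given byte order.
 * This is a simulation of the memory layout, not the kernel implementation.
 */
static void set_bit_layout(unsigned nr, unsigned word_bits, int big_endian,
                           uint8_t out[MASK_BYTES])
{
    unsigned word_bytes = word_bits / 8;
    unsigned word = nr / word_bits;      /* which word holds the bit */
    unsigned bit = nr % word_bits;       /* bit index inside that word */
    unsigned byte_in_word = bit / 8;     /* little endian: low byte first */

    if (big_endian)                      /* big endian: low byte last */
        byte_in_word = word_bytes - 1 - byte_in_word;

    memset(out, 0, MASK_BYTES);
    out[word * word_bytes + byte_in_word] = (uint8_t)(1u << (bit % 8));
}
```

With little-endian storage, bit 1 ends up in byte 0 for both word sizes, so
32-bit and 64-bit views of the mask agree. With big-endian storage it ends
up in byte 3 for 32-bit words but byte 7 for 64-bit words, which is why
userspace and kernel have to agree on the word size there.
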
> > What do you think?
> > 
> >  - R.
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > 
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> > 
> --
> MST
>