[openib-general] Re: Continue to experience problems in installing Gen2 on IA-32

Weikuan Yu yuw at cse.ohio-state.edu
Thu Aug 11 15:07:18 PDT 2005


Hi,

Thanks for your suggestions and help.

At the end of this email, I have included the output from our system 
when enabling CONFIG_INFINIBAND_MTHCA_DEBUG=y. Note the four additional 
warning lines during initialization of the device. They are generated 
by the init_port() function, because the firmware command INIT_IB 
returns an error status.

We suspected that some of the INIT_IB flags or other parameters might 
be wrong, or that there was a mismatch between our firmware and the 
gen2 code. So I went ahead and hacked on some of the INIT_IB 
parameters. In the end, it turned out that the following patch solves 
the problem on our system.

[yuw at p3 hw]$ svn diff mthca/
Index: mthca/mthca_qp.c
===================================================================
--- mthca/mthca_qp.c    (revision 2986)
+++ mthca/mthca_qp.c    (working copy)
@@ -575,7 +575,7 @@

         memset(&param, 0, sizeof param);

-       param.enable_1x = 1;
+       param.enable_1x = 0;
         param.enable_4x = 1;
         param.vl_cap    = dev->limits.vl_cap;
         param.mtu_cap   = dev->limits.mtu_cap;

This suggests that the current code tries to enable both the 1x and 
4x link widths on the device, which is not compatible with the firmware 
parameters we chose. In any case, this solves our problem: the gen2 
code now runs fine, as tested with the provided test programs such as 
ibv_rc_pingpong. We will be happy to provide additional information if 
needed. BTW, we are using firmware 3.3.2 for the Tavor cards.

As always, your suggestions and help are greatly appreciated.

--Weikuan

+++++++++ dmesg output ++++++++++++++

ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005)
ib_mthca: Initializing  (0000:02:00.0)
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 26 (level, low) -> IRQ 185
ib_mthca 0000:02:00.0: Found bridge:  (0000:01:02.0)
ib_mthca 0000:02:00.0: FW version 000300030002, max commands 64
ib_mthca 0000:02:00.0: FW size 6143 KB (start bfa00000, end bfffffff)
ib_mthca 0000:02:00.0: HCA memory size 131071 KB (start b8000000, end bfffffff)
ib_mthca 0000:02:00.0: Max QPs: 16777216, reserved QPs: 1024, entry size: 256
ib_mthca 0000:02:00.0: Max SRQs: 1024, reserved SRQs: 16, entry size: 32
ib_mthca 0000:02:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64
ib_mthca 0000:02:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64
ib_mthca 0000:02:00.0: reserved MPTs: 16, reserved MTTs: 16
ib_mthca 0000:02:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1
ib_mthca 0000:02:00.0: Max QP/MCG: 16777216, reserved MGMs: 0
ib_mthca 0000:02:00.0: Flags: 00370347
ib_mthca 0000:02:00.0: profile[ 0]--10/20 @ 0x        b8000000 (size 0x 4000000)
ib_mthca 0000:02:00.0: profile[ 1]-- 0/16 @ 0x        bc000000 (size 0x 1000000)
ib_mthca 0000:02:00.0: profile[ 2]-- 7/18 @ 0x        bd000000 (size 0x  800000)
ib_mthca 0000:02:00.0: profile[ 3]-- 9/17 @ 0x        bd800000 (size 0x  800000)
ib_mthca 0000:02:00.0: profile[ 4]-- 3/16 @ 0x        be000000 (size 0x  400000)
ib_mthca 0000:02:00.0: profile[ 5]-- 4/16 @ 0x        be400000 (size 0x  200000)
ib_mthca 0000:02:00.0: profile[ 6]--12/15 @ 0x        be600000 (size 0x  100000)
ib_mthca 0000:02:00.0: profile[ 7]-- 8/13 @ 0x        be700000 (size 0x   80000)
ib_mthca 0000:02:00.0: profile[ 8]--11/11 @ 0x        be780000 (size 0x   10000)
ib_mthca 0000:02:00.0: profile[ 9]-- 2/10 @ 0x        be790000 (size 0x    8000)
ib_mthca 0000:02:00.0: profile[10]-- 6/ 5 @ 0x        be798000 (size 0x     800)
ib_mthca 0000:02:00.0: HCA memory: allocated 106082 KB/124928 KB (18846 KB free)
ib_mthca 0000:02:00.0: Allocated EQ 1 with 65536 entries
ib_mthca 0000:02:00.0: Allocated EQ 2 with 128 entries
ib_mthca 0000:02:00.0: Allocated EQ 3 with 128 entries
ib_mthca 0000:02:00.0: Setting mask 00000000000f43fe for eqn 2
ib_mthca 0000:02:00.0: Setting mask 0000000000000400 for eqn 3
ib_mthca 0000:02:00.0: NOP command IRQ test passed
ib_mthca 0000:02:00.0: Command 09 completed with status 03
ib_mthca 0000:02:00.0: INIT_IB returned status 03.
ib_mthca 0000:02:00.0: Command 09 completed with status 03
ib_mthca 0000:02:00.0: INIT_IB returned status 03.


On Aug 11, 2005, at 3:32 PM, Dhabaleswar Panda wrote:

> Hal, Roland and James,
>
> Many thanks for your prompt replies!!
>
> We tried with the debug option. Thanks for this suggestion.
>
> It looks like one of the parameters (1X/4X) for the card is not being
> set properly on the IA-32 system, which leaves the card in the
> `disabled' state. By manually changing this parameter to 4X, one of
> the nodes is able to detect the card. We are trying this on other
> nodes. We are not sure whether this is caused by the driver or by the
> firmware on the card. We are looking into this further. One of my
> students will soon post all the details.
>
> Thanks again for all your help!!
>
> DK
>
>>     Dhabaleswar> Opteron systems and carry out experiments. There is
>>     Dhabaleswar> no problem. The problem occurs only on IA-32
>>     Dhabaleswar> systems. Even on EM64T systems, this problem occurs
>>     Dhabaleswar> when operating in IA-32 mode.
>>
>> Out of curiosity, do PCIe cards work with 32-bit kernels?
>>
>> As Hal said, please post the kernel log you get when loading drivers
>> built with CONFIG_INFINIBAND_MTHCA_DEBUG=y.
>>
>> Thanks,
>>   Roland
>>
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
>



