[ofa-general] Problems using OFED 1.4 on largesmp nodes

Liang Zhen Zhen.Liang at Sun.COM
Thu May 21 05:03:12 PDT 2009


Tziporet,

I have two x4600 machines that I believe are identical, but on the one
where the driver fails to start up, mstflint reports:
mstflint -d 03:00.0 q
Warning: memory access to device 03:00.0 failed: Input/output error.
Warning: Fallback on IO: much slower, and unsafe if device in use.
*** ERROR *** Can not open 03:00.0: Not a directory MFE_CR_ERROR

On the other one (which loads the driver without error):

mstflint -d 03:00.0 q
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
Description: Node Port1 Port2 Sys image
GUIDs: 00066a0098006abd 00066a00a0006abd 00066a01a0006abd 00066a0098006abd
Board ID: j (MT_00A0000001)
VSD: j
PSID: MT_00A0000001
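
To compare the two machines once both can be queried, the query output
can be turned into key/value pairs. This is only a minimal Python sketch,
assuming the output has been saved as text; parse_query is a hypothetical
helper name, and the sample is the query output shown above:

```python
def parse_query(text):
    """Split 'Key: value' lines of saved mstflint query output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Query output copied from the node whose driver loads without error.
good_node = """\
Image type: Failsafe
I.S. Version: 1
Chip Revision: A0
Description: Node Port1 Port2 Sys image
GUIDs: 00066a0098006abd 00066a00a0006abd 00066a01a0006abd 00066a0098006abd
Board ID: j (MT_00A0000001)
VSD: j
PSID: MT_00A0000001
"""

info = parse_query(good_node)
print(info["PSID"], info["Chip Revision"])
```

Two such dictionaries, one per node, can then be compared field by field.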

mstflint -d 03:00.0 v

Failsafe image:

Invariant /0x00000028-0x0000095f (0x000938)/ (BOOT2) - OK

Primary Image /0x00010000-0x00010107 (0x000108)/ (Pointer Sector)- OK
/0x00030028-0x000308af (0x000888)/ (BOOT2) - OK
/0x000308b0-0x00034feb (0x00473c)/ (BOOT2) - OK
/0x00034fec-0x00035edb (0x000ef0)/ (Configuration) - OK
/0x00035edc-0x00035f0f (0x000034)/ (GUID) - OK
/0x00035f10-0x0003ed63 (0x008e54)/ (DDR) - OK
/0x0003ed64-0x0004d63b (0x00e8d8)/ (DDR) - OK
/0x0004d63c-0x00050573 (0x002f38)/ (DDR) - OK
/0x00050574-0x0005204f (0x001adc)/ (DDR) - OK
/0x00052050-0x0006accf (0x018c80)/ (DDR) - OK
/0x0006acd0-0x0007f23f (0x014570)/ (DDR) - OK
/0x0007f240-0x0007f253 (0x000014)/ (Configuration) - OK
/0x0007f254-0x0007f297 (0x000044)/ (Jump addresses) - OK
/0x0007f298-0x0007f33f (0x0000a8)/ (FW Configuration) - OK

Secondary Image /0x00020000-0x00020107 (0x000108)/ (Pointer Sector)- OK
/0x00080028-0x000808af (0x000888)/ (BOOT2) - OK
/0x000808b0-0x00084feb (0x00473c)/ (BOOT2) - OK
/0x00084fec-0x00085edb (0x000ef0)/ (Configuration) - OK
/0x00085edc-0x00085f0f (0x000034)/ (GUID) - OK
/0x00085f10-0x0008ed63 (0x008e54)/ (DDR) - OK
/0x0008ed64-0x0009d63b (0x00e8d8)/ (DDR) - OK
/0x0009d63c-0x000a0573 (0x002f38)/ (DDR) - OK
/0x000a0574-0x000a204f (0x001adc)/ (DDR) - OK
/0x000a2050-0x000baccf (0x018c80)/ (DDR) - OK
/0x000bacd0-0x000cf23f (0x014570)/ (DDR) - OK
/0x000cf240-0x000cf253 (0x000014)/ (Configuration) - OK
/0x000cf254-0x000cf297 (0x000044)/ (Jump addresses) - OK
/0x000cf298-0x000cf33f (0x0000a8)/ (FW Configuration) - OK

FW image verification succeeded. Image is bootable.
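
The verify listing can also be checked mechanically. In the output above
each size equals end - start + 1 (for example 0x0000095f - 0x00000028 + 1
= 0x938). The sketch below, in Python, assumes the listing has been saved
as text; check_sections and the SECTION pattern are hypothetical names.
The thread does not show what a failed section looks like, so any status
other than OK is simply reported as a problem:

```python
import re

# Matches e.g. "/0x00030028-0x000308af (0x000888)/ (BOOT2) - OK"
SECTION = re.compile(
    r"/0x([0-9a-f]+)-0x([0-9a-f]+) \(0x([0-9a-f]+)\)/ \((.+?)\)\s*- (\w+)"
)


def check_sections(text):
    """Return (number of sections seen, list of problem descriptions)."""
    count = 0
    problems = []
    for m in SECTION.finditer(text):
        start, end, size = (int(g, 16) for g in m.group(1, 2, 3))
        name, status = m.group(4), m.group(5)
        count += 1
        if end - start + 1 != size:
            problems.append(f"{name} at {start:#x}: size mismatch")
        if status != "OK":
            problems.append(f"{name} at {start:#x}: status {status}")
    return count, problems


# A few lines copied from the verify output above.
sample = """\
Invariant /0x00000028-0x0000095f (0x000938)/ (BOOT2) - OK
Primary Image /0x00010000-0x00010107 (0x000108)/ (Pointer Sector)- OK
/0x00030028-0x000308af (0x000888)/ (BOOT2) - OK
/0x00035edc-0x00035f0f (0x000034)/ (GUID) - OK
"""

count, problems = check_sections(sample)
print(count, problems)
```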


Thanks
Liang

Tziporet Koren wrote:
> Liang Zhen wrote:
>   
>> Hi Ole,
>> Have you found a solution for this? I think we have exactly the same
>> problem on an x4600 with ofed-1.4.1-rc4:
>> lspci output:
>> 03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)
>>
>> and error messages from dmesg:
>>
>> mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
>> mlx4_core: Initializing 0000:03:00.0
>> mlx4_core 0000:03:00.0: Requested number of MACs is too much for port 1, reducing to 1.
>> mlx4_core 0000:03:00.0: command 0x13 failed: fw status = 0x1
>> mlx4_core 0000:03:00.0: SW2HW_EQ failed (-5)
>> mlx4_core 0000:03:00.0: Failed to initialize event queue table, aborting.
>> mlx4_core: probe of 0000:03:00.0 failed with error -5
>>
> Can you send me the FW version and board type?
> Since the driver is not loading, you can use mstflint to get this data.
> Please use:
>
> The devices can be accessed by their PCI ID as displayed by lspci
> (bus:dev.fn).
> Example:
> # List all Mellanox devices
>> /sbin/lspci -d 15b3:
> 02:00.0 Ethernet controller: Mellanox Technologies Unknown device 6368 (rev a0)
>
> # Use mstflint tool to query the firmware on this device
>> mstflint -d 02:00.0 q
>
> Tziporet