[Fwd: Re: [openib-general] kernel oops]
Hal Rosenstock
halr at voltaire.com
Fri Sep 2 14:04:42 PDT 2005
On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote:
> Here is the setup..
Thanks. A couple more questions:
> #svn info
> Path: .
>
> URL: https://openib.org/svn/gen2/trunk
> Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
> Revision: 3295
> Node Kind: directory
> Schedule: normal
> Last Changed Author: halr
> Last Changed Rev: 3295
> Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005)
>
>
> Patch applied to core/at.c and kernel 2.6.13 recompiled.
>
>
> Machine A
> =========
> Running opensm
>
> Run ucmpost
>
> machine B
> =========
> ./ucmpost <ipaddr_of_machineA>
Are these back-to-back HCAs, or is there a switch in between?
> The problem is reproducible when you *cannot* ping each other
over IPoIB?
> [root@subnetmgr4 ~]# ibv_devinfo
> hca_id: mthca0
> fw_ver: 1.0.1
> node_guid: 0002:c902:0040:0d00
> sys_image_guid: 0002:c902:0040:0d03
> max_mr_size: 0xffffffffffffffff
> page_size_cap: 0x0
> vendor_id: 0x02c9
> vendor_part_id: 25204
> hw_ver: 0x0
> phys_port_cnt: 1
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: invalid MTU (0) <What is this ??>
> active_mtu: invalid MTU (0)
If the program is right and those are the real values, then max_mtu has
somehow been trashed, which leaves active_mtu invalid as well, and that
could break all sorts of things...
> sm_lid: 1
> port_lid: 3
> port_lmc: 0x00
That's on the remote (from the SM) machine.
-- Hal
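
For reference, a minimal sketch of why a value of 0 shows up as "invalid MTU" in the output quoted above. It assumes the MTU codes follow enum ibv_mtu from libibverbs (1 = 256 bytes up to 5 = 4096 bytes); the helper names mtu_str and port_mtu_is_sane are hypothetical and only illustrate the check, they are not part of ibv_devinfo or of any patch discussed here.

```python
# MTU codes, assumed to match enum ibv_mtu in libibverbs:
# IBV_MTU_256 = 1, IBV_MTU_512 = 2, IBV_MTU_1024 = 3,
# IBV_MTU_2048 = 4, IBV_MTU_4096 = 5. Code 0 is not defined.
IB_MTU_BYTES = {1: 256, 2: 512, 3: 1024, 4: 2048, 5: 4096}


def mtu_str(code):
    """Hypothetical helper: format an MTU code roughly as ibv_devinfo prints it."""
    size = IB_MTU_BYTES.get(code)
    if size is None:
        return "invalid MTU (%d)" % code
    return "%d (%d)" % (size, code)


def port_mtu_is_sane(max_mtu, active_mtu):
    """Hypothetical check: both codes must be defined, and the active MTU
    must not exceed the maximum MTU the port reports."""
    return (
        max_mtu in IB_MTU_BYTES
        and active_mtu in IB_MTU_BYTES
        and active_mtu <= max_mtu
    )


if __name__ == "__main__":
    # Values taken from the quoted port attributes: both fields are 0.
    print("max_mtu:   ", mtu_str(0))
    print("active_mtu:", mtu_str(0))
    print("sane:      ", port_mtu_is_sane(0, 0))
```

With the values from the quoted output (both fields 0), neither code maps to a defined size, so the port fails the check regardless of how the two fields relate to each other.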