[openib-general] gen2 opensm

Hal Rosenstock halr at voltaire.com
Tue Apr 5 10:42:16 PDT 2005


On Tue, 2005-04-05 at 13:26, Roland Fehrenbacher wrote:
> Hi,
> 
> I have tried the kernel 2.6.11 drivers on an x86-64 machine with a
> MT23108 card. The driver loads ok after
> $ modprobe ib_mthca; modprobe ib_umad
> 
> Since I use devfs, I have to manually create
> 
> $ mknod /dev/infiniband/umad0 c 231 0
> $ mknod /dev/infiniband/umad1 c 231 1
> $ mknod /dev/infiniband/issm0 c 231 64
> $ mknod /dev/infiniband/issm1 c 231 65

What are the permissions on those? Are they character devices with
read/write access (crw)?
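
A quick way to check, as a minimal sketch: check_umad_node is a made-up
helper name for illustration, not part of any OpenIB tool, and it only
tests access for the user who runs it.

```shell
# Hypothetical helper: succeed only if the given path is a character
# device that the current user can both read and write.
check_umad_node() {
    node=$1
    if [ ! -c "$node" ]; then
        echo "$node: not a character device"
        return 1
    fi
    if [ ! -r "$node" ] || [ ! -w "$node" ]; then
        echo "$node: not readable and writable by $(id -un)"
        return 1
    fi
    echo "$node: ok"
}
```

Run it as the user that starts opensm, once per node created above, e.g.
check_umad_node /dev/infiniband/umad0. A plain ls -l /dev/infiniband/
shows the same information by eye.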

> I get 
> 
> $ /usr/local/ib/bin/ibstat
> CA 'mthca0'
>         CA type: MT23108
>         Number of ports: 2
>         Firmware version: 3.2.0
>         Hardware version: a1
>         Node GUID: 0x000000008815bcaa
>         System image GUID: 0x000000008815bcaa
>         Port 1:
>                 State: Initializing
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00500a68
>                 Port GUID: 0x0000000000000000
>         Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 2
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00500a68
>                 Port GUID: 0x0000000000000000
> 
> which already looks strange (GUID 0 ???). 

It looks like the port GUIDs are not set in NVRAM.
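
To spot this on other nodes without reading the output by hand, here is
a rough sketch. zero_guid_ports is a hypothetical helper, and it assumes
the ibstat text layout shown above (a "CA 'name'" line, then "Port N:"
and "Port GUID:" lines).

```shell
# Hypothetical helper: read ibstat-style text on stdin and print every
# port whose "Port GUID" field is all zeros.
zero_guid_ports() {
    awk -v q="'" '
        # "CA <quoted name>" line: remember the CA name without quotes
        /^CA / { ca = $2; gsub(q, "", ca) }
        # "Port N:" line: remember the port number without the colon
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        # "Port GUID: 0x000...0" line: report the current CA and port
        /Port GUID:/ && $NF ~ /^0x0+$/ { print ca, "port", port }
    '
}
```

Usage would be something like ibstat | zero_guid_ports, which prints one
line per port with an unset GUID and nothing if all ports have one.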

> Running opensm then doesn't activate the ports:
> 
> Apr 05 19:18:25 [4000] -> OpenSM Rev:openib-1.0.0
> Apr 05 19:18:25 [4000] -> osm_opensm_init: Forcing single threaded dispatcher.
> Apr 05 19:18:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0x0000000030f2ffff,0x0000000000000000
> Apr 05 19:18:25 [4000] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0x0000000030f2ffff,0x0000000000000000
> Apr 05 19:18:25 [4000] -> osm_vendor_get_all_port_attr: assign CA  0x7fffffffd010ort 1 guid (0x65babaa) as the default port.

I see a bug in this message. I will fix it. Please sync OpenSM to at
least version 2111 and rerun.

> Apr 05 19:18:25 [4000] -> osm_vendor_bind: Binding to port 0x225dabaa.
> Apr 05 19:18:25 [4000] -> osm_vendor_bind: Binding to port 0x8000000.

There are two binds here, each to a different port GUID. This looks
wrong to me.

> Apr 05 19:18:25 [2400A] -> umad_receiver: Failed to obtain request madw for received MAD(method=81 attr=11) -- dropping.

The vendor layer couldn't find the request matching a response that
came in. This is pretty fishy, but it is probably related to the port
GUID issue.
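
For reading lines like that one, a small lookup sketch follows. The
helper names are made up for illustration, only a few values are listed,
and it assumes the method and attr fields in the log are printed in hex.
The values themselves come from the IBA subnet management class.

```shell
# Hypothetical helpers: map hex MAD method and SMP attribute IDs, given
# without the 0x prefix as in the log line, to their IBA names.
mad_method_name() {
    case "$1" in
        01) echo Get ;;
        02) echo Set ;;
        81) echo GetResp ;;
        *)  echo "unknown($1)" ;;
    esac
}

smp_attr_name() {
    case "$1" in
        10) echo NodeDescription ;;
        11) echo NodeInfo ;;
        15) echo PortInfo ;;
        20) echo SMInfo ;;
        *)  echo "unknown($1)" ;;
    esac
}
```

Under that hex assumption, mad_method_name 81 and smp_attr_name 11 give
GetResp and NodeInfo, i.e. the dropped MAD would be a NodeInfo response.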

> What could have gone wrong?

I would start by setting the port GUIDs for this HCA and then see
whether the problem persists.

-- Hal

> 
> Roland
> 