[ofa-general] Mellanox Gen3, Linux and ibpanic - "Resource Temporarily unavailable"

Robert Dunkley Robert at saq.co.uk
Tue Nov 25 06:39:21 PST 2008


Hi Eric,

Thanks for the response. OpenSM is running and is set to start on boot on
Machine B:
ps aux | grep open
root      5616  0.0  0.1 142004  1396 ?        Sl   13:39   0:00 /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0

The log on Machine B just shows this every 10 seconds:
Nov 25 14:34:21 148541 [477A7940] 0x01 -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING
Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down

ibstat confirms that the port on Machine B is in the Polling state. Machine A,
however, is in a bad state. I tried the openibd restart command; it accepted
the command, but after 5 minutes it shows no sign of doing anything and is
just sitting at the cursor. Is some sort of forced restart of openibd possible?

Thanks,

Rob


-----Original Message-----
From: Baur, Eric [mailto:Eric.Baur at gs.com] 
Sent: 25 November 2008 14:31
To: Robert Dunkley
Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource
Temporarily unavailable"

Robert-

Is OpenSM set to start on boot? 
		chkconfig --list | grep opensmd

If not: 	chkconfig opensmd on 
and: 		/etc/init.d/opensmd start

You can also restart openib without rebooting the machines.
		/etc/init.d/openibd restart

-Eric

-----Original Message-----
From: general-bounces at lists.openfabrics.org
[mailto:general-bounces at lists.openfabrics.org] On Behalf Of Robert
Dunkley
Sent: Tuesday, November 25, 2008 9:21 AM
To: general at lists.openfabrics.org
Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource
Temporarily unavailable"

Hi everyone,

I'm using a setup of two machines (let's call them A and B) directly
connected by a single cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox
PCI-E InfiniBand card) and uses IPoIB. Both run CentOS 5.2 with OFED 1.3
installed, and Machine B runs OpenSM.

All was working fine. I shut down Machine A, did some maintenance and then
powered it on again; everything was OK again. I then shut down Machine B
(the one running OpenSM), and this seemed to really upset Machine A. After
booting Machine B again, Machine B looks OK, with the port down and in the
Polling state. Machine A, however, gives the following error if I run
ibstat:
ibpanic: [11406] main: stat of IB device 'mthca0' failed: (Resource temporarily unavailable)

I don't want to reboot Machine A, as it must sync data with Machine B
over the InfiniBand link first. Does anyone have any idea how to fix
Machine A?

Thanks,

Rob

The SAQ Group

Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ
SEMTEC Limited Trading as SAQ is Registered in England & Wales
Company Number: 06481952

 

http://www.saqnet.co.uk AS29219




