[openib-general] Problem with 2.4.24 and gen1

Ken MacInnis kcm at psc.edu
Mon Nov 1 09:40:15 PST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

ACPI was already not in the kernel.  Appending 'noapic disableapic' did
work to load the Tavor code. :)  Thanks for the hint!
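
For anyone wanting to confirm the options actually reached the running kernel, here is a minimal sketch (not part of the OpenIB tooling; the helper name `missing_boot_options` is made up for illustration). It assumes a Linux box where the boot command line is readable from /proc/cmdline, and it only checks for the two option names mentioned above:

```python
def missing_boot_options(cmdline, wanted=("noapic", "disableapic")):
    """Return the options in `wanted` that do not appear on a kernel command line.

    `cmdline` is the raw text of the command line, e.g. the contents of
    /proc/cmdline. Options of the form key=value are compared by key only.
    """
    present = {token.split("=", 1)[0] for token in cmdline.split()}
    return [opt for opt in wanted if opt not in present]


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/cmdline exists.
    with open("/proc/cmdline") as fh:
        print(missing_boot_options(fh.read()))
```

An empty list means both options were passed to the kernel at boot.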

However, now OpenSM is still misbehaving:

- -------------------------------------------------
OpenSM Rev:B1-rc1
Command Line Arguments:
~ Log File: /tmp/osm.log
- -------------------------------------------------

Error from osm_opensm_init (1)

Error from osm_opensm_bind (0x2A)



[1099330621:000868906][4000] -> OpenSM Rev:B1-rc1
[1099330621:000868958][4000] -> osm_opensm_init: Forcing single threaded dispatcher.
[1099330621:000869383][4000] -> osm_report_notice: Received Generic Notice type:3 num:66 from LID:0x0000 GUID:0xfe80000000000000,0x0000000000000000
[1099330621:000869402][4000] -> osm_report_notice: Received Generic Notice type:3 num:66 from LID:0x0000 GUID:0xfe80000000000000,0x0000000000000000
[1099330621:000869445][4000] -> __osm_vendor_get_ca_ids: ERR 3D09: No available channel adapters.
[1099330621:000869456][4000] -> osm_vendor_get_all_port_attr: ERR 3D13: Fail to get CA Ids .
[1099330621:000869484][4000] -> __osm_vendor_get_ca_ids: ERR 3D11: : Bad parameter in calling: EVAPI_list_hcas.
[1099330621:000869493][4000] -> osm_vendor_get_guid_ca_and_port: ERR 3D16: Fail to get CA Ids .
[1099330621:000869503][4000] -> osm_vendor_bind: ERR 5005: Fail to find port number of port guid:0x0000000000000000
[1099330621:000869515][4000] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed.
[1099330621:000869526][4000] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR).


Any ideas on this?  I did make very sure to check that userland and
opensm were in sync with the kernel bits I'm using.  The 0s in the LID
and GUID are concerning me.
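
To make the failure chain in the log easier to read, the ERR codes can be pulled out in order. The following is only a rough sketch: the entry layout (`[sec:usec][id] -> function: ERR code: message`) is inferred from the excerpt above rather than from any OpenSM documentation, and `extract_errors` is a made-up helper name. It also copes with entries that a mail client has wrapped over several lines.

```python
import re

# Start of a log entry, as seen in the excerpt: [seconds:fraction][id] -> text
ENTRY_RE = re.compile(r"^\[(\d+):(\d+)\]\[([^\]]+)\] -> (.*)$")
# Entry text that reports an error: "<function>: ERR <4 hex digits>: ..."
ERR_RE = re.compile(r"^(\w+): ERR\s+([0-9A-Fa-f]{4}):")


def extract_errors(lines):
    """Return (function, code) pairs for every ERR entry, in log order."""
    entries = []
    current = None
    for raw in lines:
        line = raw.strip()
        if ENTRY_RE.match(line):
            current = [line]
            entries.append(current)
        elif line and current is not None:
            # Continuation of a wrapped entry. Joining with a space is a
            # guess: breaks that fell in the middle of a token cannot be
            # told apart from breaks between words.
            current.append(line)
        else:
            # A blank line, or text before the first entry, ends the entry.
            current = None

    errors = []
    for parts in entries:
        text = ENTRY_RE.match(" ".join(parts)).group(4)
        found = ERR_RE.match(text)
        if found:
            errors.append((found.group(1), found.group(2)))
    return errors
```

Feeding it the contents of /tmp/osm.log (for example `extract_errors(open("/tmp/osm.log"))`) gives the errors in the order they were logged, which makes it easier to see which failure came first.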

I may end up trying the newer OpenIB stack for fun (ha), and see if that
works better.

Ken

Tziporet Koren wrote:
| Hi,
|
| The problem is that the driver does not get the interrupt for the command
| completion,
| and thus you get the error: "Command not completed after timeout".
|
| It is related to the OS & system you are using. What is the distribution
| you are using? We once saw such problems with older versions of SuSE.
|
| Try adding append="acpi=off" to your lilo configuration, or also add
| disableapic to the same append line.

| -----Original Message-----
| From: Ken MacInnis [mailto:kcm at psc.edu]
| Sent: Sunday, October 31, 2004 8:20 PM
| To: openib-general at openib.org
| Subject: [openib-general] Problem with 2.4.24 and gen1

| I've got a fairly modified kernel here I'm trying to get an OpenIB stack
| running on.  It's a vanilla 2.4.24 kernel with Lustre and other patches
| in it, but I'm seeing this when I modprobe ib_tavor:
|
| Oct 31 13:13:05 samwise kernel:  THH(1): cmdif.c[1190]: Command not
| completed after timeout: cmd=TAV


- --
Ken MacInnis - Systems Engineer, PSC - http://www.psc.edu/~kcm/
kcm at psc dot edu - +1 412 268 9833 (w) - +1 412 268 5832 (f)
Pittsburgh Supercomputing Center - 4400 Fifth Ave - Pittsburgh, PA 15213
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)

iD8DBQFBhnT/nT0C17PQhv4RAicqAJ9hRiudNE1Bfof+BDrG09XfA5jD/wCcDH/D
UT/E1V7i0yO6pPPOx9oobNQ=
=R5wl
-----END PGP SIGNATURE-----


