[openib-general] Re: openib and mellanox hca problem

Michael S. Tsirkin mst at mellanox.co.il
Wed Feb 8 02:44:02 PST 2006


If you really suspect timing issues, you can always
increase the timeouts: look for the msleep() calls in mthca_reset.c and
try bumping up the values.
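
A rough sketch of how to locate those delays. The helper name
find_msleep is made up for illustration, and the location of
mthca_reset.c depends on whether you build from a kernel tree or the
openib svn checkout, so the search root is passed in as an argument:

```shell
# find_msleep ROOT: print every msleep() call found in any mthca_reset.c
# under ROOT, prefixed with the file name and line number.
# (Hypothetical helper; ROOT defaults to the current directory.)
find_msleep() {
    find "${1:-.}" -name mthca_reset.c -exec grep -Hn 'msleep' {} +
}
```

Run it as find_msleep followed by the top of your source tree, raise the
values on the lines it reports, then rebuild and reload ib_mthca.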

Anyway - could you please enable mthca debug in menuconfig?
This would give us some more information on what's going on.
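
To check whether debug is already on, something like the sketch below
should do. It assumes the option is named CONFIG_INFINIBAND_MTHCA_DEBUG
(please verify against the Kconfig in your tree) and that your kernel
configuration is in a .config file; check_mthca_debug is just an
illustrative name:

```shell
# check_mthca_debug [CONFIG_FILE]: show the mthca driver and mthca debug
# settings from a kernel config file (defaults to ./.config).
# Assumption: the debug symbol is CONFIG_INFINIBAND_MTHCA_DEBUG.
check_mthca_debug() {
    grep -E '^(# )?CONFIG_INFINIBAND_MTHCA(_DEBUG)?[= ]' "${1:-.config}"
}
```

If the debug line reads "is not set", enable the option in menuconfig
and rebuild the module before loading it again.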


Quoting r. Ranjit Pandit <rpandit at silverstorm.com>:
> Subject: Re: openib and mellanox hca problem
> 
> Michael,
> 
> I have seen this problem before.
> See the following mail thread:
> 
> http://www.mail-archive.com/openib-general@openib.org/msg13861.html
> 
> Commenting out call to mthca_reset() in mthca_main.c worked around the
> problem on my system, and as far as I can tell, did not have any
> negative impact.
> 
> It would be good if someone reviewed the reset path in mthca.
> 
> Ranjit
> 
> 
> On 2/7/06, Michael Di Domenico <mdidomenico at gmail.com> wrote:
> > I'm trying to build a system using the openib drivers with a mellanox
> > hca card.  I don't have much information about the card itself, it's
> > in a server right now...
> >
> > But I downloaded openib today from the svn source, installed it onto a
> > fresh copy of Fedora Core 4 with Kernel version 2.6.15.3...
> > Everything seemed to compile fine and install okay.  I've been
> > following the instructions from the wiki page thus far without a
> > problem.  I get up to this step:
> >
> > modprobe ib_mthca
> >
> > and get the error below in /var/log/messages.  Strangely enough, all
> > the modules load, and I run udevstart, but I never get a
> > /dev/infiniband directory, and the /sys/class/infiniband directory is
> > empty.
> >
> > Does anyone know how I might fix this, or can anyone point me to
> > better documentation than what is on the wiki?
> >
> > Thanks
> > - Michael
> >
> >
> > Feb  7 16:59:37 linux14-ts kernel: ib_mthca: Mellanox InfiniBand HCA
> > driver v0.06 (June 23, 2005)
> > Feb  7 16:59:37 linux14-ts kernel: ib_mthca: Initializing 0000:07:00.0
> > Feb  7 16:59:37 linux14-ts kernel: ACPI: PCI Interrupt 0000:07:00.0[?]
> > -> GSI 26 (level, low) -> IRQ 217
> > Feb  7 16:59:48 linux14-ts kernel: ib_mthca 0000:07:00.0: PCI device
> > did not come back after reset, aborting.
> > Feb  7 16:59:48 linux14-ts kernel: ib_mthca 0000:07:00.0: Failed to
> > reset HCA, aborting.
> > Feb  7 16:59:48 linux14-ts kernel: ACPI: PCI interrupt for device
> > 0000:07:00.0 disabled
> >
> >
> > --- lspci output
> > 06:03.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev ff)
> > (prog-if ff)
> >         !!! Unknown header type 7f
> >
> > 07:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev ff)
> > (prog-if ff)
> >         !!! Unknown header type 7f

-- 
Michael S. Tsirkin
Staff Engineer, Mellanox Technologies


