[ofa-general] MPI IB Errors

John Leidel john.leidel at gmail.com
Thu Aug 23 10:41:34 PDT 2007


I found the error in our machine.  We had an intermittent connection in one
node's HCA card.  I just happened to look at that node at a moment when the
HCA was not showing up in `lspci` or `proc`.  I reset the card on its bus and
kaboom... success.  Thanks everyone for all your help.
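
For anyone chasing a similar intermittent failure, here is a minimal sketch of a per-node check that could be run in a loop or across the cluster.  It assumes that the kernel lists registered HCAs as entries under /sys/class/infiniband, which is where Linux normally exposes them.  The function name list_ib_devices is made up for this example and is not part of any OFED tool.

```python
import os


def list_ib_devices(sysfs_class_dir="/sys/class/infiniband"):
    """Return the sorted device names found under sysfs_class_dir.

    An empty list means the directory is missing or has no entries,
    i.e. no HCA is currently registered on this node (assumption: the
    default path is where the kernel exposes InfiniBand devices).
    """
    try:
        return sorted(os.listdir(sysfs_class_dir))
    except (FileNotFoundError, NotADirectoryError):
        return []


if __name__ == "__main__":
    devices = list_ib_devices()
    if devices:
        print("IB devices: " + ", ".join(devices))
    else:
        print("no IB devices registered on this node")
```

Running this repeatedly on a suspect node should show the device list going empty whenever the card drops off the bus, which is easier to catch than waiting for an MPI job to fail.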

On 8/23/07, John Leidel <john.leidel at gmail.com> wrote:
>
> What's especially odd is that I can get a full-bandwidth ping-pong test
> running fine [970MB/s++], then rerun the test and have it fail saying it
> can't find the IB HCA.
>
>
> On 8/23/07, Tziporet Koren <tziporet at mellanox.co.il> wrote:
> >
> >  John Leidel wrote:
> >
> > > Unfortunately, the RDMA module load didn't help... a simple
> > > "hello_world" application still returns ::
> > >
> > >  libibverbs: Fatal: no infiniband class devices found.
> > >  No IB device found
> > >
> > >  I went and verified that all the nodes see the HCAs... an lspci on
> > > all nodes reports ::
> > >
> > > 07:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev a0)
> > >        Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode)
> >
> > Can you run:
> > /etc/init.d/openibd restart
> > and send /var/log/messages?
> >
> > Thanks
> > Tziporet
> >
>
>