[openib-general] oops with ipath and mvapich (branch 1.0)

ralphc at pathscale.com ralphc at pathscale.com
Fri Mar 17 12:23:25 PST 2006


>
> On Fri, 17 Mar 2006, Bryan O'Sullivan wrote:
>
>> > In our own testing, we were unable to use the SRQ support via the ipath
>> > driver. Perhaps someone from Pathscale could comment on the stability of
>> > SRQ support?
>>
>> Our SRQ code should be stable; we've no customer-reported bugs open
>> against it.  If you run into specific problems, you should let us know,
>> as we can't fix things that people don't tell us about :-)
>>
>> Please give me some information about the problems you're seeing, and
>> I'll be happy to replicate and fix whatever is going wrong for you.
>
> Thanks. We will be sending a report later with the issues we are seeing.
>
> We're currently testing on the 1.0 Branch from the SVN, is this the
> version we should test or the one from the Pathscale website?
>
> Matthew Koop

I tried to reproduce the problem here.
I started with an FC4 system, downloaded a 2.6.15.6 kernel from kernel.org,
replaced drivers/infiniband with:
svn/gen2/tags/openib-1.0-rc1/src/linux-kernel/infiniband,
patched net/ipv4/fib_frontend.c to export ip_dev_find(),
make, install, reboot.

In svn/gen2/tags/openib-1.0-rc1/src/userspace/{libibverbs,libipathverbs}
I did:
./autogen.sh
./configure --prefix=/usr --libdir=/usr/lib64
make install
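
For anyone repeating this, the same three steps can be wrapped in a small
script so both libraries are built the same way. This is only a sketch:
SRC_ROOT, DRY_RUN, and the function names are made up for illustration, and
the default path assumes the checkout layout described above.

```shell
# SRC_ROOT and DRY_RUN are illustrative names, not part of the OpenIB tree.
SRC_ROOT=${SRC_ROOT:-svn/gen2/tags/openib-1.0-rc1/src/userspace}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install one library; $1 is its directory name under SRC_ROOT.
build_lib() {
    # The subshell keeps the cd from leaking into the caller.
    (
        if [ "${DRY_RUN:-0}" != "1" ]; then
            cd "$SRC_ROOT/$1" || exit 1
        fi
        run ./autogen.sh &&
        run ./configure --prefix=/usr --libdir=/usr/lib64 &&
        run make install
    )
}

# Build both libraries, stopping at the first failure.
build_all() {
    for lib in libibverbs libipathverbs; do
        build_lib "$lib" || { echo "build failed in $lib" >&2; return 1; }
    done
}
```

Calling build_all from the directory that contains the svn checkout runs the
real build; setting DRY_RUN=1 first only prints the commands.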

Then I built svn/gen2/tags/openib-1.0-rc1/src/userspace/mpi/mvapich-gen2
and verified that osu_latency and osu_bw worked OK.

Then I built svn/gen2/trunk/src/userspace/mpi/mvapich-gen2.
I was able to get osu_latency & osu_bw to run using:
mpirun_rsh -np 2 -debug -hostfile ./mf ~/mvapich1/bin/osu_latency
but when I ran it without -debug, I got a "Connection refused"
error message.

I tried osu_bcast and osu_bibw with -debug but they just print
the initial header and then hang.



