[openib-general] osm unreliable unless -d1
Jean-Christophe Hugly
jice at pantasys.com
Fri Mar 3 18:28:28 PST 2006
Hi Guys,
I have been having trouble with gen2's osm for a while. I finally
isolated the faulty behaviour to one easy test case:
Run osm somewhere.
Then, on any workstation that has an HCA connected to the same subnet,
do this:
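i=1
while true; do
    # unload and reload the HCA driver so the port has to be re-initialized
    modprobe -r ib_mthca
    sleep 3
    modprobe ib_mthca
    # show the port state as seen from this host
    ibstat
    echo $i
    sleep 3
    i=$((i + 1))
done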
For me, after i reaches 7 or 8, the port no longer gets initialized and
ibstat reports:
    State: Initializing
    Physical state: LinkUp
On the other hand, if you run osm with the -d1 option (mostly
single-threaded), it seems to work indefinitely.
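For comparing runs with and without -d1, a variant of the loop that stops by itself once the port gets stuck might look like the sketch below. It assumes ibstat prints a "State:" line for each port (only the first one is checked), and the 30-second wait for the SM to bring the port up is an arbitrary choice, not something taken from the test above. The port_state function and the --run switch are hypothetical names; the switch is there so the script does nothing unless explicitly asked, since it needs root and a real HCA.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "State:" line
# found in ibstat-style text read from stdin.
port_state() {
    sed -n 's/^[[:space:]]*State:[[:space:]]*//p' | head -n 1
}

# Only touch the driver when invoked as "sh script.sh --run" (needs root).
if [ "${1:-}" = "--run" ]; then
    i=1
    while true; do
        # unload and reload the HCA driver to force port re-initialization
        modprobe -r ib_mthca
        sleep 3
        modprobe ib_mthca
        # poll for up to 30 seconds (assumed timeout) for the port to go Active
        t=0
        state=$(ibstat | port_state)
        while [ "$state" != "Active" ] && [ "$t" -lt 30 ]; do
            sleep 1
            t=$((t + 1))
            state=$(ibstat | port_state)
        done
        echo "iteration $i: $state"
        if [ "$state" != "Active" ]; then
            echo "port stuck in '$state' after $i reloads"
            exit 1
        fi
        i=$((i + 1))
    done
fi
```

Run it as root with --run on the client workstation, once against a normally started osm and once against osm started with -d1, and compare the iteration at which it exits.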
I did this with osm r5594, compiled and running on SUSE 10 (dual Xeon)
with openib of the same rev. The "client side" runs the same OS and rev;
its CPUs are 4 Opterons.
I have not started looking for faulty mutexes yet. Were the fixes
recently proposed in that area committed as of r5594?
--
Jean-Christophe Hugly <jice at pantasys.com>
PANTA