[ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function

Jean-Francois.Neyroud Jean-Francois.Neyroud at bull.net
Tue Apr 29 01:17:38 PDT 2008


If I attempt to query the performance counters on all nodes of a
cluster (40 nodes) at the same time, perfquery gets stuck in the
kernel, in the ib_unregister_mad_agent() function.

It is impossible to interrupt perfquery with CTRL-C or CTRL-Z; the process is stuck in the kernel:
# pgrep perfquery
27578
# cat /proc/27578/wchan
ib_unregister_mad_agent

I see this problem with OFED-1.2.5 and 1.3, with both mthca and ConnectX.
I have not tested other HCAs or other OFED versions.

Reproducer with 2 nodes and without a switch:

# for i in `seq 1 100`; do perfquery >/dev/null 2>&1 & done

# pgrep perfquery | while read pid; do echo "$pid: `cat /proc/$pid/wchan`"; echo; done | dshbak -c
----------------
[14936,14938-15029]
----------------
 0
----------------

----------------
----------------
14937
----------------
 flush_cpu_workqueue
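
Where dshbak is not installed, the same grouping can be approximated with
plain shell. The sketch below is not part of the original report: the
function name wchan_summary and the PROC_ROOT variable are hypothetical,
and it assumes /proc/<pid>/wchan is readable, as in the commands above. It
reads PIDs on stdin and prints one line per wait channel with the number
of processes sleeping there.

```shell
#!/bin/sh
# Count how many of the given PIDs sit in each kernel wait channel.
# PROC_ROOT is a hypothetical override that only exists so the function
# can be pointed at a fake /proc tree; it defaults to the real /proc.
wchan_summary() {
    root=${PROC_ROOT:-/proc}
    while read -r pid; do
        # Skip PIDs that exited between pgrep and this read.
        wchan=$(cat "$root/$pid/wchan" 2>/dev/null) || continue
        # Assumption: an empty read is reported as 0, the value shown
        # above for processes that are not sleeping in the kernel.
        printf '%s\n' "${wchan:-0}"
    done |
        awk '{ n[$0]++ } END { for (w in n) print n[w], w }' |
        sort -rn
}
```

Usage, following the reproducer above:

# pgrep perfquery | wchan_summary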


Has anyone seen this problem?

Jean-Francois.


