[openib-general] recursion depth exceeded in ipoib_workqueue

Jack Morgenstein jackm at mellanox.co.il
Mon Sep 19 08:21:56 PDT 2005


Environment:
HCA port 1 of Host 1 is connected back-to-back to HCA port 1 of Host 2.
A shell script running on Host 1 loads and unloads the openib driver.  On
Host 2, the openib driver is up and opensm is running.

Host 1:  while date ; do
             /etc/init.d/openibd start
             sleep 3
             /etc/init.d/openibd stop
             sleep 1
         done

         NOTES:
            a. The sleeps were inserted to give opensm on Host 2 time to
               respond to changes.
            b. The openibd script is attached.


Problem -- recursion depth exceeded in ipoib_workqueue:
/var/log/messages from Host 1
------------------------------
ib_mthca: Initializing  (0000:04:00.0)
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 29 (level, low) -> IRQ 185
run_workqueue: recursion depth exceeded: 4

Call Trace:
        <ffffffff80147a47>{flush_cpu_workqueue+87}
        <ffffffff803f76d6>{wait_for_completion+230}
        <ffffffff80131d50>{default_wake_function+0}
        <ffffffff8013fc39>{lock_timer_base+41}
        <ffffffff88078ba3>{:ib_ipoib:ipoib_mcast_stop_thread+99}
        <ffffffff88078cdc>{:ib_ipoib:ipoib_mcast_restart_task+44}
        <ffffffff80147abd>{flush_cpu_workqueue+205}
        <ffffffff88078cb0>{:ib_ipoib:ipoib_mcast_restart_task+0}
        <ffffffff8013fc39>{lock_timer_base+41}
        <ffffffff88078ba3>{:ib_ipoib:ipoib_mcast_stop_thread+99}
        <ffffffff88078cdc>{:ib_ipoib:ipoib_mcast_restart_task+44}
        <ffffffff80147abd>{flush_cpu_workqueue+205}
        <ffffffff88078cb0>{:ib_ipoib:ipoib_mcast_restart_task+0}
        <ffffffff8013fc39>{lock_timer_base+41}
        <ffffffff88078ba3>{:ib_ipoib:ipoib_mcast_stop_thread+99}
        <ffffffff88078cdc>{:ib_ipoib:ipoib_mcast_restart_task+44}
        <ffffffff88078cb0>{:ib_ipoib:ipoib_mcast_restart_task+0}
        <ffffffff8014795e>{worker_thread+478}
        <ffffffff80131d50>{default_wake_function+0}
        <ffffffff8012f2d3>{__wake_up_common+67}
        <ffffffff80131d50>{default_wake_function+0}
        <ffffffff8014beb0>{keventd_create_kthread+0}
        <ffffffff80147780>{worker_thread+0}
        <ffffffff8014beb0>{keventd_create_kthread+0}
        <ffffffff8014c009>{kthread+217}
        <ffffffff8010e50e>{child_rip+8}
        <ffffffff8014beb0>{keventd_create_kthread+0}
        <ffffffff8014bf30>{kthread+0}
        <ffffffff8010e506>{child_rip+0}

Please note:
  -- Set Multicast List posts the restart task to the ipoib_workqueue
     (ipoib_main.c:675).
  -- ipoib_mcast_restart_task (ipoib_multicast.c) calls
     ipoib_mcast_stop_thread(), which calls flush_workqueue(ipoib_workqueue),
     so the restart task flushes the work queue it is running from.
  -- Linux prevents the deadlock by testing whether the flush is called from
     the work queue's own thread (see linux/workqueue.c:223).  If it is, Linux
     runs the remaining tasks in the work queue directly, without waiting.
     This both breaks serialization of tasks in the work queue and can cause
     the recursion overflow seen above.
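
To make the mechanism concrete, here is a rough userspace model in Python.
It is only a sketch, not kernel code: ToyWorkqueue, restart_task and simulate
are hypothetical names invented for illustration, and the model assumes a
single worker and that every queued item is a restart task.  It shows how a
work item that flushes its own queue nests one level deeper for each pending
item, and how the items then complete out of order.

```python
from collections import deque


class ToyWorkqueue:
    """Userspace model of a single-threaded workqueue; not kernel code."""

    def __init__(self):
        self.pending = deque()   # work items waiting to run
        self.depth = 0           # current nesting level of run()
        self.max_depth = 0       # deepest nesting observed
        self.finished = []       # work item names, in completion order

    def queue_work(self, name, fn):
        self.pending.append((name, fn))

    def run(self):
        """Drain the queue, as the worker thread would."""
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        try:
            while self.pending:
                name, fn = self.pending.popleft()
                fn(self)
                self.finished.append(name)
        finally:
            self.depth -= 1

    def flush(self):
        """Flush requested by a work item running on this same queue.

        Waiting would deadlock, so the remaining items are run inline,
        which nests run() one level deeper.
        """
        self.run()


def restart_task(wq):
    # Stand-in for the restart task: it flushes the queue it runs from.
    wq.flush()


def simulate(n_tasks):
    """Queue n_tasks restart tasks, run them, report depth and order."""
    wq = ToyWorkqueue()
    for i in range(n_tasks):
        wq.queue_work(f"restart-{i}", restart_task)
    wq.run()
    return wq.max_depth, wq.finished
```

In this model, simulate(4) reaches a nesting depth of 5, and the first task
queued is the last to finish, because each later task starts and completes
inside the flush call of the one before it.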

Jack

 <<openibd>> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openibd
Type: application/octet-stream
Size: 24304 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050919/feab5d9d/attachment.obj>

