[openib-general] Bugzilla Bug 329: HCA_FATAL_EVENT causes OpenSM to stop functioning

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Wed Jan 31 02:16:33 PST 2007


Hi Hal.

I noticed the following bug in Bugzilla:

	Bugzilla Bug 329: HCA_FATAL_EVENT causes opensm to stop functioning
	  https://bugs.openfabrics.org/show_bug.cgi?id=329

	When there is an HCA fatal event on the host that opensm is running on,
	opensm stops functioning (after the event, the driver restarts the device,
	and the port does not return to the active state).

	If opensm runs in sweep mode, you can see that opensm stops sweeping
	after the event.

I remember that a couple of months ago I sent a patch that took care of this problem:
 - on IBV_EVENT_DEVICE_FATAL, osm was forced to exit
 - on IBV_EVENT_PORT_ERR, osm initiated a heavy sweep
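The reaction described above boils down to a small event-to-action mapping. Below is a minimal sketch of that mapping only, not the actual patch. Assumptions: the enum and function names (sm_async_event, sm_action, sm_handle_async_event) are hypothetical stand-ins so the sketch compiles without <infiniband/verbs.h>. In a real libibverbs build the events would come from ibv_get_async_event() as IBV_EVENT_DEVICE_FATAL / IBV_EVENT_PORT_ERR and each one would be released with ibv_ack_async_event(). That libibverbs path is exactly what pulls in the uverbs dependency.

```c
/* Hypothetical stand-ins for the libibverbs async event types
 * (IBV_EVENT_DEVICE_FATAL, IBV_EVENT_PORT_ERR, ...), defined here
 * so the sketch builds without <infiniband/verbs.h>. */
enum sm_async_event {
    SM_EV_DEVICE_FATAL,
    SM_EV_PORT_ERR,
    SM_EV_OTHER
};

/* Hypothetical actions the SM could take in response to an event. */
enum sm_action {
    SM_ACTION_NONE,
    SM_ACTION_HEAVY_SWEEP,
    SM_ACTION_EXIT
};

/* Map an async event to the reaction described in the patch:
 * a fatal device error forces the SM to exit, a port error
 * triggers a heavy sweep, and any other event is ignored. */
static enum sm_action sm_handle_async_event(enum sm_async_event ev)
{
    switch (ev) {
    case SM_EV_DEVICE_FATAL:
        return SM_ACTION_EXIT;
    case SM_EV_PORT_ERR:
        return SM_ACTION_HEAVY_SWEEP;
    default:
        return SM_ACTION_NONE;
    }
}
```

Keeping the policy in a pure function like this would let the same decision logic be reused whether the events end up being delivered through uverbs or through some future umad mechanism.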

The problem with my patch was that it made osm depend on the uverbs module.
To resolve this, support should be added to umad, and then osm could
use that support.

Do you know if any work in this area has been done in umad?

-- Yevgeny
