[openib-general] Bugzilla Bug 329: HCA_FATAL_EVENT cause to OpenSM to stop functioning

Hal Rosenstock halr at voltaire.com
Wed Jan 31 06:48:15 PST 2007


Hi Yevgeny,

On Wed, 2007-01-31 at 05:16, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> I noticed the following bug in Bugzilla:
> 
> 	Bugzilla Bug 329: HCA_FATAL_EVENT cause to opensm to stop functioning
> 	  https://bugs.openfabrics.org/show_bug.cgi?id=329
> 
> 	When there is an HCA fatal event on the host that opensm is running on,
> 	opensm stops functioning (after the event, the driver restarts the device,
> 	and the port does not return to the active state).
> 
> 	If opensm runs in sweep mode, you can see that it stops sweeping
> 	after the event.
> 
> I remember that a couple of months ago I sent a patch that handles this problem:
>  - in case of IBV_EVENT_DEVICE_FATAL, osm was forced to exit
>  - in case of IBV_EVENT_PORT_ERR, osm initiated a heavy sweep
> 
> The problem with my patch was that it made osm depend on the uverbs module.
> To resolve this, support should be added to umad, and osm could then
> use that support.
> 
> Do you know if some work in this area was done in umad?

This has been on the list, but unfortunately there has been no time yet
to work on local event support in libibumad.

-- Hal

> -- Yevgeny
