[ofa-general] Lost out-of-svc trap notifications during SM handover

Lan Tran Lan.Tran at 3leafsystems.com
Fri Nov 9 09:01:37 PST 2007


Hi Sasha, 

I'm seeing missing out-of-service trap notifications when a Master SM port is disabled. I'm looking into it now, but if you have any pointers or ideas about what might be going on or how to resolve it, they would be much appreciated!

I subscribe to out-of-service trap events (i.e. trap 65) and register my own callback. When I disable an IB port on a remote node that is running the Standby SM, my trap callback is called, as expected. But when I disable the IB port on the remote node that is running the Master SM, my trap 65 callback is never called. From the opensm logs, this appears to be what happens:
1) I disable the port running the Master SM.
2) SM handover starts:
   --> During the Standby SM's heavy sweep, osm_drop_mgr_process() detects that the old Master SM port is down, but at this point there are no subscribers to inform, because they are all subscribed with the old Master SM.
   --> The Standby SM enters the Master state and becomes the new Master SM.
3) Several seconds later, I subscribe with the new Master SM for trap 65 notifications (I do this whenever I receive an IB_EVENT_CLIENT_REREGISTER event), but this is too late: the report notice for the dropped old Master SM port was already issued.
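
To make the ordering in steps 1-3 concrete, here is a toy model of the race. It is only a sketch of my reading of the logs, not OpenSM code: ToySM, subscribe(), report() and simulate_handover() are hypothetical names, and the model assumes that each SM keeps its own subscriber table and that nothing is carried over at handover.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

TRAP_GID_OUT_OF_SERVICE = 65  # the trap number mentioned above


@dataclass
class ToySM:
    """Hypothetical stand-in for a subnet manager (not an OpenSM type)."""
    name: str
    subscribers: List[Callable[[int, str], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[int, str], None]) -> None:
        self.subscribers.append(callback)

    def report(self, trap: int, port: str) -> int:
        # A notice reaches only the subscribers registered at this moment.
        for callback in self.subscribers:
            callback(trap, port)
        return len(self.subscribers)


def simulate_handover() -> Tuple[int, List[Tuple[int, str]]]:
    received: List[Tuple[int, str]] = []

    def on_trap(trap: int, port: str) -> None:
        received.append((trap, port))

    old_master = ToySM("old-master")
    standby = ToySM("standby")

    # Before the failure, the client is subscribed with the old Master only.
    old_master.subscribe(on_trap)

    # Steps 1 and 2: the old Master port goes down; the Standby's sweep
    # detects the drop and reports it while its subscriber table is empty.
    delivered = standby.report(TRAP_GID_OUT_OF_SERVICE, "old-master-port")

    # Step 3: the client re-registers with the new Master, after the report.
    standby.subscribe(on_trap)

    return delivered, received
```

In this model simulate_handover() delivers the notice to zero subscribers and the callback list stays empty, which matches what I observe: the subscription in step 3 arrives after the only report for that port.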

It seems I need to make sure that I am already subscribed for trap 65 notifications with the soon-to-be Master SM at the moment it reports that the old Master SM port has gone down. I'm not sure whether this is possible, though :)

Thanks again!
Lan

