[ofa-general] [BUG REPORT] mlx4: Incorrect event is generated when SM is changing

Moni Shoua monis at Voltaire.COM
Wed Dec 3 06:27:12 PST 2008


I have a small fabric with ConnectX HCAs, and I am running tests with two OpenSM instances that
take over from each other (either because one discovers that the MASTER is not responding, or because
one comes up with a higher priority).

I noticed that in 2.6.28-rc6 IPoIB gets a LID_CHANGE event (where I expected CLIENT_REREGISTER) even though
the LID did not change. I looked at where the events are generated, in drivers/infiniband/hw/mlx4/mad.c:smp_snoop():

                        if (pinfo->clientrereg_resv_subnetto & 0x80)
                                event.event    = IB_EVENT_CLIENT_REREGISTER;
                        else
                                event.event    = IB_EVENT_LID_CHANGE;

and I see that the clientrereg bit is off.
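
To make the condition concrete, here is a minimal standalone sketch of the same bit test, outside the kernel. The enum and function names (sketch_event, sketch_pick_event) are hypothetical stand-ins, not the kernel's definitions; only the 0x80 mask and the field name clientrereg_resv_subnetto come from the snippet above. It illustrates that any value with the top bit clear selects LID_CHANGE, whatever the remaining bits hold.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's event codes (illustration only). */
enum sketch_event {
	SKETCH_EVENT_LID_CHANGE,
	SKETCH_EVENT_CLIENT_REREGISTER
};

/* Mask used in smp_snoop(): the top bit of the byte is the client-reregister flag. */
#define SKETCH_CLIENT_REREG_BIT 0x80

/*
 * Mirror the if/else from smp_snoop(): choose the event from the
 * clientrereg_resv_subnetto byte alone. The lower seven bits are ignored.
 */
static enum sketch_event sketch_pick_event(uint8_t clientrereg_resv_subnetto)
{
	if (clientrereg_resv_subnetto & SKETCH_CLIENT_REREG_BIT)
		return SKETCH_EVENT_CLIENT_REREGISTER;
	return SKETCH_EVENT_LID_CHANGE;
}
```

With this helper, sketch_pick_event(0x80) selects the reregister event and sketch_pick_event(0x00) selects the LID change event, which matches what I see: the event follows the bit, so the question is why the bit arrives cleared.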

I checked this at the same time on a host running kernel 2.6.27, and there the bit is on and
CLIENT_REREGISTER is generated.

Things get worse if this patch (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html)
is applied, since then no event is dispatched at all. (BTW Jack, I think you forgot to send the second half, for mthca.)

I would appreciate a clue as to where the reregister bit gets turned off.

thanks

 MoniS


