[ofa-general] [BUG REPORT] mlx4: Incorrect event is generated when SM is changing
Moni Shoua
monis at Voltaire.COM
Wed Dec 3 06:27:12 PST 2008
I have a small fabric with ConnectX HCAs and I am running some tests with two OpenSM instances that
take over from each other (either by discovering that the MASTER is not responding or by coming up
with a higher priority).
I noticed that in 2.6.28-rc6 IPoIB gets a LID_CHANGE event (where I expected CLIENT_REREGISTER) even though
the LID did not change. I looked at where the events are generated, in drivers/infiniband/hw/mlx4/mad.c:smp_snoop():
	if (pinfo->clientrereg_resv_subnetto & 0x80)
		event.event = IB_EVENT_CLIENT_REREGISTER;
	else
		event.event = IB_EVENT_LID_CHANGE;
and I see that the clientrereg bit is off.
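
To make the branch above easy to poke at outside the kernel, here is a minimal userspace sketch of the same test. The names sketch_event, sketch_port_info, SKETCH_CLIENT_REREG_MASK and sketch_pick_event are made up for illustration and are not kernel symbols; only the field name clientrereg_resv_subnetto and the 0x80 mask come from the snippet quoted above. It is not the driver code, just the same decision in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two kernel event types named in the report. */
enum sketch_event {
	SKETCH_EVENT_LID_CHANGE,
	SKETCH_EVENT_CLIENT_REREGISTER
};

/*
 * Hypothetical cut-down PortInfo: only the one byte that the quoted
 * smp_snoop() branch inspects. The real structure has many more fields.
 */
struct sketch_port_info {
	uint8_t clientrereg_resv_subnetto;
};

/* Top bit of the byte, as tested by the quoted code (& 0x80). */
#define SKETCH_CLIENT_REREG_MASK 0x80u

/* Same decision as the quoted branch: bit set means re-register, else LID change. */
enum sketch_event sketch_pick_event(const struct sketch_port_info *pinfo)
{
	assert(pinfo != NULL);

	if (pinfo->clientrereg_resv_subnetto & SKETCH_CLIENT_REREG_MASK)
		return SKETCH_EVENT_CLIENT_REREGISTER;

	return SKETCH_EVENT_LID_CHANGE;
}
```

With the top bit clear, whatever the remaining bits of the byte hold, this falls through to the LID change case, which matches what I observe on 2.6.28-rc6.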
I checked this at the same time on a host running kernel 2.6.27 and saw that there the bit is on and
CLIENT_REREGISTER is generated.
Things get worse if this patch (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html)
is applied, since then no event would be dispatched at all. (BTW Jack, I think you forgot to send the second half, for mthca.)
I'd appreciate a clue as to where the reregister bit gets turned off.
thanks
MoniS