[ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing
Jack Morgenstein
jackm at mellanox.co.il
Wed Dec 3 07:24:40 PST 2008
I did post the mthca patch:
http://lists.openfabrics.org/pipermail/general/2008-November/055681.html
- Jack
> -----Original Message-----
> From: Moni Shoua [mailto:monis at Voltaire.COM]
> Sent: Wednesday, December 03, 2008 4:27 PM
> To: Roland Dreier; Jack Morgenstein
> Cc: OpenFabrics General; Olga Stern; Or Gerlitz
> Subject: [BUG REPORT] mlx4: Incorrect event is generated when
> SM is changing
>
>
> I have a small fabric with ConnectX HCAs and I run some tests
> with 2 OpenSM instances that take over from each other (either by
> discovering that the MASTER is not responding or by coming up
> with higher priority).
>
> I noticed that in 2.6.28-rc6 IPoIB gets a LID_CHANGE
> event (while I expected CLIENT_REREGISTER) even though the LID
> was not changed. I looked at where the events are generated, in
> drivers/infiniband/hw/mlx4/mad.c:smp_snoop()
>
> 	if (pinfo->clientrereg_resv_subnetto & 0x80)
> 		event.event = IB_EVENT_CLIENT_REREGISTER;
> 	else
> 		event.event = IB_EVENT_LID_CHANGE;
>
> and I see that the clientrereg bit is off.
>
> I checked this simultaneously with a host running kernel
> 2.6.27 and see that there the bit is on and
> CLIENT_REREGISTER is generated.
>
> Things get worse if this patch
> (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html)
> is applied, since no event would be dispatched at all (BTW Jack, I think
> you forgot to send the second half for mthca).
>
> I'd appreciate a clue as to where the reregister bit gets turned off.
>
> thanks
>
> MoniS
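
For reference, the check quoted from smp_snoop() comes down to testing the top bit of a single PortInfo byte. Below is a minimal standalone sketch of that test. It is not the driver code: the enum, the macro and the function name are hypothetical and exist only for illustration, and the sketch assumes, as the field name clientrereg_resv_subnetto suggests, that bit 7 carries ClientReregister and the low bits carry the subnet timeout.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two events discussed in the report;
 * these are NOT the kernel's enum ib_event_type values. */
enum sketch_event {
	SKETCH_CLIENT_REREGISTER,
	SKETCH_LID_CHANGE
};

/* Assumption: bit 7 of the byte is the ClientReregister flag,
 * matching the 0x80 mask used in the quoted snippet. */
#define SKETCH_CLIENT_REREG_MASK 0x80

/* Mirror the quoted if/else: pick the event purely from the top bit
 * of the clientrereg_resv_subnetto byte. */
static enum sketch_event sketch_classify(uint8_t clientrereg_resv_subnetto)
{
	if (clientrereg_resv_subnetto & SKETCH_CLIENT_REREG_MASK)
		return SKETCH_CLIENT_REREGISTER;
	return SKETCH_LID_CHANGE;
}
```

With this logic, a byte such as 0x92 (top bit set) selects the reregister event, while 0x12 (same low bits, top bit clear) falls through to the LID change event. That matches the symptom described: if the bit arrives cleared, LID_CHANGE is reported regardless of whether the LID actually changed.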