[ofa-general] RE: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing

Jack Morgenstein jackm at mellanox.co.il
Wed Dec 3 08:01:13 PST 2008


> Things get worse if this patch
> (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html)
> is applied, since no event would be dispatched at all

Does this mean that you do not want the patches applied?
(I have put them into the upcoming OFED 1.4)

- Jack


> -----Original Message-----
> From: Moni Shoua [mailto:monis at Voltaire.COM] 
> Sent: Wednesday, December 03, 2008 4:27 PM
> To: Roland Dreier; Jack Morgenstein
> Cc: OpenFabrics General; Olga Stern; Or Gerlitz
> Subject: [BUG REPORT] mlx4: Incorrect event is generated when SM is changing
> 
> 
> I have a small fabric with ConnectX HCAs, and I run some tests
> with 2 OpenSMs that take over from each other (either by
> discovering that the MASTER is not responding or by coming up
> with a higher priority).
> 
> I noticed that in 2.6.28-rc6 IPoIB gets a LID_CHANGE event
> (where I expected CLIENT_REREGISTER) even though the LID did
> not change. I looked at where the events are generated, in
> drivers/infiniband/hw/mlx4/mad.c:smp_snoop():
> 
>                         if (pinfo->clientrereg_resv_subnetto & 0x80)
>                                 event.event    = IB_EVENT_CLIENT_REREGISTER;
>                         else
>                                 event.event    = IB_EVENT_LID_CHANGE;
> 
> and saw that the clientrereg bit is off.
> 
> I checked this simultaneously on a host running kernel
> 2.6.27 and saw that there the bit is on and
> CLIENT_REREGISTER is generated.
> 
> Things get worse if this patch
> (http://lists.openfabrics.org/pipermail/general/2008-November/055680.html)
> is applied, since no event would be dispatched at all (BTW Jack, I think
> you forgot to send the second half for mthca).
> 
> I'd appreciate a clue as to where the reregister bit gets turned off.
> 
> thanks
> 
>  MoniS
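The branch quoted from smp_snoop() can be isolated as a small standalone C sketch. It assumes, following the InfiniBand PortInfo attribute layout, that the byte named clientrereg_resv_subnetto packs the ClientReregister flag in bit 7 (mask 0x80), two reserved bits, and SubnetTimeOut in the low five bits. The enum and function names below are hypothetical stand-ins, not the kernel's own types.

```c
#include <stdint.h>

/* Hypothetical stand-ins for IB_EVENT_LID_CHANGE and
 * IB_EVENT_CLIENT_REREGISTER from the quoted mad.c snippet. */
enum sketch_event {
    SKETCH_EVENT_LID_CHANGE,
    SKETCH_EVENT_CLIENT_REREGISTER
};

/* Assumed PortInfo byte layout: bit 7 = ClientReregister,
 * bits 6..5 = reserved, bits 4..0 = SubnetTimeOut. */
#define CLIENT_REREG_MASK 0x80u

/* Mirrors the if/else in smp_snoop(): only the top bit decides the
 * event; the reserved and SubnetTimeOut bits are ignored. */
static enum sketch_event classify_port_info_set(uint8_t clientrereg_resv_subnetto)
{
    if (clientrereg_resv_subnetto & CLIENT_REREG_MASK)
        return SKETCH_EVENT_CLIENT_REREGISTER;
    return SKETCH_EVENT_LID_CHANGE;
}
```

For example, a byte of 0x92 (flag set, SubnetTimeOut 18) maps to CLIENT_REREGISTER, while 0x12 maps to LID_CHANGE. Since the branch looks at nothing but that one bit, a clear bit, as observed on 2.6.28-rc6, can only yield LID_CHANGE; the open question is where the bit is lost before smp_snoop() inspects it.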


