[openib-general] kdapl locking problem

James Lentini jlentini at netapp.com
Thu Jun 16 08:08:32 PDT 2005


As Itamar observed, the first message is printed by 
_raw_spin_lock() when the kernel is built for UP (uniprocessor) with 
spin lock debugging enabled.

dapl_ep_disconnect() is trying to obtain a lock that is already 
locked. The message indicates that the lock was taken in 
dapl_evd_connection_callback().  There is no control flow path from 
dapl_evd_connection_callback() that reaches dapl_ep_disconnect().

I'm also unsure of how execution could have reached 
dapl_ep_disconnect() with the spin lock locked. We are using 
spin_lock_irqsave(). My understanding is that interrupts will be 
masked until spin_unlock_irqrestore() is called. That would imply that 
it is not possible for the control flow to change to another context 
that calls dapl_ep_disconnect().

The second message is a by-product of the first problem. 
dapl_ep_disconnect() unlocks the spin lock, so when control returns to 
dapl_evd_connection_callback(), the lock is already unlocked.
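
A small user-space sketch shows how the two messages pair up. It is a 
simplified model of a debugging lock that never spins and only records 
its last owner; it is not the kernel's implementation. The names 
dbg_lock, outer_path() and inner_path() are hypothetical, and the nested 
call exists only to reproduce the symptom, since the real path into 
dapl_ep_disconnect() is still unknown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified debug lock: tracks state and the last locker. */
struct dbg_lock {
    int locked;
    const char *owner;
    int oline;
};

#define DBG_MAX_MSGS 4
static char dbg_msgs[DBG_MAX_MSGS][160];
static int dbg_nmsgs;

/* Complain if the lock is already held, then take it anyway. */
static void dbg_acquire(struct dbg_lock *l, const char *who, int line)
{
    if (l->locked && dbg_nmsgs < DBG_MAX_MSGS)
        snprintf(dbg_msgs[dbg_nmsgs++], sizeof dbg_msgs[0],
                 "%s:%d: spin_lock already locked by %s/%d",
                 who, line, l->owner, l->oline);
    l->locked = 1;
    l->owner = who;
    l->oline = line;
}

/* Complain if the lock is not held, then mark it free. */
static void dbg_release(struct dbg_lock *l, const char *who, int line)
{
    if (!l->locked && dbg_nmsgs < DBG_MAX_MSGS)
        snprintf(dbg_msgs[dbg_nmsgs++], sizeof dbg_msgs[0],
                 "%s:%d: spin_unlock not locked", who, line);
    l->locked = 0;
}

#define DBG_LOCK(l)   dbg_acquire((l), __func__, __LINE__)
#define DBG_UNLOCK(l) dbg_release((l), __func__, __LINE__)

/* Hypothetical inner path: locks and unlocks the same lock. */
static void inner_path(struct dbg_lock *l)
{
    DBG_LOCK(l);      /* first message: already locked by outer_path */
    DBG_UNLOCK(l);    /* clears the lock the outer path still needs */
}

/* Hypothetical outer path: holds the lock across the inner call. */
static void outer_path(struct dbg_lock *l)
{
    DBG_LOCK(l);
    inner_path(l);
    DBG_UNLOCK(l);    /* second message: not locked */
}

/* Run the scenario, print the messages, return how many there were. */
static int run_scenario(void)
{
    struct dbg_lock l = { 0, NULL, 0 };
    int i;

    memset(dbg_msgs, 0, sizeof dbg_msgs);
    dbg_nmsgs = 0;
    outer_path(&l);
    for (i = 0; i < dbg_nmsgs; i++)
        puts(dbg_msgs[i]);
    assert(l.locked == 0);
    return dbg_nmsgs;
}
```

Calling run_scenario() produces exactly two messages, in the same order 
as in Hal's log: the "already locked" complaint from the inner path, 
then the "not locked" complaint from the outer path.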

So we just need to fix the first problem. Are we using 
spin_lock_irqsave() incorrectly?

james

On Thu, 16 Jun 2005, Itamar Rabenstein wrote:

> Hi Hal,
> I am trying to understand what is going on here and I still don't see how
> this happens.
>
> These messages are only printed in UP mode. (Is your system UP?)
> The code is (function: dapl_evd_connection_callback):
> spin_lock_irqsave(&ep->common.lock, ep->common.flags);
> case on the event type
> disconnect:  dapl_ib_disconnect_clean(ep, TRUE);
> spin_unlock_irqrestore(&ep->common.lock, ep->common.flags);
>
> For some reason, between the lock and the unlock there is a call to the
> consumer function (dat_ep_disconnect) that tries to disconnect the same ep,
> and the lock fails.
>
> Either the evd_cb function is called from a CM interrupt, in which case I
> don't see how the consumer can call dat_ib_disconnect in the middle,
> or the user called dat_ib_disconnect twice on the same ep and your kernel
> allows preemption.
>
> I don't understand either case (;-)
>
> Can you try to run it with some debugging,
> at least to find out who called dapl_evd_connection_callback?
>
> Itamar
>
>
>> -----Original Message-----
>> From: Hal Rosenstock [mailto:halr at voltaire.com]
>> Sent: Tuesday, June 14, 2005 8:37 PM
>> To: James Lentini
>> Cc: openib-general at openib.org
>> Subject: [openib-general] kdapl locking problem
>>
>>
>> Hi,
>>
>> When running in loopback mode (client and server on same
>> machine (x86)):
>> kdapltest -T T -s <IP addr> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR
>> I see the following locking problem:
>>
>> Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1111: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) already locked by drivers/infiniband/ulp/dat-provider/dapl_evd.c/756
>> Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_evd.c:797: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) not locked
>>
>> -- Hal
>>


