[openib-general] kdapl locking problem

Hal Rosenstock halr at voltaire.com
Thu Jun 16 09:58:36 PDT 2005


On Thu, 2005-06-16 at 04:39, Itamar Rabenstein wrote: 
> Hi Hal,
> I am trying to understand what is going on here, and I still don't see how this
> happens.
> 
> These prints are only enabled in UP mode. (Is your system UP?)

Yes.

> the code is (function: dapl_evd_connection_callback):
> spin_lock_irqsave(&ep->common.lock, ep->common.flags);
> case on the event type
> disconnect:  dapl_ib_disconnect_clean(ep, TRUE);
> spin_unlock_irqrestore(&ep->common.lock, ep->common.flags);
> 
> For some reason, between the lock and the unlock there is a call to the
> consumer function (dat_ep_disconnect) that tries to disconnect the same EP,
> and the lock fails.

Could this be a "local" disconnect race of some sort?

> Either the evd_cb function is called from a CM interrupt, in which case I don't
> see how the consumer can call dat_ib_disconnect in the middle, or the user
> called dat_ib_disconnect twice on the same EP and your kernel allows
> preemption.

CONFIG_PREEMPT is not set in my kernel config.

> I don't understand either case (;-)
> 
> Can you try to run it with some debug output,
> at least to find out who called dapl_evd_connection_callback?

All calls to dapl_evd_connection_callback are out of the CM except one
case in dapl_ep_disconnect. In the case of dapl_ep_disconnect, the lock
is obtained in dapl_ep_disconnect before the connection callback routine
would/might have been called. 

One instance:
Jun 16 12:47:33 localhost kernel: dapl_ep_disconnect: dapl_evd_connection_callback EP 0xc91e5bf8 CM ID 0x00000000 EP common lock 0xc91e5c08
Jun 16 12:47:34 localhost kernel: dapl_ep_disconnect: dapl_evd_connection_callback EP 0xc987ebf8 CM ID 0x00000000 EP common lock 0xc987ec08
Jun 16 12:47:34 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1110: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:ce609c08) already locked by drivers/infiniband/ulp/dat-provider/dapl_cr.c/501
Jun 16 12:47:34 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_cr.c:512: spin_unlock

Another instance:
Jun 16 12:55:11 localhost kernel: dapl_ep_disconnect: dapl_evd_connection_callback EP 0xc64b1bf8 CM ID 0x00000000 EP common lock 0xc64b1c08
Jun 16 12:55:12 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1110: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:ce5a6c08) already locked by drivers/infiniband/ulp/dat-provider/dapl_cr.c/501
Jun 16 12:55:12 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_cr.c:512: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:ce5a6c08) not locked
Jun 16 12:55:12 localhost kernel: dapl_ep_disconnect: dapl_evd_connection_callback EP 0xc07debf8 CM ID 0x00000000 EP common lock 0xc07dec08

Yet another instance:
Jun 16 12:55:12 localhost kernel: dapl_cm_active_cb_handler: TIMEWAIT EXIT dapl_evd_connection_callback EP 0xce609bf8 CM ID 0xc82dcdf8 EP common lock 0xce609c08
Jun 16 12:55:12 localhost kernel: dapl_ep_disconnect: dapl_evd_connection_callback EP 0xcbeb8bf8 CM ID 0x00000000 EP common lock 0xcbeb8c08
Jun 16 12:55:12 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1110: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:cf35bc08) already locked by drivers/infiniband/ulp/dat-provider/dapl_cr.c/501
Jun 16 12:55:12 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_cr.c:512: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:cf35bc08) not locked

-- Hal

>  Itamar 
> 
> 
> > --Original Message--
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Tuesday, June 14, 2005 8:37 PM
> > To: James Lentini
> > Cc: openib-general at openib.org
> > Subject: [openib-general] kdapl locking problem
> > 
> > 
> > Hi,
> > 
> > When running in loopback mode (client and server on same machine (x86)):
> > kdapltest -T T -s <IP addr> -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR
> > I see the following locking problem:
> > 
> > Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_ep.c:1111: spin_lock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) already locked by drivers/infiniband/ulp/dat-provider/dapl_evd.c/756
> > Jun 14 13:30:08 localhost kernel: drivers/infiniband/ulp/dat-provider/dapl_evd.c:797: spin_unlock(drivers/infiniband/ulp/dat-provider/dapl_ep.c:c44b1c18) not locked
> > 
> > -- Hal