[openib-general] SDP_CONN_LOCK

Michael S. Tsirkin mst at mellanox.co.il
Thu Feb 17 15:25:39 PST 2005


Quoting r. Libor Michalek <libor at topspin.com>:
> Subject: Re: [openib-general] SDP_CONN_LOCK
> 
> On Fri, Feb 18, 2005 at 12:13:30AM +0200, Michael S. Tsirkin wrote:
> > Quoting r. Libor Michalek <libor at topspin.com>:
> > > On Thu, Feb 17, 2005 at 11:29:37PM +0200, Michael S. Tsirkin wrote:
> > > > Quoting r. Libor Michalek <libor at topspin.com>:
> > > > > On Thu, Feb 17, 2005 at 05:35:17PM +0200, Michael S. Tsirkin wrote:
> > > > > > Hi, Libor!
> > > > > > Could you please explain what SDP_CONN_LOCK
> > > > > > and friends are doing in SDP?
> > > > > > 
> > > > > > It seems they just implement exclusive access to the socket,
> > > > > > but if so, why is a simple semaphore or mutex not used?
> > > > > 
> > > > >   They do implement exclusive access to the socket, but they implement
> > > > > exclusive access from both process and irq context, which is why a
> > > > > semaphore was not used. In interrupt context SDP_CONN_LOCK_BH is used
> > > > > to lock the connection, look in sdp_cq_event_handler() for its use,
> > > > > and in process context SDP_CONN_LOCK is used.
> > > > 
> > > > I don't really understand how it works.
> > > > When an interrupt arrives while users != 0, it seems you are
> > > > calling scheduler().
> > > > What is sdp_conn_internal_lock doing? I understand it is to be called
> > > > from interrupt context, but how can it call scheduler() then?
> > > 
> > >   SDP_CONN_LOCK and SDP_CONN_UNLOCK are never called from interrupt
> > > context, only from process context. The spinlock conn->lock.slock
> > > is taken in interrupt context and process context before the 'users'
> > > variable is changed or read. If users != 0, then the connection is in
> > > use by process context. If the connection is in use, another lock
> > > request in process context will sleep until users == 0, while a lock
> > > request in interrupt context will mark the connection as having had an
> > > event and return. When a process context lock holder releases the lock,
> > > it checks if there has been an interrupt event, which it then processes
> > > by draining and arming the CQ, and it wakes any process context waiting
> > > for the lock.
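
To make that concrete, here is a minimal sketch of the scheme as I read it,
with illustrative names only (not the actual SDP macros or struct fields,
and with initialisation omitted). It is similar in spirit to the kernel's
lock_sock()/release_sock():

#include <linux/spinlock.h>
#include <linux/wait.h>

struct conn_lock {
        spinlock_t        slock;  /* taken in both process and irq context */
        int               users;  /* non-zero: process context owns the conn */
        int               event;  /* CQ event arrived while the conn was owned */
        wait_queue_head_t waitq;  /* process-context waiters */
};

/* process context, SDP_CONN_LOCK equivalent */
static void conn_lock(struct conn_lock *l)
{
        spin_lock_irq(&l->slock);
        while (l->users) {
                /* owned by someone else: sleep until it is released */
                spin_unlock_irq(&l->slock);
                wait_event(l->waitq, !l->users);
                spin_lock_irq(&l->slock);
        }
        l->users = 1;
        spin_unlock_irq(&l->slock);
}

/* interrupt context, SDP_CONN_LOCK_BH equivalent; never sleeps */
static void conn_cq_event(struct conn_lock *l)
{
        spin_lock(&l->slock);
        if (l->users) {
                /* owned: just mark the event, the owner drains/arms on unlock */
                l->event = 1;
        } else {
                /* not owned: drain the CQ and re-arm it right here, under slock */
        }
        spin_unlock(&l->slock);
}

/* process context, SDP_CONN_UNLOCK equivalent */
static void conn_unlock(struct conn_lock *l)
{
        spin_lock_irq(&l->slock);
        if (l->event) {
                l->event = 0;
                /* drain the CQ and re-arm it here */
        }
        l->users = 0;
        spin_unlock_irq(&l->slock);
        wake_up(&l->waitq);
}

The point being that the interrupt path never sleeps: it either does the
CQ work itself or leaves a note for the current owner.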
> > 
> > Can't all this be done by a semaphore in process context, and
> > down_trylock in interrupt context?
> > The "connection has an event" bit could be protected by a spinlock.
> 
>   The CQ draining happens inside the spinlock. If the CQ drain
> happened in process context inside a semaphore, then there would be
> a race condition for the interrupt between checking the semaphore
> and setting the spinlock-protected "connection event bit". The
> CQ drain and the "event bit" have to happen under the same lock
> to avoid the race.
> 
>   If you move the CQ drain and the "event bit" under the same spinlock,
> where do you put the semaphore? In process context the spinlock would
> need to be taken inside the semaphore, but in interrupt context the
> semaphore down_trylock and up are inside the spinlock. Now there would
> be a race between spin_unlock() followed by up() in process context
> and spin_lock() followed by down_trylock() in interrupt context.
> 
>    spin_unlock()
>                   spin_lock()
>                   down_trylock()
>    up()
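
To spell that race out in code (a hypothetical sketch, not the SDP sources):
the interrupt sees the semaphore still held and defers to the owner, but the
owner is already past its final event-bit check and is about to drop the
semaphore, so nothing drains or re-arms the CQ until the next lock cycle:

#include <linux/spinlock.h>
#include <asm/semaphore.h>

struct conn {
        struct semaphore sem;    /* process-context exclusion */
        spinlock_t       slock;  /* protects the event bit and the CQ drain */
        int              event;
};

/* process context: tail of a locked section, c->sem already held */
static void conn_release(struct conn *c)
{
        spin_lock_irq(&c->slock);
        if (c->event) {
                c->event = 0;
                /* drain and re-arm the CQ */
        }
        spin_unlock_irq(&c->slock);
        /* <-- interrupt runs here: down_trylock() fails, sets c->event */
        up(&c->sem);    /* the event is now stranded until the next lock cycle */
}

/* interrupt context */
static void conn_cq_event(struct conn *c)
{
        spin_lock(&c->slock);
        if (down_trylock(&c->sem)) {
                /* semaphore held: defer to the process-context owner */
                c->event = 1;
        } else {
                /* unowned: drain and re-arm the CQ here */
                up(&c->sem);
        }
        spin_unlock(&c->slock);
}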
> 
> 
> -Libor
> 

Since the CQ drain is protected by the spinlock - why not move it outside
the semaphore?

process:

down
stuff
up

lock
check bit and arm
drain
unlock

interrupt:

lock
down_trylock
set bit
up
unlock
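
In C, one possible reading of that sketch (purely illustrative, reusing the
hypothetical struct conn fields from above; the down_trylock()/up() pair is
just a probe for whether process context currently holds the semaphore):

/* process context */
static void conn_do_work(struct conn *c)
{
        down(&c->sem);
        /* ... socket work ("stuff") ... */
        up(&c->sem);

        spin_lock_irq(&c->slock);
        if (c->event) {
                c->event = 0;
                /* re-arm the CQ and then drain it, still under the spinlock */
        }
        spin_unlock_irq(&c->slock);
}

/* interrupt context */
static void conn_cq_event(struct conn *c)
{
        spin_lock(&c->slock);
        if (!down_trylock(&c->sem))
                up(&c->sem);    /* nobody is in "stuff" at the moment */
        c->event = 1;           /* mark the event for the next drain pass */
        spin_unlock(&c->slock);
}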

Anyway, this flow is just an optimisation - so I wonder how much it actually
helps. My understanding is that this is a backport from gen1, but what was
a useful optimisation for gen1 may not be one for gen2.
The gen2 event flow is much faster than gen1's - it is possible that
all the extra locking and complication is not worth it with mthca.
I know that is what my attempts to switch to polling in IP over IB have shown.

-- 
MST - Michael S. Tsirkin


