[ofw] Support boot device in IB stack?

James Yang jyang at xsigo.com
Thu Feb 12 14:16:28 PST 2009


Hi Tzachi,

In crashdump, there is no synchronization problem: everything is serialized by the OS. That is why I hope we can skip the spinlock in crashdump. I have worked on other crashdump drivers before and don't recall any DPC problems while supporting crashdump, so I believe the OS will call the DPC after you queue it. However, threads running in the background won't run during crashdump.
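
A minimal sketch of the "skip the spinlock in crashdump" idea, written as a user-space simulation so that it can be compiled and run anywhere. Every name here (sketch_lock_t, g_sketch_irql, SKETCH_CRASHDUMP_IRQL) is hypothetical and is not taken from the OFW sources. The IRQL is a plain variable, where a real driver would ask the OS, and the sketch assumes that crashdump really has a single, serialized caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; nothing here is taken from the OFW tree. */
#define SKETCH_CRASHDUMP_IRQL 31   /* the level this thread says crashdump runs at */

/* Simulated current IRQL. A real driver would ask the OS instead. */
static int g_sketch_irql = 0;

typedef struct {
    atomic_flag word;   /* the single memory word the lock spins on */
    bool        held;   /* true only if acquire really took the lock */
} sketch_lock_t;

#define SKETCH_LOCK_INIT { ATOMIC_FLAG_INIT, false }

static void sketch_lock_acquire(sketch_lock_t *lock)
{
    if (g_sketch_irql == SKETCH_CRASHDUMP_IRQL) {
        /* Crashdump: assumed single, serialized caller, so do not spin. */
        return;
    }
    while (atomic_flag_test_and_set_explicit(&lock->word, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
    lock->held = true;
}

static void sketch_lock_release(sketch_lock_t *lock)
{
    if (!lock->held) {
        return;   /* acquire was skipped, so there is nothing to release */
    }
    lock->held = false;
    atomic_flag_clear_explicit(&lock->word, memory_order_release);
}

/* Usage example: a normal acquire/release pair, then the crashdump path. */
static void sketch_lock_demo(void)
{
    sketch_lock_t lock = SKETCH_LOCK_INIT;

    g_sketch_irql = 2;                      /* normal operation */
    sketch_lock_acquire(&lock);
    assert(lock.held);
    sketch_lock_release(&lock);
    assert(!lock.held);

    g_sketch_irql = SKETCH_CRASHDUMP_IRQL;  /* simulated crashdump */
    sketch_lock_acquire(&lock);
    sketch_lock_acquire(&lock);             /* a real lock would hang here */
    assert(!lock.held);
    sketch_lock_release(&lock);
}
```

The sketch only removes the spinning. It does nothing about data that was left half-updated by whoever held the lock at the moment of the crash, so the need for synchronization itself does not go away.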

Thanks,
James


-----Original Message-----
From: Tzachi Dar [mailto:tzachid at mellanox.co.il] 
Sent: Thursday, February 12, 2009 1:57 PM
To: James Yang; Leonid Keller; ofw at lists.openfabrics.org
Subject: RE: [ofw] Support boot device in IB stack?

Hi James,

There are a few things to understand:
1) Spinlocks are used for synchronization. We can skip the spinlocks,
but we cannot skip the need for synchronization...

If someone is sending on a QP, another instance cannot start sending,
because this will lead to data corruption.

Next, I don't really understand how the driver moves to polling.
What will happen if we queue a DPC? Will that execute or not?

I believe that if spinlocks are our only problem, we will be able to
take them even at level 31. If we see that this is not enough, we
will have to create another instance of the functions without locks.
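
To make the polling question concrete, here is a minimal sketch of waiting for a completion by polling instead of relying on an interrupt and a DPC. It is a user-space simulation: the completion queue layout and every name in it (sketch_cq_t, sketch_cq_hw_post, sketch_wait_by_polling) are hypothetical and do not describe the real HCA queues or the OFW API. The spin bound is an assumption added so that a missing completion cannot hang the caller forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion queue; real HCA queues look different. */
#define SKETCH_CQ_DEPTH 8

typedef struct {
    unsigned wr_id;   /* which work request finished */
    bool     valid;   /* set by the "hardware", cleared by the consumer */
} sketch_cqe_t;

typedef struct {
    sketch_cqe_t ring[SKETCH_CQ_DEPTH];
    size_t       head;   /* next entry the consumer examines */
    size_t       tail;   /* next entry the simulated hardware fills */
} sketch_cq_t;

/* Simulates the hardware reporting a completion; false if the ring is full. */
static bool sketch_cq_hw_post(sketch_cq_t *cq, unsigned wr_id)
{
    sketch_cqe_t *e = &cq->ring[cq->tail % SKETCH_CQ_DEPTH];
    if (e->valid) {
        return false;
    }
    e->wr_id = wr_id;
    e->valid = true;
    cq->tail++;
    return true;
}

/* Non-blocking: copies one completion into *out if one is ready. */
static bool sketch_cq_poll_one(sketch_cq_t *cq, sketch_cqe_t *out)
{
    sketch_cqe_t *e = &cq->ring[cq->head % SKETCH_CQ_DEPTH];
    if (!e->valid) {
        return false;
    }
    *out = *e;
    e->valid = false;
    cq->head++;
    return true;
}

/* Busy-polls until wr_id completes or max_spins attempts are used up. */
static bool sketch_wait_by_polling(sketch_cq_t *cq, unsigned wr_id,
                                   unsigned long max_spins)
{
    sketch_cqe_t cqe;
    for (unsigned long i = 0; i < max_spins; i++) {
        if (sketch_cq_poll_one(cq, &cqe) && cqe.wr_id == wr_id) {
            return true;
        }
    }
    return false;
}

/* Usage example: two completions arrive, the caller waits for the second. */
static void sketch_cq_demo(void)
{
    sketch_cq_t cq = {0};
    bool ok;

    ok = sketch_cq_hw_post(&cq, 7);
    assert(ok);
    ok = sketch_cq_hw_post(&cq, 9);
    assert(ok);

    ok = sketch_wait_by_polling(&cq, 9, 100);   /* consumes 7, then 9 */
    assert(ok);
    ok = sketch_wait_by_polling(&cq, 5, 100);   /* never posted: times out */
    assert(!ok);
}
```

Nothing in this loop needs a lock, a DPC, or a second thread, which is why this style could fit a context where only one caller is running.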

Finding MS docs on this issue is indeed difficult. I did some searching
and found:
http://download.microsoft.com/download/f/0/5/f05a42ce-575b-4c60-82d6-208d3754b2d6/Writing-ATAport-Miniport.ppt
This has a tip for crashdump:
"Create an unused copy of the miniport at boot time"
I guess this makes sure that no one will be using the minidump when
the system crashes.

I have also found:
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/RAID_design.doc#_Toc150055726
which says:
"For more information about crashdump support in miniports, see the WDK
documentation"
I have looked at the WDK documentation and found nothing, but I guess
this gives some hope.

Let's see if we can find more information from MS about these issues.

Thanks
Tzachi


> -----Original Message-----
> From: James Yang [mailto:jyang at xsigo.com] 
> Sent: Thursday, February 12, 2009 9:03 PM
> To: Tzachi Dar; Leonid Keller; ofw at lists.openfabrics.org
> Subject: RE: [ofw] Support boot device in IB stack?
> 
> Tzachi,
> 
> I agree with your analysis of the spinlock object. For BSOD, we can
> simply skip calling the OS spinlock in cl_spinlock_acquire()
> when we find that IRQL==31, because the code is not re-entered at
> IRQL 31. At BSOD, everything has to rely on polling. Even the OS
> polls for interrupts. The only functionality we want to support
> is making sure we can still send out packets.
> 
> -James
> 
> -----Original Message-----
> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Thursday, February 12, 2009 1:47 AM
> To: Leonid Keller; James Yang; ofw at lists.openfabrics.org
> Subject: RE: [ofw] Support boot device in IB stack?
> 
> One more note please:
> 
> Spinlocks are very simple objects of the operating system.
> In short, the only thing they have is a place in memory
> on which interlocked operations are done. As such, they
> will work at any IRQL.
> 
> So, one might have two arguments against what I just said:
> 1) The docs say not to use them at elevated IRQL.
> 2) One can try using them at higher IRQL and see that the 
> system gets into a deadlock very fast.
> 
> So, what is the explanation for this?
> 
> The long answer is this:
> Spinlocks are not re-entrant. That means that if someone is
> holding a spinlock and then tries to acquire that same
> spinlock again, he is stuck: he will spin forever and has
> no chance of releasing the lock that he took in the first
> place.
> The operating system's answer to this problem has two parts:
> 1) Don't take a spinlock twice (that is, make sure that your
> code does not acquire the same lock twice).
> 2) Make sure that your code is not preempted. This is
> usually achieved by having all users of a spinlock run
> at DISPATCH_LEVEL; code running at a higher level
> cannot take the spinlock.
> (Actually it means something else: find the highest IRQL that
> is used to take the spinlock and make sure that all users take
> it at that level.)
> 
> So, where does it put us?
> In short our code relies on synchronization. We are using 
> spinlocks to achieve this synchronization. In the general 
> case, we can not live without synchronization.
> 
> So what happens in the case of BSOD? Obviously there are no
> functions that can be called at such an elevated level. This
> means that in the general case the driver will stop working.
> The only thing that might really help is knowing the
> special rules that apply in the case of BSOD. We should also
> know what we are expected to do.
> Currently our driver is based on the fact that operations are
> completed by interrupts and DPCs. We *can* also work by
> polling instead. What we really need to know is the
> threading model at BSOD time (will interrupts work, will
> threads work, will running DPCs have a chance to finish)
> and what we should do (only write to a remote
> destination? Also read and act accordingly?)
> 
> Thanks
> Tzachi
> 
> 
> > -----Original Message-----
> > From: Leonid Keller
> > Sent: Thursday, February 12, 2009 10:23 AM
> > To: James Yang; Tzachi Dar; ofw at lists.openfabrics.org
> > Subject: RE: [ofw] Support boot device in IB stack?
> > 
> > > In order to support crashdump, the IB related code must be
> > > able to run at IRQL 31.
> > Could you explain why you think so?
> > As far as I know, at IRQL 31 all interrupts/DPCs/threads are blocked.
> > When a crash happens, the only task of the OS is to create a dump
> > while preserving the current state.
> > So it looks strange to me that the OS would allow any driver to
> > continue working after a crash.
> > Creating a crash dump at IRQL 31 also looks a bit strange to me.
> > The OS has to write the dump to the hard disk, which presumes working
> > with the disk driver at IRQL 2 at worst.
> > Am I missing something?
> > 
> > > -----Original Message-----
> > > From: ofw-bounces at lists.openfabrics.org 
> > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
> > > Sent: Thursday, February 12, 2009 1:59 AM
> > > To: Tzachi Dar; ofw at lists.openfabrics.org
> > > Subject: RE: [ofw] Support boot device in IB stack?
> > > 
> > > Hi Tzachi,
> > > 
> > > The boot should be OK if #1 and #3 are resolved. In order to
> > > support crashdump, the IB related code must be able to run at
> > > IRQL 31. It would be good to know which functions are OK to run
> > > and which are not. By the way, according to the DDK, "Callers of
> > > KeAcquireSpinLock must be running at IRQL <= DISPATCH_LEVEL."
> > > 
> > > There is no formal document on crashdump. The most important
> > > thing is that it runs at IRQL 31, and multithreading may not
> > > work at that time.
> > > 
> > > iSCSI is one of the solutions, but we also have to support vhba
> > > directly. And in order to get MS WHQL for vhba, we must support
> > > crashdump. Have you had any requests to support crashdump on
> > > iSCSI/IPoIB?
> > > 
> > > Thanks,
> > > James
> > > 
> > > -----Original Message-----
> > > From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> > > Sent: Wednesday, February 11, 2009 12:55 AM
> > > To: James Yang; ofw at lists.openfabrics.org
> > > Subject: RE: [ofw] Support boot device in IB stack?
> > > 
> > > Hi James,
> > > 
> > > As for #1 and #3, these are technical changes that can easily be
> > > done.
> > > 
> > > Issue #2 seems much more problematic to me.
> > > First, I must say that I haven't studied the topic thoroughly, so
> > > I might be getting a wrong impression of the entire issue.
> > > One more thing to start with: I'm not sure that this request is
> > > a must. What I mean is that booting will take place just fine.
> > > The one thing that will not work is creating a crash dump file
> > > later.
> > > 
> > > In any case, spinlocks don't really have a problem with higher
> > > IRQL (the way I understand it), but many other commands do. So if
> > > the code is only trying to send/receive data on QPs that are
> > > already open, I guess that things should work.
> > > Trying to open new QPs will not work (we have to wait).
> > > Please also note that there are only about 5 commands that can be
> > > called from IRQL>DISPATCH_LEVEL.
> > > 
> > > Can you please send a reference to the document that says what
> > > the requirements are after the system has crashed?
> > > 
> > > Another question: would booting using iSCSI be a better option
> > > for you? (Note that it was done using IPoIB.)
> > > 
> > > Thanks
> > > Tzachi
> > > 
> > > > -----Original Message-----
> > > > From: ofw-bounces at lists.openfabrics.org 
> > > > [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of James Yang
> > > > Sent: Tuesday, February 10, 2009 11:31 PM
> > > > To: ofw at lists.openfabrics.org
> > > > Subject: [ofw] Support boot device in IB stack?
> > > > 
> > > > Hi,
> > > > 
> > > > We would like to support boot devices, especially SAN boot over
> > > > IB. So far I can see the following issues in the current code:
> > > > 1) Many functions are pageable. If these functions run at boot
> > > > or shutdown time, the disk may not be ready and paging won't
> > > > work.
> > > > 2) Spinlocks may not work for crashdump, which runs at
> > > > IRQL>DISPATCH_LEVEL. Do any other functions need to be changed
> > > > when IRQL>DISPATCH_LEVEL?
> > > > 3) All related drivers should be boot-start drivers.
> > > > 
> > > > Are there any other potential problems? Is there any plan to
> > > > support SAN boot?
> > > > 
> > > > Please advise.
> > > > 
> > > > Thanks,
> > > > James
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > _______________________________________________
> > > > ofw mailing list
> > > > ofw at lists.openfabrics.org
> > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
> > > > 
> > > 
> 
> 



