[ofw] RE: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD

Leonid Keller leonid at mellanox.co.il
Mon Jun 30 09:18:57 PDT 2008


This fix, I believe, is part of the large patch that Fab is now adding
part by part.
Fab, is that right, and when do you estimate this part will reach the
trunk?


________________________________

	From: Eleanor Witiak [mailto:eleanor.witiak at qlogic.com] 
	Sent: Monday, June 30, 2008 6:07 PM
	To: Leonid Keller
	Cc: AndInc at aol.com; sean.hefty at intel.com;
ofw at lists.openfabrics.org
	Subject: RE: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD
	
	

	PR 1029, which patch 1223 fixed, did produce a "Bad Pool Caller"
BSOD, the same as the crash below.  Part of the stack trace below is also
similar to what I got; however, Mike's crash does not have SRP on the
stack as mine did.  Mike, can you try your test again with my patch?

	 

	Leonid: Also, while working on PR 1029, I ran into an IBAL
problem that I sent to you.  I have attached our mail correspondence.  I
have created a temp patch in IBAL (without my patch 1223) just to see if
it also fixed my "Bad Pool Caller" BSOD and it did.  In addition, I have
also run with the same temp IBAL patch and it also got rid of the BSOD
while trying to reproduce PR 1037.  I think that Mike's crash might be
running into this problem.  Is your patch ready?  If so, I would love to
test with it.

	 

	Thanks,

	Eleanor

	 

	
________________________________


	From: Leonid Keller [mailto:leonid at mellanox.co.il] 
	Sent: Monday, June 30, 2008 10:18 AM
	To: Eleanor Witiak
	Cc: AndInc at aol.com; sean.hefty at intel.com;
ofw at lists.openfabrics.org
	Subject: RE: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD

	 

	Thanks, but I meant to ask: does this crash look like the one
you solved in 1223?

		 

		
________________________________


		From: Eleanor Witiak [mailto:eleanor.witiak at qlogic.com] 
		Sent: Monday, June 30, 2008 4:40 PM
		To: Leonid Keller; AndInc at aol.com; sean.hefty at intel.com;
ofw at lists.openfabrics.org
		Subject: RE: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD

		Yes, the patch did come after the 1.1 release.  The
patch revision # is 1223; the affected files are srp_connection.c and
srp_session.c.

		 

		Eleanor

		 

		
________________________________


		From: Leonid Keller [mailto:leonid at mellanox.co.il] 
		Sent: Monday, June 30, 2008 4:34 AM
		To: AndInc at aol.com; sean.hefty at intel.com;
ofw at lists.openfabrics.org; Eleanor Witiak
		Subject: RE: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD

		 

		a) don't know;

		b) may be caused by a);

		c) may be caused by b).

		 

		Eleanor's very important patch (WinOF 1223), which
prevents a BSOD upon sudden srpt disconnection, arrived after the
release was closed.

		Eleanor, could you check whether that is the case here?

		 

		Here is some more information, based on the sent
minidumps:

		 

		1: kd> !analyze -v

		BAD_POOL_CALLER (c2)
		The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
		Arguments:
		Arg1: 0000000000000007, Attempt to free pool which was already freed
		Arg2: 000000000000121a, (reserved)
		Arg3: 00000000012b0011, Memory contents of the pool block
		Arg4: fffffadf99483c50, Address of the block of pool being deallocated

		 

		Debugging Details:
		------------------

		 

		
		POOL_ADDRESS:  fffffadf99483c50 

		 

		FREED_POOL_TAG:  priv

		 

		BUGCHECK_STR:  0xc2_7_priv

		 

		CUSTOMER_CRASH_COUNT:  1

		 

		DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

		 

		PROCESS_NAME:  System

		 

		CURRENT_IRQL:  0

		 

		LAST_CONTROL_TRANSFER:  from fffff800011aa769 to fffff8000102e950

		 

		STACK_TEXT:  
		fffffadf`90d7bbc8 fffff800`011aa769 : 00000000`000000c2 00000000`00000007 00000000`0000121a 00000000`012b0011 : nt!KeBugCheckEx
		fffffadf`90d7bbd0 fffffadf`8f554621 : fffffadf`99483c50 00000000`00000080 fffffadf`99483c50 00000000`00000080 : nt!ExFreePoolWithTag+0x401
		fffffadf`90d7bc90 fffffadf`8f51f568 : fffffadf`9c813c00 fffffadf`9bddd3e8 fffffadf`99483c78 fffffadf`9bddd3c8 : ibbus!async_destroy_cb+0x171 [d:\openib-windows-svn\1177\gen1\trunk\core\al\al_common.c @ 686]
		fffffadf`90d7bce0 fffffadf`8f521a1d : fffffadf`9c8764e0 fffffadf`9bddd2b0 fffffadf`9bed0040 fffff800`011b5500 : ibbus!__cl_async_proc_worker+0x98 [d:\openib-windows-svn\1177\gen1\trunk\core\complib\cl_async_proc.c @ 153]
		fffffadf`90d7bd10 fffffadf`8f522108 : 00000000`00000000 fffffadf`9c8764e0 fffffadf`9c8764e0 fffff800`011b5500 : ibbus!__cl_thread_pool_routine+0x4d [d:\openib-windows-svn\1177\gen1\trunk\core\complib\cl_threadpool.c @ 66]
		fffffadf`90d7bd40 fffff800`0124b972 : 00000000`00000000 fffffadf`9beaf040 fffffadf`9beaf040 fffffadf`9c168bf0 : ibbus!__thread_callback+0x28 [d:\openib-windows-svn\1177\gen1\trunk\core\complib\kernel\cl_thread.c @ 49]
		fffffadf`90d7bd70 fffff800`010202d6 : fffff800`011b1180 fffffadf`9bed0040 fffff800`011b5500 fffffadf`9c8b81c0 : nt!PspSystemThreadStartup+0x3e
		fffffadf`90d7bdd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

		 

		FOLLOWUP_IP: 
		ibbus!async_destroy_cb+171 [d:\openib-windows-svn\1177\gen1\trunk\core\al\al_common.c @ 686]

		SYMBOL_STACK_INDEX:  2

		 

		SYMBOL_NAME:  ibbus!async_destroy_cb+171
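
		The analysis above reports SYMBOL_STACK_INDEX 2, which lines up with the first frame in STACK_TEXT that is not in the nt module. The following is a minimal Python sketch of that reading, not a description of how the debugger actually selects the frame. The "first non-nt frame" rule and the helper names parse_symbol and first_non_kernel_frame are assumptions made for illustration; the frame symbols are copied from the trace.

```python
import re

# Frame symbols copied from the STACK_TEXT above, top of stack first.
FRAMES = [
    "nt!KeBugCheckEx",
    "nt!ExFreePoolWithTag+0x401",
    "ibbus!async_destroy_cb+0x171",
    "ibbus!__cl_async_proc_worker+0x98",
    "ibbus!__cl_thread_pool_routine+0x4d",
    "ibbus!__thread_callback+0x28",
    "nt!PspSystemThreadStartup+0x3e",
    "nt!KxStartSystemThread+0x16",
]

# module!function, optionally followed by +0xOFFSET
SYMBOL_RE = re.compile(
    r"^(?P<module>[^!]+)!(?P<func>[^+]+)(?:\+(?P<offset>0x[0-9a-fA-F]+))?$"
)


def parse_symbol(symbol):
    """Split 'module!function+0xNN' into (module, function, offset)."""
    match = SYMBOL_RE.match(symbol)
    if match is None:
        raise ValueError(f"unrecognized symbol: {symbol!r}")
    offset = int(match.group("offset"), 16) if match.group("offset") else 0
    return match.group("module"), match.group("func"), offset


def first_non_kernel_frame(frames, kernel_module="nt"):
    """Return (index, module, function, offset) of the first frame outside
    kernel_module, or None. Hypothetical heuristic, assumed for illustration."""
    for index, symbol in enumerate(frames):
        module, func, offset = parse_symbol(symbol)
        if module != kernel_module:
            return index, module, func, offset
    return None


if __name__ == "__main__":
    print(first_non_kernel_frame(FRAMES))
```

		Under that assumption the sketch picks index 2, ibbus!async_destroy_cb at offset 0x171, the same frame named in FOLLOWUP_IP and SYMBOL_NAME.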

		 

			
________________________________


			From: AndInc at aol.com [mailto:AndInc at aol.com] 
			Sent: Friday, June 27, 2008 2:14 AM
			To: sean.hefty at intel.com; Leonid Keller;
ofw at lists.openfabrics.org
			Subject: OFED 1.3/WinOF 1.1/Win2k3R2X64 BSOD

			A simple sequential/random IOMeter script of
small block writes produces a BSOD in this environment. The trace is
below; the failure is very repeatable, and the trace shows two similar
failures. Any clues about what is causing (a) the error, (b) the
disconnect, and (c) the BSOD?

			 

			Thanks,

			 

			Mike Anderson

			 

			[15513.043769] local QP operation err (QPN 0c004a, WQE index 39b8, vendor syndrome 6f, opcode = 5e)
			[15513.043777] CQE contents 000c004a 00000000 00000000 00000000 00000000 00000000 39b86f02 0000005e
			[15513.043779] ib_srpt: failed send status= 2
			[15513.043783] ib_srpt: failed send status= 5
			[15513.043786] ib_srpt: failed send status= 5
			[15513.043801] ib_srpt: failed send status= 5
			[15513.043851] ib_srpt: failed send status= 5
			[15513.043855] ib_srpt: failed send status= 5
			[15513.043857] ib_srpt: failed send status= 5
			[15513.043860] ib_srpt: failed send status= 5
			[15513.043873] ib_srpt: QP event 16 on cm_id= ffff8100ba389800 sess_name= 0x0002c9030000a50c0002c9030000a3ec state= 1
			[15513.043877] ib_srpt: Schedule CM_DISCONNECT_WORK
			[15513.043967] ib_srpt: srpt_cm_drep_recv[1636] cm_id= ffff8100ba389800
			[15513.044220] ib_srpt: srpt_release_channel: Release sess= ffff8101c27d3cf0 sess_name= 0x0002c9030000a50c0002c9030000a3ec active_cmd= 7
			[15513.044223] [6160]: scst_unregister_session:4639:Unregistering session ffff8101c27d3cf0 (wait 0)
			[15739.551108] ib_srpt: ASYNC event= 10 on device= mlx4_0
			[15831.623484] ib_srpt: ASYNC event= 17 on device= mlx4_0
			[15831.624195] ib_srpt: ASYNC event= 11 on device= mlx4_0
			[15831.624400] ib_srpt: ASYNC event= 11 on device= mlx4_0
			[15831.636997] ib_srpt: ASYNC event= 9 on device= mlx4_0
			[15833.127349] ib_srpt: Host login i_port_id=0x2c9030000a50c:0x2c9030000a3ec t_port_id=0x2c9030000a50c:0x2c9030000a50c it_iu_len=996
			[15833.128607] ib_srpt: srpt_create_ch_ib[1228] max_cqe= 4095 max_sge= 29 cm_id= ffff8101b38b0a00
			[15833.128927] [6823]: scst: scst_init_session:4509:Using security group "Default" for initiator "0x0002c9030000a50c0002c9030000a3ec"
			[15833.128938] [6823]: scst_init_session:4512:Assigning session ffff810100467c30 to acg Default
			[15833.128951] [6823]: scst_alloc_add_tgt_dev:405:host=9, channel=0, id=0, lun=0, SCST lun=0
			[15833.128958] [6823]: scst_alloc_set_UA:2486:Adding new UA to tgt_dev ffff8101c953de60
			[15833.128980] ib_srpt: Establish connection sess= ffff810100467c30 name= 0x0002c9030000a50c0002c9030000a3ec cm_id= ffff8101b38b0a00
			[15833.132787] [6818]: scst: scst_set_pending_UA:2420:Setting pending UA cmd ffff810100ba66d0
			[15841.612022] ib_srpt: ASYNC event= 11 on device= mlx4_0
			[16046.074918] igb: eth1: igb_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
			[16056.648672] eth1: no IPv6 routers present
			[17209.196025] local QP operation err (QPN 0e004a, WQE index 3d40, vendor syndrome 6f, opcode = 5e)
			[17209.196032] CQE contents 000e004a 00000000 00000000 00000000 00000000 00000000 3d406f02 000000de
			[17209.196033] ib_srpt: failed send status= 2
			[17209.196037] ib_srpt: failed send status= 5
			[17209.196040] ib_srpt: failed send status= 5
			[17209.196044] ib_srpt: failed send status= 5
			[17209.196069] ib_srpt: QP event 16 on cm_id= ffff8101b38b0a00 sess_name= 0x0002c9030000a50c0002c9030000a3ec state= 1
			[17209.196074] ib_srpt: Schedule CM_DISCONNECT_WORK
			[17209.196078] ib_srpt: srpt_xmit_response[1960] tag= 10296991 channel in bad state 2
			[17209.196083] ib_srpt: failed send status= 5
			[17209.196089] [6820]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196099] ib_srpt: srpt_xmit_response[1960] tag= 10296992 channel in bad state 2
			[17209.196104] [6819]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196157] ib_srpt: srpt_xmit_response[1960] tag= 10296993 channel in bad state 2
			[17209.196160] [6817]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196173] ib_srpt: srpt_cm_drep_recv[1636] cm_id= ffff8101b38b0a00
			[17209.196179] ib_srpt: srpt_xmit_response[1960] tag= 10296994 channel in bad state 2
			[17209.196182] [6814]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196265] ib_srpt: srpt_xmit_response[1960] tag= 10296995 channel in bad state 2
			[17209.196269] [6818]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196277] ib_srpt: srpt_xmit_response[1960] tag= 10296996 channel in bad state 2
			[17209.196278] [6818]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.196308] ib_srpt: srpt_xmit_response[1960] tag= 10296997 channel in bad state 2
			[17209.196309] [6815]: scst: scst_xmit_response:2590:***ERROR*** Target driver ib_srpt xmit_response() returned fatal error
			[17209.197269] ib_srpt: srpt_release_channel: Release sess= ffff810100467c30 sess_name= 0x0002c9030000a50c0002c9030000a3ec active_cmd= 3
			[17209.197272] [6158]: scst_unregister_session:4639:Unregistering session ffff810100467c30 (wait 0)
			linux-gen24:~ #  
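
			Each failure in the trace above starts with a "local QP operation err" line followed by a raw "CQE contents" dump of eight 32-bit words. The following is a minimal Python sketch that pulls the printed fields back out of the dump. The bit positions are assumptions inferred only from how the two dumps line up with their decoded lines (for example, the 0x7f opcode mask is chosen because the second dump ends in de while the log prints opcode = 5e); they are not taken from hardware documentation. The status names are the first entries of the Linux enum ib_wc_status, and decode_error_cqe is a hypothetical helper name.

```python
# First entries of the Linux `enum ib_wc_status` (include/rdma/ib_verbs.h).
IB_WC_STATUS = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
    3: "IB_WC_LOC_EEC_OP_ERR",
    4: "IB_WC_LOC_PROT_ERR",
    5: "IB_WC_WR_FLUSH_ERR",
}


def decode_error_cqe(dump):
    """Decode the eight hex words printed after 'CQE contents'.

    Field positions are ASSUMED from the log's own decoded line, not from
    a specification.
    """
    words = [int(word, 16) for word in dump.split()]
    if len(words) != 8:
        raise ValueError(f"expected 8 words, got {len(words)}")
    return {
        "qpn": words[0] & 0xFFFFFF,             # low 24 bits of word 0
        "wqe_index": words[6] >> 16,            # high 16 bits of word 6
        "vendor_syndrome": (words[6] >> 8) & 0xFF,
        "syndrome": words[6] & 0xFF,            # low byte of word 6
        "opcode": words[7] & 0x7F,              # assumed mask, see lead-in
    }


if __name__ == "__main__":
    for dump in (
        "000c004a 00000000 00000000 00000000 00000000 00000000 39b86f02 0000005e",
        "000e004a 00000000 00000000 00000000 00000000 00000000 3d406f02 000000de",
    ):
        fields = decode_error_cqe(dump)
        print({name: hex(value) for name, value in fields.items()})
    # The ib_srpt lines report status 2 once, then status 5 repeatedly.
    print(IB_WC_STATUS[2], IB_WC_STATUS[5])
```

			Read this way, both failures show the same pattern: one send completes with status 2 (IB_WC_LOC_QP_OP_ERR), and the remaining sends complete with status 5 (IB_WC_WR_FLUSH_ERR), before ib_srpt schedules the disconnect. The low byte 02 of the seventh word coincides with the status 2 that ib_srpt reports, although the log itself does not state how the driver maps one to the other.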

			 

			
