[ofw] RE: IPoIB partition patch

Tzachi Dar tzachid at mellanox.co.il
Thu Jul 17 03:59:55 PDT 2008


Hi Slava,
 
It would be nice if the GUIDs could be byte-swapped so that they match
the GUIDs reported by vstat and opensm.
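
To illustrate the kind of swap meant here, a minimal sketch follows. It assumes the GUID is handled as a plain 64-bit integer; guid_swap64 is a hypothetical helper name used only for illustration and is not taken from the patch or from the stack.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Hypothetical helper: reverse the byte order of a 64-bit GUID.
 * Applying it twice returns the original value. */
static uint64_t guid_swap64(uint64_t guid)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (guid & 0xffu);  /* move the lowest byte to the top */
        guid >>= 8;
    }
    return out;
}
```

With this helper, guid_swap64(0x0102030405060708ULL) gives 0x0807060504030201ULL, so a value typed in one byte order can be converted to the other before it is compared.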
 
I have tried to work with the new partition code, but with no real
success.
As a matter of fact, applying the code that you sent in many cases
prevents the machine from coming up.
This is true for ConnectX cards as well as InfiniHost III cards (a.k.a.
Sinai).
 
The main problem that I was able to find is that if the function
__ib_mgr_init() returns IB_NOT_FOUND, we get into a state that we can
never get out of. That is, the port will never come up; it will stay down.
This situation can happen very easily if you are not
connected to a management switch (or probably if no cable is connected
at all).
In this case, the check if( ca_attr->p_port_attr->link_state ==
IB_LINK_ACTIVE) will always fail, and as a result we will
always be stuck returning IB_NOT_FOUND.
 
Please note that before your check-in, when the link state was not
active, we continued with the normal flow of this
function.
 
After I fixed that, the machine can come up, but partitions still
do not work. I reach the code
  for(index = 0; index < ca_attr->p_port_attr->num_pkeys; index++)
  {
   if(cl_hton16(p_port->p_adapter->guids.port_guid.pkey) ==
      ca_attr->p_port_attr->p_pkey_table[index])
    break;
  }
but ca_attr->p_port_attr->p_pkey_table seems to contain only 0xffff and
0x0000. I don't see the pkeys that I have added on the card.
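
A self-contained sketch of the same lookup shows why such a table can never match a user-defined pkey. It assumes the table entries are stored in network byte order, as the cl_hton16 call in the loop suggests; sketch_hton16 and find_pkey_index are hypothetical names, and the pkey value 0x8001 in the note below is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cl_hton16: host to network order, 16 bits. */
static uint16_t sketch_hton16(uint16_t v)
{
    const uint16_t probe = 1;
    if (*(const uint8_t *)&probe == 1)            /* little-endian host */
        return (uint16_t)((v << 8) | (v >> 8));
    return v;                                     /* big-endian host */
}

/* Return the index of host_pkey in the table, or -1 if it is absent. */
static int find_pkey_index(const uint16_t *table, size_t num_pkeys,
                           uint16_t host_pkey)
{
    assert(table != NULL || num_pkeys == 0);
    const uint16_t wire_pkey = sketch_hton16(host_pkey);
    for (size_t index = 0; index < num_pkeys; index++) {
        if (table[index] == wire_pkey)
            return (int)index;
    }
    return -1;
}
```

With a table holding only 0xffff and 0x0000, find_pkey_index returns -1 for a pkey such as 0x8001, which corresponds to the original loop running to index == num_pkeys without hitting the break.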
 
After more checks I have come to the conclusion that even before applying
your patch, IPoIB partitions had stopped working on the current version of
the stack.
 
Thanks
Tzachi
 
And one last thing: while running with your patch and trying to disable
the ibbus, I also hit the following bugcheck (I am not 100% sure
whether it is related to your changes).
 
MODULE_NAME: ibbus
 
FAULTING_MODULE: 80800000 nt
 
DEBUG_FLR_IMAGE_TIMESTAMP:  487eecc5
 
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx"
referenced memory at "0x%08lx". The memory could not be "%s".
 
FAULTING_IP: 
nt!IoInvalidateDeviceRelations+13
8080cd0f 8b86b0000000    mov     eax,dword ptr [esi+0B0h]
 
EXCEPTION_RECORD:  f78e6958 -- (.exr fffffffff78e6958)
ExceptionAddress: 8080cd0f (nt!IoInvalidateDeviceRelations+0x00000013)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 004600e1
Attempt to read from address 004600e1
 
CONTEXT:  f78e6654 -- (.cxr fffffffff78e6654)
eax=89c39d78 ebx=89cbac30 ecx=00000000 edx=89c730b8 esi=00460031
edi=89cbace8
eip=8080cd0f esp=f78e6a20 ebp=f78e6a24 iopl=0         nv up ei pl nz na
po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000
efl=00010202
nt!IoInvalidateDeviceRelations+0x13:
8080cd0f 8b86b0000000    mov     eax,dword ptr [esi+0B0h]
ds:0023:004600e1=????????
Resetting default scope
 
DEFAULT_BUCKET_ID:  WRONG_SYMBOLS
 
BUGCHECK_STR:  0x7E
 
LAST_CONTROL_TRANSFER:  from ba81a6f5 to 8080cd0f
 
STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be
wrong.
f78e6a24 ba81a6f5 00460031 00000000 00000000
nt!IoInvalidateDeviceRelations+0x13
f78e6a50 ba821e5e 89d66008 00000001 00000001 ibbus!free_port_mgr+0x325
[q:\openib\trunk\core\bus\kernel\bus_port_mgr.c @ 400]
f78e6a74 ba821b38 89d66028 f78e6a90 00000001 ibbus!__destroy_cb+0xfe
[q:\openib\trunk\core\complib\cl_obj.c @ 779]
f78e6a94 ba8213f4 89d66008 00000001 00000002 ibbus!__destroy_obj+0x108
[q:\openib\trunk\core\complib\cl_obj.c @ 709]
f78e6aa8 ba8e80d1 89d66008 00000001 00000001 ibbus!cl_obj_destroy+0x54
[q:\openib\trunk\core\complib\cl_obj.c @ 313]
f78e6ac4 ba81f93e 8a0dd5f0 8a0dd6a8 f78e6af0
ibbus!fdo_release_resources+0x131
[q:\openib\trunk\core\bus\kernel\bus_pnp.c @ 384]
f78e6ad8 ba8f8474 8a0dd5f0 89cbac30 f78e6b9c ibbus!cl_do_remove+0xde
[q:\openib\trunk\core\complib\kernel\cl_pnp_po.c @ 693]
f78e6af8 ba8f54c4 8a0dd5f0 89cbac30 f78e6b9c ibbus!__remove+0x104
[q:\openib\trunk\core\complib\kernel\cl_pnp_po.c @ 661]
f78e6ba4 8083f9d0 8a0dd5f0 89cbac30 f78e6c34 ibbus!cl_pnp+0xe44
[q:\openib\trunk\core\complib\kernel\cl_pnp_po.c @ 260]
f78e6bb8 808f6a25 8a3ad280 8a3ad280 8a3ad130 nt!IofCallDriver+0x38
f78e6be4 808e20b5 8a0dd5f0 f78e6c10 00000000
nt!ObMakeTemporaryObject+0x549
f78e6c38 8080beae 8a3ad280 00000002 00000000
nt!IoForwardIrpSynchronously+0x1414
f78e6c60 808e149b e1614330 00000016 e22e34d8 nt!IoDetachDevice+0xcd
f78e6c78 808e18cc 8a3ad130 00000002 e22e34d8
nt!IoForwardIrpSynchronously+0x7fa
f78e6cac 808e1732 8a3ad280 022e34d8 00000002
nt!IoForwardIrpSynchronously+0xc2b
f78e6d40 808e19b6 f78e6d7c 8a37544c e2292868
nt!IoForwardIrpSynchronously+0xa91
f78e6d5c 808e7879 f78e6d7c 8a38b660 808b70dc
nt!IoForwardIrpSynchronously+0xd15
f78e6d80 8083f72e 88351230 00000000 8a38b660 nt!ExCreateCallback+0x258
f78e6dac 8092ccff 88351230 00000000 00000000 nt!KeRemoveQueue+0x2cb
f78e6ddc 80841a96 8083f671 00000001 00000000 nt!ObAssignSecurity+0x228
00000000 00000000 00000000 00000000 00000000
nt!PsGetCurrentThreadWin32ThreadAndEnterCriticalRegion+0xd5
 

FOLLOWUP_IP: 
ibbus!free_port_mgr+325 [q:\openib\trunk\core\bus\kernel\bus_port_mgr.c
@ 400]
ba81a6f5 8b55f8          mov     edx,dword ptr [ebp-8]
 
FAULTING_SOURCE_CODE:  
   396:    IoInvalidateDeviceRelations(
   397:     p_ext->h_ca->obj.p_ci_ca->verbs.p_hca_dev, BusRelations );
   398: 
   399:    /* Release the reference on the CA object. */
>  400:    deref_al_obj( &p_ext->h_ca->obj );
   401:   }
   402:   BUS_TRACE( BUS_DBG_PNP, ("Deleted device %s: PDO %p, ext
%p\n",
   403:    p_ext->cl_ext.vfptr_pnp_po->identity,
p_ext->cl_ext.p_self_do, p_ext ) );
   404:   IoDeleteDevice( p_ext->cl_ext.p_self_do );
   405:  }
 

SYMBOL_STACK_INDEX:  1
 
SYMBOL_NAME:  ibbus!free_port_mgr+325
 
FOLLOWUP_NAME:  MachineOwner
 
IMAGE_NAME:  ibbus.sys
 
STACK_COMMAND:  .cxr 0xfffffffff78e6654 ; kb
 
BUCKET_ID:  WRONG_SYMBOLS
 
Followup: MachineOwner
---------



________________________________

	From: Slava Strebkov [mailto:slavas at voltaire.com] 
	Sent: Thursday, July 17, 2008 8:37 AM
	To: Tzachi Dar
	Cc: ofw at lists.openfabrics.org
	Subject: RE: [ofw] RE: IPoIB partition patch
	
	

	Hi Tzachi,

	Please see my answers below.

	 

	-----Original Message-----
	From: Tzachi Dar [mailto:tzachid at mellanox.co.il] 
	Sent: Wednesday, July 16, 2008 11:48 PM
	To: Slava Strebkov
	Cc: ofw at lists.openfabrics.org
	Subject: RE: [ofw] RE: IPoIB partition patch

	 

	 

	Hi Slava,

	 

	I have been looking at the partition patch and I have two questions:

	 

	1) The function bus_add_pkey can be called twice: once from proxy_ioctl
	and once from ioc_ioctl. I guess that one of them is redundant. Is that
	true?

	Yes, the call from proxy_ioctl is redundant; sorry, I forgot to remove it.

	 

	2) I have noticed that when entering GUIDs I have to do a byte swap of
	the GUID. That is, if I look at the GUID as it is printed by vstat, then I
	have to do the swap manually. Is there any reason for that?

	I took the GUID from Device Manager (the device ID). That is the only
	reason for the byte swap.

	 

	Thanks

	Tzachi

	 

	Thanks

	Slava
