[ofw][patch][IBBUS] Fix for a bug from Bugzilla 1367

Smith, Stan stan.smith at intel.com
Tue Apr 14 11:43:07 PDT 2009


Hello,
  Your ControlDeviceObject (CDO) implementation is very clean, nice job.
I suspect the vstat failure I was seeing may have been due to the fact I was creating the CDO  in the add_device() routine where you, and I agree, create the CDO in the DriverEntry() routine.
Bottom line - I do not see the previous vstat failure with your CDO implementation.

Two small points:

1) When disabling an HCA via the device manager, the device manager view fails to refresh and show the HCA as disabled.
One can force the refresh with a scan for hardware changes.

 2) When all HCA devices have been 'disabled', running vstat.exe crashes the system (Svr2003, (MT23108 InfiniHost) * 2).

The bus_drv_unload() routine is not called for the case where all HCA devices have been disabled. Consequently the symbolic link to the CDO is still valid thus allowing vstat ioctl() access to IBAL layers with no HCAs and eventually a system crash.

A few observations on managing the DOS name symbolic link.

1) There are numerous instances and initializations of dos_name throughout ibbus code.
    One could make 'dos_name' a global to reduce code with a single init and cleanup dos_name usage.

2) make the ControlDeviceObject 'dev_name' a global (persistent) as it's needed to create the dos_name symbolic link upon 1st HCA enable operation in bus_add_device() and needed in bus_drv_unload() to free UNICODE string memory.

3) Initialize the unicode string globals dos_name and dev_name once in DriverEntry()

4) In bus_pnp.c:bus_add_device() for the case of the 1st HCA being added (ic==1), create the dos_name symbolic link; see #5 as the CDO is already present if enabling from a disabled state.

5) The dos_name symbolic link should be destroyed in bus_pnp.c:fdo_release_resources() when the last HCA is disabled as this will prevent vstat from crashing the system; the CDO remains valid.

6) bus_drv_unload() should free Unicode string memory for globals g_CDO_dos_name & g_CDO_dev_name


Stan.



________________________________
From: Leonid Keller [mailto:leonid at mellanox.co.il]
Sent: Saturday, April 11, 2009 2:41 PM
To: Smith, Stan
Cc: ofw at lists.openfabrics.org
Subject: RE: [ofw][patch][IBBUS] Fix for a bug from Bugzilla 1367

This solution is based on the assumption, that applications work with IBAL and not with concrete HCAs.
The Control Device has to convey an IOCTL to IBAL and it will find the way to pass it down to the appropriate device.
I tried VSTAT with one and two cards, it worked correctly.
Could you bring an example of IOCTL, that breaks this assumption ?


________________________________
From: Smith, Stan [mailto:stan.smith at intel.com]
Sent: Tuesday, April 07, 2009 7:19 PM
To: Leonid Keller
Cc: ofw at lists.openfabrics.org
Subject: RE: [ofw][patch][IBBUS] Fix for a bug from Bugzilla 1367

Hi Leo,
  This patch does not apply, per chance could you send the patched files: bus_driver.c & bus_pnp.c  so I can continue testing?

I'm also concerned about tracking of a 'valid' PDO in ioctl handling.
The CDO gets
+    cl_init_pnp_po_ext( g_ControlDeviceObject, NULL,
+     NULL, bus_globals.dbg_lvl, NULL, NULL );
How can the follow bus_driver.c bus_ioctl.c code function without a valid PDO?
My experiments showed vstat quit working unless the CDO 'always' contained a valid p_next_do?
Which exposed the issue of when removing and HCA, one needed to make sure the HCA/PDO being removed was not the one used by the CDO. If so, then the CDO.p_next_do needed to be refreshed with a 'valid' PDO.
If the HCA being removed is the 'last' HCA, then vstat should indeed fail.

 /* Only pass down if not handled and not PDO device. */
 if( status == STATUS_INVALID_DEVICE_REQUEST && p_ext->cl_ext.p_next_do )
 {
  IoSkipCurrentIrpStackLocation( p_irp );
  status = IoCallDriver( p_ext->cl_ext.p_next_do, p_irp );
 }


thanks,

Stan.

________________________________
From: Leonid Keller [mailto:leonid at mellanox.co.il]
Sent: Sunday, April 05, 2009 6:47 AM
To: Smith, Stan
Cc: ofw at lists.openfabrics.org
Subject: [ofw][patch][IBBUS] Fix for a bug from Bugzilla 1367

Bugzilla 1367: If N cards are installed on a machine, a disabling of the first one will
prevent the work of all applications with the rest cards.

Explanation:
   IBBUS driver creates only first instance of IBAL with a name. All the rest
are unnamed. After removing the first card, the named instance is gone and
applications, which are binding to IBAL by name, can't find it and can't work.

Solution: The fix creates IBBUS Control Device of "\\DosDevices\\Global\\ibal<file://\\DosDevices\\Global\\ibal>" name from IBBUS Driver Entry.
All other devices are created unnamed.



Index: bus/kernel/bus_driver.c
===================================================================
--- bus/kernel/bus_driver.c (revision 2076)
+++ bus/kernel/bus_driver.c (working copy)
@@ -60,6 +60,7 @@


 char node_desc[IB_NODE_DESCRIPTION_SIZE];
+PDEVICE_OBJECT g_ControlDeviceObject=NULL;

 bus_globals_t bus_globals = {
  BUS_DBG_ERROR,
@@ -981,6 +982,11 @@

  RtlInitUnicodeString( &dos_name, L"\\DosDevices\\Global\\ibal<file://\\DosDevices\\Global\\ibal>" );
  IoDeleteSymbolicLink( &dos_name );
+ if (g_ControlDeviceObject)
+ {
+  IoDeleteDevice(g_ControlDeviceObject);
+  g_ControlDeviceObject = NULL;
+ }
  cur_conf = bus_globals.p_pkey_conf;
  while(cur_conf)
  {
@@ -1055,7 +1061,46 @@
  g_bfi_InstanceCount = 0;
  memset(  __out_bcount(sizeof(g_bus_filters)) (void*)g_bus_filters, 0,
    sizeof(g_bus_filters) );
+
+ {
+  UNICODE_STRING dev_name, dos_name;
+  bus_fdo_ext_t *p_ext=NULL;

+  RtlInitUnicodeString( &dev_name, AL_DEVICE_NAME );
+  RtlInitUnicodeString( &dos_name, L"\\DosDevices\\Global\\ibal<file://\\DosDevices\\Global\\ibal>" );
+
+  status = IoCreateDevice( p_driver_obj, sizeof(bus_fdo_ext_t),
+   &dev_name, FILE_DEVICE_BUS_EXTENDER,
+   FILE_DEVICE_SECURE_OPEN, FALSE, &g_ControlDeviceObject );
+  if( !NT_SUCCESS(status) )
+  {
+   g_ControlDeviceObject = NULL;
+   BUS_PRINT( BUS_DBG_ERROR,
+    ("Failed to create ControlDeviceObject, status %x.\n",status) );
+  }
+  else {
+   IoDeleteSymbolicLink( &dos_name );
+   status = IoCreateSymbolicLink( &dos_name, &dev_name );
+   if( !NT_SUCCESS(status) )
+   {
+    IoDeleteDevice( g_ControlDeviceObject );
+    g_ControlDeviceObject = NULL;
+    BUS_PRINT( BUS_DBG_ERROR,
+     ("Failed to create symlink for dos name.\n") );
+   }
+   else {
+    BUS_PRINT( BUS_DBG_ERROR,
+     ("SUCCEEDED to create symlink for ControlDeviceObject.\n") );
+
+    p_ext = g_ControlDeviceObject->DeviceExtension;
+    RtlZeroMemory(p_ext, sizeof *p_ext);
+    cl_init_pnp_po_ext( g_ControlDeviceObject, NULL,
+     NULL, bus_globals.dbg_lvl, NULL, NULL );
+    IoInitializeRemoveLock( &p_ext->cl_ext.stop_lock, 'dtci', 0, 1000 );
+   }
+  }
+ }
+
  BUS_EXIT( BUS_DBG_DRV );
  return STATUS_SUCCESS;
 }
Index: bus/kernel/bus_pnp.c
===================================================================
--- bus/kernel/bus_pnp.c (revision 2077)
+++ bus/kernel/bus_pnp.c (working copy)
@@ -60,6 +60,7 @@
 bus_filter_t   g_bus_filters[MAX_BUS_FILTERS];

 extern ib_al_handle_t gh_al; // NULL if AL needs init.
+extern PDEVICE_OBJECT g_ControlDeviceObject;


 static NTSTATUS
@@ -197,7 +198,7 @@
  /* Create the FDO device object to attach to the stack. */

  /* if 1st Bus Filter Instance, then create device names for user ioctl */
- if ( ic == 1 )
+ if ( ic == 1  && !g_ControlDeviceObject)
  {
   RtlInitUnicodeString( &dev_name, AL_DEVICE_NAME );
   RtlInitUnicodeString( &dos_name, L"\\DosDevices\\Global\\ibal<file://\\DosDevices\\Global\\ibal>" );
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20090414/b4d6e65c/attachment.html>


More information about the ofw mailing list