[ofw] partial review of mlx4 branch

Leonid Keller leonid at mellanox.co.il
Sun Oct 16 15:24:10 PDT 2011


Hi Sean,

Thank you for the comments.
I'm going to answer them one by one in a separate mail.
For now, some general notes.

1. Winverbs
I don't think I broke IBA. Please show me where, if I missed something.
The version increase is needed to advertise the new 'Transport' field.
As to the dependency on the IBAL/COMPLIB DLLs:
it is really IBAT_EX that depends on them, while Winverbs depends on IBAT_EX.
The latter is needed to add seamless RoCE support.
Winverbs.dll is supplied as part of the OFED suite; it depended on the kernel IBAT service anyway, and I don't think it is too bad if it also depends on other DLLs of the suite.
On the other hand, it is not good for different user-space components to use the IBAT service however they like,
because the implementation of the service can change.
We arrived at the idea of IBAT_EX.dll, which hides the implementation of the IBAT service, precisely because of the problems with the current implementation.
In the OFED stack the service is implemented inside the IPoIB driver, supports only the IB transport, and exists once per machine.
Today we need it to support two transports - IB and RoCE - and more in the future.
We need to handle the situation where no IPoIB driver is loaded.
We need to support configurations where several HCA cards with several transports work simultaneously.
That's why we developed IBAT_EX and changed the applications to use it.
You may change WinVerbs.dll back and implement RoCE support inside it.
You may remove the complib dependency from IBAT_EX, but it will still need IBAL.
(One could also replace the ibal.dll function calls with IOCTLs, but I personally do not like the idea.)

2. Complib
I extended the complib memory-tracking mechanism so that memory leaks can be printed, including each block's tag and size.
Could you suggest how I can do this using standard system functions?
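
For readers unfamiliar with the mechanism being discussed, here is a portable user-space sketch of the tagged-tracking idea. All names here are illustrative stand-ins, not the complib API; the real code keeps the headers in a qmap under a spinlock and prints via DbgPrintEx in the kernel.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative header prepended to each tracked allocation. */
typedef struct trk_hdr {
	struct trk_hdr *next;
	size_t          size;
	const char     *tag;
	const char     *file;
	int             line;
} trk_hdr_t;

static trk_hdr_t *g_allocs;  /* singly-linked list of live allocations */

/* Allocate memory and record where it came from. */
static void *trk_malloc(size_t size, const char *tag, const char *file, int line)
{
	trk_hdr_t *h = malloc(sizeof(*h) + size);
	if (!h)
		return NULL;
	h->size = size;
	h->tag  = tag;
	h->file = file;
	h->line = line;
	h->next = g_allocs;
	g_allocs = h;
	return h + 1;  /* caller sees memory just past the header */
}

/* Dump whatever is still allocated - the leak report. */
static void trk_display(void)
{
	const trk_hdr_t *h;
	for (h = g_allocs; h; h = h->next)
		printf("block for '%s' of size %zu allocated in file %s line %d\n",
		       h->tag ? h->tag : "Unknown", h->size, h->file, h->line);
}
```

On Windows, the standard alternative SH alludes to is kernel pool tagging (ExAllocatePoolWithTag) combined with Driver Verifier, which tracks leaks per tag without any custom bookkeeping.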

3. IBA
Which IBAs were broken?
How do you suggest extending the functionality while keeping the IBA intact?
Or did I misunderstand your idea?



-----Original Message-----
From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Hefty, Sean
Sent: Tuesday, October 11, 2011 8:38 PM
To: ofw_list
Subject: [ofw] partial review of mlx4 branch

See below for comments on the changes in branches/mlx4 compared to the trunk.  Hopefully all of my comments are marked with 'SH:'.  I did not review the hw subdirectories.  The changes there are extensive.

The biggest concerns from my personal perspective were:

* winverbs cannot depend on the ibal or complib libraries
* ibverbs must maintain binary compatibility with existing applications
* we must support a mix of old and new libraries

The biggest concern that I believe OFA should have is:

* Binary compatibility with existing applications must be maintained.
  This includes all library interfaces as well as the user to kernel ABI.


- Sean



diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_cm_cep.c branches\mlx4/core/al/kernel/al_cm_cep.c
--- trunk/core/al/kernel/al_cm_cep.c	2011-09-13 09:15:33.785667700 -0700
+++ branches\mlx4/core/al/kernel/al_cm_cep.c	2011-10-10 16:59:00.857865300 -0700
@@ -677,6 +677,16 @@ __reject_req(
 	p_mad->timeout_ms = 0;
 	p_mad->resp_expected = FALSE;
 
+	/* Switch src and dst in GRH */
+	if(p_mad->grh_valid)
+	{
+		ib_gid_t dest_gid = {0};

SH: no need to initialize

+		memcpy(&dest_gid, &p_mad->p_grh->src_gid, sizeof(ib_gid_t));
+		memcpy(&p_mad->p_grh->src_gid, &p_mad->p_grh->dest_gid, sizeof(ib_gid_t));
+		memcpy(&p_mad->p_grh->dest_gid, &dest_gid, sizeof(ib_gid_t));
+	}
+	
 	__cep_send_mad( p_port_cep, p_mad );
 
 	AL_EXIT( AL_DBG_CM );
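
Stripped of the MAD plumbing, the swap in the hunk above looks like the sketch below (ib_gid_t reduced here to a bare 16-byte array; the real type is a union). As SH notes, the temporary needs no zero-initialization, because the first memcpy overwrites it entirely.

```c
#include <string.h>

/* Stand-in for the 128-bit GID; the real ib_gid_t is a union. */
typedef struct { unsigned char raw[16]; } gid_sketch_t;

/* Stand-in for the GRH fields the swap touches. */
typedef struct {
	gid_sketch_t src_gid;
	gid_sketch_t dest_gid;
} grh_sketch_t;

/* Swap source and destination GIDs, as done when turning a received
 * REQ's GRH around to send the reject back to the originator. */
static void grh_swap_gids(grh_sketch_t *grh)
{
	gid_sketch_t tmp;  /* no memset needed: fully written below */

	memcpy(&tmp, &grh->src_gid, sizeof(tmp));
	memcpy(&grh->src_gid, &grh->dest_gid, sizeof(tmp));
	memcpy(&grh->dest_gid, &tmp, sizeof(tmp));
}
```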
@@ -3390,7 +3400,7 @@ __cep_queue_mad(
 	// TODO: Remove - manage above core kernel CM code
 	/* NDI connection request case */
 	if ( p_cep->state == CEP_STATE_LISTEN &&
-		(p_cep->sid & ~0x0ffffffI64) == IB_REQ_CM_RDMA_SID_PREFIX )
+		(p_cep->sid & IB_REQ_CM_RDMA_SID_PREFIX_MASK) == IB_REQ_CM_RDMA_SID_PREFIX )
 	{ /* Try to complete pending IRP, if any */
 		mad_cm_req_t* p_req = (mad_cm_req_t*)ib_get_mad_buf( p_mad );
 		ib_cm_rdma_req_t *p_rdma_req = (ib_cm_rdma_req_t *)p_req->pdata;
@@ -3401,7 +3411,7 @@ __cep_queue_mad(
 			 (p_rdma_req->ipv != 0x40 && p_rdma_req->ipv != 0x60) )
 		{
 			AL_PRINT_EXIT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, 
-				("NDI connection req is rejected: maj_min_ver %d, ipv %#x \n", 
+				("RDMA CM connection req is rejected: maj_min_ver %d, ipv %#x \n", 
 				p_rdma_req->maj_min_ver, p_rdma_req->ipv ) );
 			return IB_UNSUPPORTED;
 		}
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_ndi_cm.c branches\mlx4/core/al/kernel/al_ndi_cm.c
--- trunk/core/al/kernel/al_ndi_cm.c	2011-09-13 09:15:33.836672800 -0700
+++ branches\mlx4/core/al/kernel/al_ndi_cm.c	2011-10-10 16:59:00.909870500 -0700
@@ -461,7 +461,8 @@ static VOID __ndi_acquire_lock(
 	nd_csq_t *p_ndi_csq = (nd_csq_t*)Csq;
 
 	KeAcquireSpinLock( &p_ndi_csq->lock, pIrql );
-} +}
+
 
 #ifdef NTDDI_WIN8
 static IO_CSQ_RELEASE_LOCK __ndi_release_lock;
@@ -1111,7 +1112,7 @@ __ndi_fill_cm_req(
 
 	memset( p_cm_req, 0, sizeof(*p_cm_req) );
 
-	p_cm_req->service_id = IB_REQ_CM_RDMA_SID_PREFIX | (p_req->prot << 16) | p_req->dst_port;
+	p_cm_req->service_id = ib_cm_rdma_sid( p_req->prot, p_req->dst_port );
 	p_cm_req->p_primary_path = p_path_rec;
 
 	p_cm_req->qpn = qpn;
@@ -1964,9 +1965,12 @@ ndi_listen_cm(
 	p_csq->state = NDI_CM_LISTEN;
 	__ndi_release_lock( &p_csq->csq, irql );
 
-	if( (p_listen->svc_id & 0xFFFF) == 0 )
+	if( ib_cm_rdma_sid_port( p_listen->svc_id ) == 0 )
 	{
-		p_listen->svc_id |= (USHORT)cid | (USHORT)(cid >> 16);
+		p_listen->svc_id = ib_cm_rdma_sid(
+			ib_cm_rdma_sid_protocol( p_listen->svc_id ),
+			(USHORT)cid | (USHORT)(cid >> 16)
+			);
 	}
 
 	ib_status = al_cep_listen( h_al, cid, p_listen );
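
The hunks above replace inline magic numbers with the ib_cm_rdma_sid* helpers. For readers without the header handy, here is a plausible self-contained sketch of those helpers, inferred from the old inline expressions in the diff (sid = PREFIX | (prot << 16) | dst_port, with the mask clearing the low 24 bits). The prefix value follows the IBTA RDMA IP CM annex and may differ from the tree's actual definition.

```c
#include <stdint.h>

/* Inferred layout: port = bits 0..15, protocol = bits 16..23,
 * fixed RDMA CM prefix = bits 24 and up. */
#define IB_REQ_CM_RDMA_SID_PREFIX       0x0000000001000000ULL
#define IB_REQ_CM_RDMA_SID_PREFIX_MASK  (~0x0FFFFFFULL)

/* Compose a service ID from an IP protocol number and a port. */
static uint64_t ib_cm_rdma_sid(uint8_t prot, uint16_t port)
{
	return IB_REQ_CM_RDMA_SID_PREFIX | ((uint64_t)prot << 16) | port;
}

/* Extract the port (low 16 bits). */
static uint16_t ib_cm_rdma_sid_port(uint64_t sid)
{
	return (uint16_t)sid;
}

/* Extract the IP protocol (bits 16..23). */
static uint8_t ib_cm_rdma_sid_protocol(uint64_t sid)
{
	return (uint8_t)(sid >> 16);
}
```

The named mask also makes the listen-side check in __cep_queue_mad self-documenting compared with the raw ~0x0ffffffI64 literal it replaces.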
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_pnp.c branches\mlx4/core/al/kernel/al_pnp.c
--- trunk/core/al/kernel/al_pnp.c	2011-09-13 09:15:33.881677300 -0700
+++ branches\mlx4/core/al/kernel/al_pnp.c	2011-10-10 16:59:00.957875300 -0700
@@ -1438,6 +1438,10 @@ __pnp_check_ports(
 			( (p_new_port_attr->link_state == IB_LINK_ARMED) ||
 			(p_new_port_attr->link_state == IB_LINK_ACTIVE) ) )
 		{
+
+			AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+				("pkey or gid changes\n") );
+
 			/* A different number of P_Keys indicates a change.*/
 			if( p_old_port_attr->num_pkeys != p_new_port_attr->num_pkeys )
 			{
@@ -1486,6 +1490,8 @@ __pnp_check_ports(
 		if( (p_old_port_attr->lid != p_new_port_attr->lid) ||
 			(p_old_port_attr->lmc != p_new_port_attr->lmc) )
 		{
+			AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+				("lid/lmc changed \n") );
 			event_rec.pnp_event = IB_PNP_LID_CHANGE;
 			__pnp_process_port_forward( &event_rec );
 		}
@@ -1493,6 +1499,8 @@ __pnp_check_ports(
 		if( (p_old_port_attr->sm_lid != p_new_port_attr->sm_lid) ||
 			(p_old_port_attr->sm_sl != p_new_port_attr->sm_sl) )
 		{
+			AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+				("sm_lid/sm_sl changed \n") );
 			event_rec.pnp_event = IB_PNP_SM_CHANGE;
 			__pnp_process_port_forward( &event_rec );
 		}
@@ -1500,6 +1508,8 @@ __pnp_check_ports(
 		if( p_old_port_attr->subnet_timeout !=
 			p_new_port_attr->subnet_timeout )
 		{
+			AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+				("subnet_timeout changed \n") );
 			event_rec.pnp_event = IB_PNP_SUBNET_TIMEOUT_CHANGE;
 			__pnp_process_port_forward( &event_rec );
 		}
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_proxy.c branches\mlx4/core/al/kernel/al_proxy.c
--- trunk/core/al/kernel/al_proxy.c	2011-09-13 09:15:34.109700100 -0700
+++ branches\mlx4/core/al/kernel/al_proxy.c	2011-10-10 16:59:01.211900700 -0700
@@ -424,6 +424,10 @@ proxy_pnp_port_cb(
 
 	AL_ENTER( AL_DBG_PROXY_CB );
 
+	AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+		("p_pnp_rec->pnp_event = 0x%x (%s)\n",
+		p_pnp_rec->pnp_event, ib_get_pnp_event_str( p_pnp_rec->pnp_event )) );
+
 	p_context = p_pnp_rec->pnp_context;
 
 	/*
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_sa_req.c branches\mlx4/core/al/kernel/al_sa_req.c
--- trunk/core/al/kernel/al_sa_req.c	2011-09-13 09:15:33.980687200 -0700
+++ branches\mlx4/core/al/kernel/al_sa_req.c	2011-10-10 16:59:01.096889200 -0700
@@ -234,26 +234,42 @@ sa_req_mgr_pnp_cb(
 	sa_req_svc_t				*p_sa_req_svc;
 	ib_av_attr_t				av_attr;
 	ib_pd_handle_t				h_pd;
-	ib_api_status_t				status;
+	ib_api_status_t				status = IB_SUCCESS;
+	ib_pnp_port_rec_t			*p_port_rec = (ib_pnp_port_rec_t*)p_pnp_rec;
 
 	AL_ENTER( AL_DBG_SA_REQ );
 	CL_ASSERT( p_pnp_rec );
 	CL_ASSERT( p_pnp_rec->pnp_context == &gp_sa_req_mgr->obj );
 
+	AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_PNP,
+		("p_pnp_rec->pnp_event = 0x%x (%s)\n",
+		p_pnp_rec->pnp_event, ib_get_pnp_event_str( p_pnp_rec->pnp_event )) );
+
 	/* Dispatch based on the PnP event type. */
 	switch( p_pnp_rec->pnp_event )
 	{
 	case IB_PNP_PORT_ADD:
-		status = create_sa_req_svc( (ib_pnp_port_rec_t*)p_pnp_rec );
+		if ( p_port_rec->p_port_attr->transport == RDMA_TRANSPORT_RDMAOE )
+		{	// RoCE port
+			AL_PRINT( TRACE_LEVEL_WARNING, AL_DBG_ERROR,
+				("create_sa_req_svc is not called for RoCE port %d\n", p_port_rec->p_port_attr->port_num ) );

SH: Please change the print from warning / error to indicate that this is the normal behavior.

+		}
+		else
+		{
+			status = create_sa_req_svc( p_port_rec );
 		if( status != IB_SUCCESS )
 		{
 			AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR,
-				("create_sa_req_svc failed: %s\n", ib_get_err_str(status)) );
+					("create_sa_req_svc for port %d failed: %s\n", 
+					p_port_rec->p_port_attr->port_num, ib_get_err_str(status)) );
+			}
 		}
 		break;
 
 	case IB_PNP_PORT_REMOVE:
-		CL_ASSERT( p_pnp_rec->context );
+		// context will be NULL for RoCE port
+		if ( !p_pnp_rec->context )
+			break;

SH: Move this check to the top of the function to avoid duplicating it.  If the context is set by IB_PNP_PORT_ADD, just add that to the check.

 		p_sa_req_svc = p_pnp_rec->context;
 		ref_al_obj( &p_sa_req_svc->obj );
 		p_sa_req_svc->obj.pfn_destroy( &p_sa_req_svc->obj, NULL );
@@ -263,15 +279,15 @@ sa_req_mgr_pnp_cb(
 
 	case IB_PNP_PORT_ACTIVE:
 	case IB_PNP_SM_CHANGE:
-		CL_ASSERT( p_pnp_rec->context );
+		// context will be NULL for RoCE port
+		if ( !p_pnp_rec->context )
+			break;
 		AL_PRINT( TRACE_LEVEL_INFORMATION, AL_DBG_SA_REQ,
 			("updating SM information\n") );
 
 		p_sa_req_svc = p_pnp_rec->context;
-		p_sa_req_svc->sm_lid =
-			((ib_pnp_port_rec_t*)p_pnp_rec)->p_port_attr->sm_lid;
-		p_sa_req_svc->sm_sl =
-			((ib_pnp_port_rec_t*)p_pnp_rec)->p_port_attr->sm_sl;
+		p_sa_req_svc->sm_lid = p_port_rec->p_port_attr->sm_lid;
+		p_sa_req_svc->sm_sl = p_port_rec->p_port_attr->sm_sl;
 
 		/* Update the address vector. */
 		status = ib_query_av( p_sa_req_svc->h_av, &av_attr, &h_pd );
@@ -298,7 +314,9 @@ sa_req_mgr_pnp_cb(
 	case IB_PNP_PORT_INIT:
 	case IB_PNP_PORT_ARMED:
 	case IB_PNP_PORT_DOWN:
-		CL_ASSERT( p_pnp_rec->context );
+		// context will be NULL for RoCE port
+		if ( !p_pnp_rec->context )
+			break;
 		p_sa_req_svc = p_pnp_rec->context;
 		p_sa_req_svc->sm_lid = 0;
 		p_sa_req_svc->sm_sl = 0;
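
SH's suggestion above - hoisting the repeated RoCE NULL-context check to the top of the callback - can be sketched as follows. This is a hypothetical mini-model of the dispatch shape, not the actual sa_req_mgr_pnp_cb code; event names and the counter are illustrative.

```c
#include <stddef.h>

enum pnp_event { PORT_ADD, PORT_REMOVE, PORT_ACTIVE, PORT_DOWN };

static int handled;  /* counts events that actually did work */

static void pnp_cb(enum pnp_event event, void *context)
{
	/* The context is only set by PORT_ADD, and only for IB ports;
	 * RoCE ports leave it NULL, so every later event can bail out
	 * here instead of repeating the check in each case label. */
	if (event != PORT_ADD && context == NULL)
		return;

	switch (event) {
	case PORT_ADD:     /* would create the SA request service (IB only) */
	case PORT_REMOVE:  /* would tear the service down */
	case PORT_ACTIVE:  /* would refresh SM lid/sl */
	case PORT_DOWN:    /* would clear SM lid/sl */
		handled++;
		break;
	}
}
```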
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/al/kernel/al_smi.c branches\mlx4/core/al/kernel/al_smi.c
--- trunk/core/al/kernel/al_smi.c	2011-09-13 09:15:33.822671400 -0700
+++ branches\mlx4/core/al/kernel/al_smi.c	2011-10-10 16:59:00.895869100 -0700
@@ -905,7 +905,10 @@ __complete_send_mad(
 
 	/* Construct a send work completion. */
 	cl_memclr( &wc, sizeof( ib_wc_t ) );
-	wc.wr_id	= p_mad_wr->send_wr.wr_id;
+	if (p_mad_wr) {
+		// Handling the special race where p_mad_wr that comes from spl_qp can be NULL
+		wc.wr_id	= p_mad_wr->send_wr.wr_id;
+	}

SH: Please provide more details on why this can happen.  I'm not asking for a code comment, just a response.  It may make sense to apply this change separate, so someone can find the details in the change log.

 	wc.wc_type	= IB_WC_SEND;
 	wc.status	= wc_status;
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/bus/kernel/bus_pnp.c branches\mlx4/core/bus/kernel/bus_pnp.c
--- trunk/core/bus/kernel/bus_pnp.c	2011-09-13 09:15:31.623451500 -0700
+++ branches\mlx4/core/bus/kernel/bus_pnp.c	2011-10-10 16:58:57.978577400 -0700
@@ -44,7 +44,6 @@
 #include "bus_port_mgr.h"
 #include "bus_iou_mgr.h"
 #include "complib/cl_memory.h"
-#include "al_cm_cep.h"
 #include "al_mgr.h"
 #include "bus_ev_log.h"
 
@@ -52,7 +51,6 @@
 #include "rdma/verbs.h"
 #include "iba/ib_al_ifc.h"
 #include "iba/ib_ci_ifc.h"
-#include "iba/ib_cm_ifc.h"
 #include "al_cm_cep.h"
 #include "al_mgr.h"
 #include "bus_ev_log.h"
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/bus/kernel/bus_port_mgr.c branches\mlx4/core/bus/kernel/bus_port_mgr.c
--- trunk/core/bus/kernel/bus_port_mgr.c	2011-09-13 09:15:31.580447200 -0700
+++ branches\mlx4/core/bus/kernel/bus_port_mgr.c	2011-10-10 16:58:57.922571800 -0700
@@ -772,6 +772,15 @@ port_mgr_port_add(
 	}
 
 	/*
+	 * Don't create PDO for IPoIB (and start IPoIB) while over a RoCE port.
+	 */
+	if ( p_pnp_rec->p_port_attr->transport != RDMA_TRANSPORT_IB ){
+		BUS_TRACE_EXIT( BUS_DBG_PNP,("IPoIb is not started for RoCE port. %s ca_guid %I64x port(%d)\n",
+								p_bfi->whoami, p_bfi->ca_guid, p_pnp_rec->p_port_attr->port_num));
+		return IB_SUCCESS;
+	}
+
+	/*
 	 * Allocate a PNP context for this object. pnp_rec.context is obj unique.
 	 */
 	if ( !p_ctx ) {
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/bus/kernel/SOURCES branches\mlx4/core/bus/kernel/SOURCES
--- trunk/core/bus/kernel/SOURCES	2011-09-13 09:15:31.567445900 -0700
+++ branches\mlx4/core/bus/kernel/SOURCES	2011-10-10 16:58:57.908570400 -0700
@@ -17,7 +17,7 @@ SOURCES= ibbus.rc		\
 	bus_iou_mgr.c		\
 	bus_stat.c
 
-INCLUDES=..\..\..\inc;..\..\..\inc\kernel;..\..\al;..\..\al\kernel;..\..\bus\kernel\$O;
+INCLUDES=..\..\..\inc;..\..\..\inc\kernel;..\..\al;..\..\al\kernel;..\..\bus\kernel\$O;..\..\..\hw\mlx4\inc;..\..\..\inc\kernel\iba;
 
 C_DEFINES=$(C_DEFINES) -DDRIVER -DDEPRECATE_DDK_FUNCTIONS -DNEED_CL_OBJ
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/complib/cl_memory.c branches\mlx4/core/complib/cl_memory.c
--- trunk/core/complib/cl_memory.c	2011-09-13 09:15:30.739363100 -0700
+++ branches\mlx4/core/complib/cl_memory.c	2011-10-10 16:58:57.086488200 -0700
@@ -92,6 +92,7 @@ __cl_mem_track_start( void )
 	if( status != CL_SUCCESS )
 	{
 		__cl_free_priv( gp_mem_tracker );
+		gp_mem_tracker = NULL;
 		return;
 	}
 }
@@ -179,8 +180,15 @@ cl_mem_display( void )
 		 */
 		p_hdr = PARENT_STRUCT( p_map_item, cl_malloc_hdr_t, map_item );
 
-		cl_msg_out( "\tMemory block at %p allocated in file %s line %d\n",
-			p_hdr->p_mem, p_hdr->file_name, p_hdr->line_num );
+#ifdef CL_KERNEL
+		DbgPrintEx(DPFLTR_IHVNETWORK_ID, DPFLTR_ERROR_LEVEL, 
+			"\tMemory block for '%s' at %p of size %#x allocated in file %s line %d\n",
+			(p_hdr->tag == NULL) ? "Unknown" : p_hdr->tag,
+			p_hdr->p_mem, p_hdr->size, p_hdr->file_name, p_hdr->line_num );
+#else
+		cl_msg_out( "\tMemory block at %p of size %#x allocated in file %s line %d\n",
+			p_hdr->p_mem, p_hdr->size, p_hdr->file_name, p_hdr->line_num );
+#endif
 
 		p_map_item = cl_qmap_next( p_map_item );
 	}
@@ -189,18 +197,21 @@ cl_mem_display( void )
 }
 
 
+
 /*
  * Allocates memory and stores information about the allocation in a list.
  * The contents of the list can be printed out by calling the function
  * "MemoryReportUsage".  Memory allocation will succeed even if the list
  * cannot be created.
  */
+static 
 void*
-__cl_malloc_trk(
+__cl_malloc_trk_internal(
 	IN	const char* const	p_file_name,
 	IN	const int32_t		line_num,
 	IN	const size_t		size,
-	IN	const boolean_t		pageable )
+	IN	const boolean_t		pageable,
+	IN	const char*			tag )
 {
 	cl_malloc_hdr_t	*p_hdr;
 	cl_list_item_t	*p_list_item;
@@ -264,6 +275,8 @@ __cl_malloc_trk(
 	 * not in the list without dereferencing memory we do not own.
 	 */
 	p_hdr->p_mem = p_mem;
+	p_hdr->size = (uint32_t)size;
+	p_hdr->tag = (char*)tag;
 
 	/* Insert the header structure into our allocation list. */
 	cl_qmap_insert( &gp_mem_tracker->alloc_map, (uintptr_t)p_mem, &p_hdr->map_item );
@@ -272,6 +285,34 @@ __cl_malloc_trk(
 	return( p_mem );
 }
 
+/*
+ * Allocates memory and stores information about the allocation in a list.
+ * The contents of the list can be printed out by calling the function
+ * "MemoryReportUsage".  Memory allocation will succeed even if the list
+ * cannot be created.
+ */
+void*
+__cl_malloc_trk(
+	IN	const char* const	p_file_name,
+	IN	const int32_t		line_num,
+	IN	const size_t		size,
+	IN	const boolean_t		pageable )
+{
+	return __cl_malloc_trk_internal( p_file_name,
+		line_num, size, pageable, NULL );
+}
+
+void*
+__cl_malloc_trk_ex(
+	IN	const char* const	p_file_name,
+	IN	const int32_t		line_num,
+	IN	const size_t		size,
+	IN	const boolean_t		pageable,
+	IN	const char*			tag )
+{
+	return __cl_malloc_trk_internal( p_file_name,
+		line_num, size, pageable, tag );
+}
 
 /*
  * Allocate non-tracked memory.
@@ -301,6 +342,22 @@ __cl_zalloc_trk(
 	return( p_buffer );
 }
 
+void*
+__cl_zalloc_trk_ex(
+	IN	const char* const	p_file_name,
+	IN	const int32_t		line_num,
+	IN	const size_t		size,
+	IN	const boolean_t		pageable,
+	IN	const char*			tag )
+{
+	void	*p_buffer;
+
+	p_buffer = __cl_malloc_trk_ex( p_file_name, line_num, size, pageable, tag );
+	if( p_buffer )
+		cl_memclr( p_buffer, size );
+
+	return( p_buffer );
+}

SH: We need to stop abstracting memory allocations.  There are already tools available for tracking memory allocations, especially for the kernel.

 void*
 __cl_zalloc_ntrk(
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/complib/cl_memtrack.h branches\mlx4/core/complib/cl_memtrack.h
--- trunk/core/complib/cl_memtrack.h	2011-09-13 09:15:30.757364900 -0700
+++ branches\mlx4/core/complib/cl_memtrack.h	2011-10-10 16:58:57.108490400 -0700
@@ -76,6 +76,8 @@ typedef struct _cl_malloc_hdr
 	void				*p_mem;
 	char				file_name[FILE_NAME_LENGTH];
 	int32_t				line_num;
+	int32_t				size;
+	char 				*tag;
 
 } cl_malloc_hdr_t;
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/dirs branches\mlx4/core/dirs
--- trunk/core/dirs	2011-09-13 09:15:40.866375700 -0700
+++ branches\mlx4/core/dirs	2011-10-10 16:59:07.969576400 -0700
@@ -4,5 +4,6 @@ DIRS=\
 	bus			\
 	iou			\
 	ibat		\
+	ibat_ex   \
 	winverbs	\
 	winmad
Only in branches\mlx4/core: ibat_ex
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/winverbs/kernel/wv_device.c branches\mlx4/core/winverbs/kernel/wv_device.c
--- trunk/core/winverbs/kernel/wv_device.c	2011-09-13 09:15:26.752964500 -0700
+++ branches\mlx4/core/winverbs/kernel/wv_device.c	2011-10-10 16:58:53.501129700 -0700
@@ -135,10 +135,12 @@ static void WvDeviceEventHandler(ib_even
 			WvDeviceCompleteRequests(&dev->pPorts[i], STATUS_SUCCESS, event);
 		}
 	} else {
+		if(pEvent->port_number <= dev->PortCount) {
 		WvDeviceCompleteRequests(&dev->pPorts[pEvent->port_number - 1],
 								 STATUS_SUCCESS, event);
 	}
 }
+}

SH: This check is not needed.  Upper level drivers must be able to trust that the lower drivers will not give them completely bogus data.
 
 static WV_DEVICE *WvDeviceAlloc(WV_PROVIDER *pProvider)
 {
@@ -216,6 +218,8 @@ static NTSTATUS WvDeviceCreatePorts(WV_D
 		return STATUS_NO_MEMORY;
 	}
 
+	ASSERT(ControlDevice != NULL);
+
 	WDF_IO_QUEUE_CONFIG_INIT(&config, WdfIoQueueDispatchManual);
 	for (i = 0; i < pDevice->PortCount; i++) {
 		pDevice->pPorts[i].Flags = 0;
@@ -537,8 +541,8 @@ static void WvConvertPortAttr(WV_IO_PORT
 	pAttributes->ActiveWidth	= pPortAttr->active_width;
 	pAttributes->ActiveSpeed	= pPortAttr->active_speed;
 	pAttributes->PhysicalState	= pPortAttr->phys_state;
+	pAttributes->Transport		= (UINT8) pPortAttr->transport;
 	pAttributes->Reserved[0]	= 0;
-	pAttributes->Reserved[1]	= 0;

SH: This is fine, but user space must still support older kernels which set this field to 0.

 }
 
 void WvDeviceQuery(WV_PROVIDER *pProvider, WDFREQUEST Request)
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/winverbs/kernel/wv_driver.c branches\mlx4/core/winverbs/kernel/wv_driver.c
--- trunk/core/winverbs/kernel/wv_driver.c	2011-09-13 09:15:26.724961700 -0700
+++ branches\mlx4/core/winverbs/kernel/wv_driver.c	2011-10-10 16:58:53.417121300 -0700
@@ -31,8 +31,10 @@
 #include <wdf.h>
 #include <wdmsec.h>
 #include <ntstatus.h>
+#include <initguid.h>
 
 #include "index_list.c"
+#include <rdma/verbs.h>
 #include "wv_driver.h"
 #include "wv_ioctl.h"
 #include "wv_provider.h"
@@ -44,10 +46,6 @@
 #include "wv_qp.h"
 #include "wv_ep.h"
 
-#include <initguid.h>
-#include <rdma/verbs.h>
-#include <iba/ib_cm_ifc.h>
-

SH: These changes are not necessary, and we need ib_cm_ifc.h, so it should explicitly be included.

 WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(WV_RDMA_DEVICE, WvRdmaDeviceGetContext)
 
 WDFDEVICE				ControlDevice = NULL;
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/winverbs/user/SOURCES branches\mlx4/core/winverbs/user/SOURCES
--- trunk/core/winverbs/user/SOURCES	2011-09-13 09:15:27.658055000 -0700
+++ branches\mlx4/core/winverbs/user/SOURCES	2011-10-10 16:58:54.372216800 -0700
@@ -29,7 +29,15 @@ INCLUDES = ..;..\..\..\inc;..\..\..\inc\
 
 USER_C_FLAGS = $(USER_C_FLAGS) -DEXPORT_WV_SYMBOLS
 
+!if !$(FREEBUILD)
+C_DEFINES=$(C_DEFINES) -D_DEBUG -DDEBUG -DDBG
+!endif
+
 TARGETLIBS = \
 	$(SDK_LIB_PATH)\kernel32.lib	\
 	$(SDK_LIB_PATH)\uuid.lib		\
-	$(SDK_LIB_PATH)\ws2_32.lib
+	$(SDK_LIB_PATH)\ws2_32.lib      \
+	$(SDK_LIB_PATH)\iphlpapi.lib 	\
+	$(TARGETPATH)\*\ibat_ex.lib     \
+        $(TARGETPATH)\*\ibal.lib        \
+        $(TARGETPATH)\*\complib.lib

SH: No.  Winverbs should not depend on ibal or complib.  It shouldn't even depend on ibat_ex if that can be helped.  It should be as stand-alone as possible to make it easier to use.  This is why winverbs sent IOCTLs directly to the kernel for translations.

diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/winverbs/user/wv_provider.cpp branches\mlx4/core/winverbs/user/wv_provider.cpp
--- trunk/core/winverbs/user/wv_provider.cpp	2011-09-13 09:15:27.769066100 -0700
+++ branches\mlx4/core/winverbs/user/wv_provider.cpp	2011-10-10 16:58:54.380217600 -0700
@@ -35,6 +35,7 @@
 #include "wv_device.h"
 #include "wv_ep.h"
 #include "wv_ioctl.h"
+#include <iba/ibat_ex.h>
 
 CWVProvider::CWVProvider()
 {
@@ -136,42 +137,14 @@ out:
 STDMETHODIMP CWVProvider::
 TranslateAddress(const SOCKADDR* pAddress, WV_DEVICE_ADDRESS* pDeviceAddress)
 {
-	HANDLE hIbat;
-	IOCTL_IBAT_IP_TO_PORT_IN addr;
 	IBAT_PORT_RECORD port;
-	DWORD bytes;
-	HRESULT hr;
-
-	hIbat = CreateFileW(IBAT_WIN32_NAME, GENERIC_READ | GENERIC_WRITE,
-						FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
-						OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
-	if (hIbat == INVALID_HANDLE_VALUE) {
-		return HRESULT_FROM_WIN32(GetLastError());
-	}
-
-	addr.Version = IBAT_IOCTL_VERSION;
-	if (pAddress->sa_family == AF_INET) {
-		addr.Address.IpVersion = 4;
-		RtlCopyMemory(addr.Address.Address + 12,
-					  &((SOCKADDR_IN *)pAddress)->sin_addr, 4);
-	} else {
-		addr.Address.IpVersion = 6;
-		RtlCopyMemory(addr.Address.Address,
-					  &((SOCKADDR_IN6 *)pAddress)->sin6_addr, 16);
-	}
-
-	if (DeviceIoControl(hIbat, IOCTL_IBAT_IP_TO_PORT,
-						&addr, sizeof addr, &port, sizeof port, &bytes, NULL)) {
-		hr = WV_SUCCESS;
+	HRESULT hr = IBAT_EX::IpToPort( pAddress, &port );
+	if ( FAILED( hr ) )
+		return hr;
 		pDeviceAddress->DeviceGuid = port.CaGuid;
 		pDeviceAddress->Pkey = port.PKey;
 		pDeviceAddress->PortNumber = port.PortNum;
-	} else {
-		hr = HRESULT_FROM_WIN32(GetLastError());
-	}
-
-	CloseHandle(hIbat);
-	return hr;
+	return WV_SUCCESS;
 }
 
 STDMETHODIMP CWVProvider::
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/core/winverbs/wv_ioctl.h branches\mlx4/core/winverbs/wv_ioctl.h
--- trunk/core/winverbs/wv_ioctl.h	2011-09-13 09:15:28.105099700 -0700
+++ branches\mlx4/core/winverbs/wv_ioctl.h	2011-10-10 16:58:54.643243900 -0700
@@ -436,7 +436,8 @@ typedef struct _WV_IO_PORT_ATTRIBUTES
 	UINT8			ActiveWidth;
 	UINT8			ActiveSpeed;
 	UINT8			PhysicalState;
-	UINT8			Reserved[2];
+	UINT8			Transport;
+	UINT8			Reserved[1];
 
 }	WV_IO_PORT_ATTRIBUTES;

SH: Again, fine, but we need to handle the case where it is not set by an older driver.  (I'm writing this as I'm reviewing the code, so it may be handled below.)
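
The compatibility handling SH asks for can be sketched like this: an older driver zeroes the byte (it was Reserved), so user space should map 0 to a sane default instead of failing. The enum values below are illustrative placeholders, not the actual WV_IO or ib_types constants.

```c
/* Illustrative transport values; 0 is what older kernels report,
 * because the Transport byte used to be Reserved[0]. */
enum wv_transport {
	WV_TRANSPORT_UNSPECIFIED = 0,
	WV_TRANSPORT_IB          = 1,
	WV_TRANSPORT_RDMAOE      = 2
};

/* Interpret the on-wire byte, tolerating older drivers. */
static enum wv_transport effective_transport(unsigned char wire_value)
{
	/* Before this change only IB ports existed, so a zeroed field
	 * from an old driver can safely be treated as IB. */
	if (wire_value == WV_TRANSPORT_UNSPECIFIED)
		return WV_TRANSPORT_IB;
	return (enum wv_transport)wire_value;
}
```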


<snip hw/mlx4 diffs>


diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/kernel/hca_pnp.c branches\mlx4/hw/mthca/kernel/hca_pnp.c
--- trunk/hw/mthca/kernel/hca_pnp.c	2011-09-13 09:16:17.408029500 -0700
+++ branches\mlx4/hw/mthca/kernel/hca_pnp.c	2011-10-10 16:59:46.774456500 -0700
@@ -12,6 +12,7 @@
 
 #include "hca_driver.h"
 #include "mthca_dev.h"
+#include <rdma\verbs.h>
 
 #if defined(EVENT_TRACING)
 #ifdef offsetof
@@ -21,7 +22,6 @@
 #endif
 #include "mthca.h"
 #include <initguid.h>
-#include <rdma\verbs.h>
 #include <wdmguid.h>
 
 extern const char *mthca_version;
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/kernel/hca_verbs.c branches\mlx4/hw/mthca/kernel/hca_verbs.c
--- trunk/hw/mthca/kernel/hca_verbs.c	2011-09-13 09:16:17.298018500 -0700
+++ branches\mlx4/hw/mthca/kernel/hca_verbs.c	2011-10-10 16:59:46.656444700 -0700
@@ -1093,7 +1093,7 @@ mlnx_create_spl_qp (
 	IN		const	uint8_t						port_num,
 	IN		const	void						*qp_context,
 	IN				ci_async_event_cb_t			event_handler,
-	IN		const	ib_qp_create_t				*p_create_attr,
+	IN	OUT			ib_qp_create_t				*p_create_attr,
 		OUT			ib_qp_attr_t				*p_qp_attr,
 		OUT			ib_qp_handle_t				*ph_qp )
 {
@@ -1118,7 +1118,7 @@ mlnx_create_qp (
 	IN		const	ib_pd_handle_t				h_pd,
 	IN		const	void						*qp_context,
 	IN				ci_async_event_cb_t			event_handler,
-	IN		const	ib_qp_create_t				*p_create_attr,
+	IN	OUT			ib_qp_create_t				*p_create_attr,
 		OUT			ib_qp_attr_t				*p_qp_attr,
 		OUT			ib_qp_handle_t				*ph_qp,
 	IN	OUT			ci_umv_buf_t				*p_umv_buf )
@@ -1641,6 +1641,19 @@ mlnx_port_get_transport (
 	UNREFERENCED_PARAMETER(port_num);
 	return RDMA_TRANSPORT_IB;
 }
+
+uint8_t
+mlnx_get_sl_for_ip_port (
+	IN		const	ib_ca_handle_t	h_ca,
+	IN	const uint8_t				ca_port_num,
+	IN	const uint16_t				ip_port_num)
+{
+	UNREFERENCED_PARAMETER(h_ca);
+	UNREFERENCED_PARAMETER(ca_port_num);
+	UNREFERENCED_PARAMETER(ip_port_num);
+	return 0xff;
+}
+
 void
 setup_ci_interface(
 	IN		const	ib_net64_t					ca_guid,
@@ -1697,7 +1710,7 @@ setup_ci_interface(
 
 	p_interface->local_mad = mlnx_local_mad;
 	p_interface->rdma_port_get_transport = mlnx_port_get_transport;
-	
+	p_interface->get_sl_for_ip_port = mlnx_get_sl_for_ip_port;
 
 	mlnx_memory_if(p_interface);
 	mlnx_direct_if(p_interface);
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/kernel/mthca_provider.c branches\mlx4/hw/mthca/kernel/mthca_provider.c
--- trunk/hw/mthca/kernel/mthca_provider.c	2011-09-13 09:16:17.761064800 -0700
+++ branches\mlx4/hw/mthca/kernel/mthca_provider.c	2011-10-10 16:59:47.018480900 -0700
@@ -766,7 +766,7 @@ static struct ib_cq *mthca_create_cq(str
 		cq->set_ci_db_index = ucmd.set_db_index;
 		cq->arm_db_index    = ucmd.arm_db_index;
 		cq->u_arm_db_index    = ucmd.u_arm_db_index;
-		cq->p_u_arm_sn = (int*)((char*)u_arm_db_page + BYTE_OFFSET(ucmd.u_arm_db_page));
+		cq->p_u_arm_sn = (volatile u32 *)((char*)u_arm_db_page + BYTE_OFFSET(ucmd.u_arm_db_page));
 	}
 
 	for (nent = 1; nent <= entries; nent <<= 1)
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/kernel/mthca_provider.h branches\mlx4/hw/mthca/kernel/mthca_provider.h
--- trunk/hw/mthca/kernel/mthca_provider.h	2011-09-13 09:16:17.850073700 -0700
+++ branches\mlx4/hw/mthca/kernel/mthca_provider.h	2011-10-10 16:59:47.110490100 -0700
@@ -203,7 +203,7 @@ struct mthca_cq {
 	__be32                *arm_db;
 	int                    arm_sn;
 	int                    u_arm_db_index;
-	int                *p_u_arm_sn;
+	volatile u32          *p_u_arm_sn;
 
 	union mthca_buf        queue;
 	struct mthca_mr        mr;
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/kernel/mthca_qp.c branches\mlx4/hw/mthca/kernel/mthca_qp.c
--- trunk/hw/mthca/kernel/mthca_qp.c	2011-09-13 09:16:18.478136500 -0700
+++ branches\mlx4/hw/mthca/kernel/mthca_qp.c	2011-10-10 16:59:47.636542700 -0700
@@ -10,18 +10,18 @@
  * COPYING in the main directory of this source tree, or the
  * OpenIB.org BSD license below:
  *
- *     Redistribution and use in source and binary forms, with or
- *     without modification, are permitted provided that the following
- *     conditions are met:
+ *	   Redistribution and use in source and binary forms, with or
+ *	   without modification, are permitted provided that the following
+ *	   conditions are met:
  *
- *      - Redistributions of source code must retain the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer.
+ *		- Redistributions of source code must retain the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer.
  *
- *      - Redistributions in binary form must reproduce the above
- *        copyright notice, this list of conditions and the following
- *        disclaimer in the documentation and/or other materials
- *        provided with the distribution.
+ *		- Redistributions in binary form must reproduce the above
+ *		  copyright notice, this list of conditions and the following
+ *		  disclaimer in the documentation and/or other materials
+ *		  provided with the distribution.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
  * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
@@ -53,36 +53,36 @@
 
 enum {
 	MTHCA_MAX_DIRECT_QP_SIZE = 4 * PAGE_SIZE,
-	MTHCA_ACK_REQ_FREQ       = 10,
-	MTHCA_FLIGHT_LIMIT       = 9,
-	MTHCA_UD_HEADER_SIZE     = 72, /* largest UD header possible */
+	MTHCA_ACK_REQ_FREQ		 = 10,
+	MTHCA_FLIGHT_LIMIT		 = 9,
+	MTHCA_UD_HEADER_SIZE	 = 72, /* largest UD header possible */
 	MTHCA_INLINE_HEADER_SIZE = 4,  /* data segment overhead for inline */
 	MTHCA_INLINE_CHUNK_SIZE  = 16  /* inline data segment chunk */
 };
 
 enum {
-	MTHCA_QP_STATE_RST  = 0,
+	MTHCA_QP_STATE_RST	= 0,
 	MTHCA_QP_STATE_INIT = 1,
-	MTHCA_QP_STATE_RTR  = 2,
-	MTHCA_QP_STATE_RTS  = 3,
-	MTHCA_QP_STATE_SQE  = 4,
-	MTHCA_QP_STATE_SQD  = 5,
-	MTHCA_QP_STATE_ERR  = 6,
+	MTHCA_QP_STATE_RTR	= 2,
+	MTHCA_QP_STATE_RTS	= 3,
+	MTHCA_QP_STATE_SQE	= 4,
+	MTHCA_QP_STATE_SQD	= 5,
+	MTHCA_QP_STATE_ERR	= 6,
 	MTHCA_QP_STATE_DRAINING = 7
 };
 
 enum {
-	MTHCA_QP_ST_RC 	= 0x0,
-	MTHCA_QP_ST_UC 	= 0x1,
-	MTHCA_QP_ST_RD 	= 0x2,
-	MTHCA_QP_ST_UD 	= 0x3,
+	MTHCA_QP_ST_RC	= 0x0,
+	MTHCA_QP_ST_UC	= 0x1,
+	MTHCA_QP_ST_RD	= 0x2,
+	MTHCA_QP_ST_UD	= 0x3,
 	MTHCA_QP_ST_MLX = 0x7
 };
 
 enum {
 	MTHCA_QP_PM_MIGRATED = 0x3,
-	MTHCA_QP_PM_ARMED    = 0x0,
-	MTHCA_QP_PM_REARM    = 0x1
+	MTHCA_QP_PM_ARMED	 = 0x0,
+	MTHCA_QP_PM_REARM	 = 0x1
 };
 
 enum {
@@ -105,24 +105,24 @@ enum {
 #pragma pack(push,1)
 struct mthca_qp_path {
 	__be32 port_pkey;
-	u8     rnr_retry;
-	u8     g_mylmc;
+	u8	   rnr_retry;
+	u8	   g_mylmc;
 	__be16 rlid;
-	u8     ackto;
-	u8     mgid_index;
-	u8     static_rate;
-	u8     hop_limit;
+	u8	   ackto;
+	u8	   mgid_index;
+	u8	   static_rate;
+	u8	   hop_limit;
 	__be32 sl_tclass_flowlabel;
-	u8     rgid[16];
+	u8	   rgid[16];
 } ;
 
 struct mthca_qp_context {
 	__be32 flags;
 	__be32 tavor_sched_queue; /* Reserved on Arbel */
-	u8     mtu_msgmax;
-	u8     rq_size_stride;	/* Reserved on Tavor */
-	u8     sq_size_stride;	/* Reserved on Tavor */
-	u8     rlkey_arbel_sched_queue;	/* Reserved on Tavor */
+	u8	   mtu_msgmax;
+	u8	   rq_size_stride;	/* Reserved on Tavor */
+	u8	   sq_size_stride;	/* Reserved on Tavor */
+	u8	   rlkey_arbel_sched_queue; /* Reserved on Tavor */
 	__be32 usr_page;
 	__be32 local_qpn;
 	__be32 remote_qpn;
@@ -164,23 +164,23 @@ struct mthca_qp_param {
 #pragma pack(pop)
 
 enum {
-	MTHCA_QP_OPTPAR_ALT_ADDR_PATH     = 1 << 0,
-	MTHCA_QP_OPTPAR_RRE               = 1 << 1,
-	MTHCA_QP_OPTPAR_RAE               = 1 << 2,
-	MTHCA_QP_OPTPAR_RWE               = 1 << 3,
-	MTHCA_QP_OPTPAR_PKEY_INDEX        = 1 << 4,
-	MTHCA_QP_OPTPAR_Q_KEY             = 1 << 5,
-	MTHCA_QP_OPTPAR_RNR_TIMEOUT       = 1 << 6,
+	MTHCA_QP_OPTPAR_ALT_ADDR_PATH	  = 1 << 0,
+	MTHCA_QP_OPTPAR_RRE 			  = 1 << 1,
+	MTHCA_QP_OPTPAR_RAE 			  = 1 << 2,
+	MTHCA_QP_OPTPAR_RWE 			  = 1 << 3,
+	MTHCA_QP_OPTPAR_PKEY_INDEX		  = 1 << 4,
+	MTHCA_QP_OPTPAR_Q_KEY			  = 1 << 5,
+	MTHCA_QP_OPTPAR_RNR_TIMEOUT 	  = 1 << 6,
 	MTHCA_QP_OPTPAR_PRIMARY_ADDR_PATH = 1 << 7,
-	MTHCA_QP_OPTPAR_SRA_MAX           = 1 << 8,
-	MTHCA_QP_OPTPAR_RRA_MAX           = 1 << 9,
-	MTHCA_QP_OPTPAR_PM_STATE          = 1 << 10,
-	MTHCA_QP_OPTPAR_PORT_NUM          = 1 << 11,
-	MTHCA_QP_OPTPAR_RETRY_COUNT       = 1 << 12,
-	MTHCA_QP_OPTPAR_ALT_RNR_RETRY     = 1 << 13,
-	MTHCA_QP_OPTPAR_ACK_TIMEOUT       = 1 << 14,
-	MTHCA_QP_OPTPAR_RNR_RETRY         = 1 << 15,
-	MTHCA_QP_OPTPAR_SCHED_QUEUE       = 1 << 16
+	MTHCA_QP_OPTPAR_SRA_MAX 		  = 1 << 8,
+	MTHCA_QP_OPTPAR_RRA_MAX 		  = 1 << 9,
+	MTHCA_QP_OPTPAR_PM_STATE		  = 1 << 10,
+	MTHCA_QP_OPTPAR_PORT_NUM		  = 1 << 11,
+	MTHCA_QP_OPTPAR_RETRY_COUNT 	  = 1 << 12,
+	MTHCA_QP_OPTPAR_ALT_RNR_RETRY	  = 1 << 13,
+	MTHCA_QP_OPTPAR_ACK_TIMEOUT 	  = 1 << 14,
+	MTHCA_QP_OPTPAR_RNR_RETRY		  = 1 << 15,
+	MTHCA_QP_OPTPAR_SCHED_QUEUE 	  = 1 << 16
 };
 
 static const u8 mthca_opcode[] = {
@@ -209,110 +209,110 @@ static void fill_state_table()
 
 	/* IBQPS_RESET */	
 	t = &state_table[IBQPS_RESET][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
 	t[IBQPS_INIT].trans 						= MTHCA_TRANS_RST2INIT;
-	t[IBQPS_INIT].req_param[UD]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_QKEY;
-	t[IBQPS_INIT].req_param[UC]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_INIT].req_param[RC]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_INIT].req_param[MLX]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
-	t[IBQPS_INIT].opt_param[MLX]  	= IB_QP_PORT;
+	t[IBQPS_INIT].req_param[UD] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_QKEY;
+	t[IBQPS_INIT].req_param[UC] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_INIT].req_param[RC] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_INIT].req_param[MLX]	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_INIT].opt_param[MLX]	= IB_QP_PORT;
 
 	/* IBQPS_INIT */	
 	t = &state_table[IBQPS_INIT][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
 	t[IBQPS_INIT].trans 						= MTHCA_TRANS_INIT2INIT;
-	t[IBQPS_INIT].opt_param[UD]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_QKEY;
-	t[IBQPS_INIT].opt_param[UC]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_INIT].opt_param[RC]  	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_INIT].opt_param[MLX]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_INIT].opt_param[UD] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_QKEY;
+	t[IBQPS_INIT].opt_param[UC] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_INIT].opt_param[RC] 	= IB_QP_PKEY_INDEX |IB_QP_PORT |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_INIT].opt_param[MLX]	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
 
-	t[IBQPS_RTR].trans 						= MTHCA_TRANS_INIT2RTR;
-	t[IBQPS_RTR].req_param[UC]  	= 
+	t[IBQPS_RTR].trans						= MTHCA_TRANS_INIT2RTR;
+	t[IBQPS_RTR].req_param[UC]		= 
 		IB_QP_AV |IB_QP_PATH_MTU |IB_QP_DEST_QPN |IB_QP_RQ_PSN;
-	t[IBQPS_RTR].req_param[RC]  	= 
+	t[IBQPS_RTR].req_param[RC]		= 
 		IB_QP_AV |IB_QP_PATH_MTU |IB_QP_DEST_QPN |IB_QP_RQ_PSN |IB_QP_MAX_DEST_RD_ATOMIC |IB_QP_MIN_RNR_TIMER;
-	t[IBQPS_RTR].opt_param[UD]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
-	t[IBQPS_RTR].opt_param[UC]  	= IB_QP_PKEY_INDEX |IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_RTR].opt_param[RC]  	= IB_QP_PKEY_INDEX |IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS;
-	t[IBQPS_RTR].opt_param[MLX]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_RTR].opt_param[UD]		= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_RTR].opt_param[UC]		= IB_QP_PKEY_INDEX |IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_RTR].opt_param[RC]		= IB_QP_PKEY_INDEX |IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS;
+	t[IBQPS_RTR].opt_param[MLX] 	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
 
-/* IBQPS_RTR */	
+/* IBQPS_RTR */
 	t = &state_table[IBQPS_RTR][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
-	t[IBQPS_RTS].trans 						= MTHCA_TRANS_RTR2RTS;
-	t[IBQPS_RTS].req_param[UD]  	= IB_QP_SQ_PSN;
-	t[IBQPS_RTS].req_param[UC]  	= IB_QP_SQ_PSN;
-	t[IBQPS_RTS].req_param[RC]  	= 
+	t[IBQPS_RTS].trans						= MTHCA_TRANS_RTR2RTS;
+	t[IBQPS_RTS].req_param[UD]		= IB_QP_SQ_PSN;
+	t[IBQPS_RTS].req_param[UC]		= IB_QP_SQ_PSN;
+	t[IBQPS_RTS].req_param[RC]		= 
 		IB_QP_TIMEOUT |IB_QP_RETRY_CNT |IB_QP_RNR_RETRY |IB_QP_SQ_PSN |IB_QP_MAX_QP_RD_ATOMIC;
-	t[IBQPS_RTS].req_param[MLX]  	= IB_QP_SQ_PSN;
-	t[IBQPS_RTS].opt_param[UD]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
-	t[IBQPS_RTS].opt_param[UC]  	= 
+	t[IBQPS_RTS].req_param[MLX] 	= IB_QP_SQ_PSN;
+	t[IBQPS_RTS].opt_param[UD]		= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[UC]		= 
 		IB_QP_CUR_STATE |IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_RTS].opt_param[RC]  	= 	IB_QP_CUR_STATE |IB_QP_ALT_PATH |
+	t[IBQPS_RTS].opt_param[RC]		=	IB_QP_CUR_STATE |IB_QP_ALT_PATH |
 		IB_QP_ACCESS_FLAGS |IB_QP_MIN_RNR_TIMER |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_RTS].opt_param[MLX]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[MLX] 	= IB_QP_CUR_STATE |IB_QP_QKEY;
 
-	/* IBQPS_RTS */	
+	/* IBQPS_RTS */
 	t = &state_table[IBQPS_RTS][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
-	t[IBQPS_RTS].trans 						= MTHCA_TRANS_RTS2RTS;
-	t[IBQPS_RTS].opt_param[UD]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
-	t[IBQPS_RTS].opt_param[UC]  	= IB_QP_ACCESS_FLAGS |IB_QP_ALT_PATH |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_RTS].opt_param[RC]  	= 	IB_QP_ACCESS_FLAGS |
+	t[IBQPS_RTS].trans						= MTHCA_TRANS_RTS2RTS;
+	t[IBQPS_RTS].opt_param[UD]		= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[UC]		= IB_QP_ACCESS_FLAGS |IB_QP_ALT_PATH |IB_QP_PATH_MIG_STATE;
+	t[IBQPS_RTS].opt_param[RC]		=	IB_QP_ACCESS_FLAGS |
 		IB_QP_ALT_PATH |IB_QP_PATH_MIG_STATE |IB_QP_MIN_RNR_TIMER;
-	t[IBQPS_RTS].opt_param[MLX]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[MLX] 	= IB_QP_CUR_STATE |IB_QP_QKEY;
 
-	t[IBQPS_SQD].trans 						= MTHCA_TRANS_RTS2SQD;
-	t[IBQPS_SQD].opt_param[UD]  	= IB_QP_EN_SQD_ASYNC_NOTIFY;
-	t[IBQPS_SQD].opt_param[UC]  	= IB_QP_EN_SQD_ASYNC_NOTIFY;
-	t[IBQPS_SQD].opt_param[RC]  	= 	IB_QP_EN_SQD_ASYNC_NOTIFY;
-	t[IBQPS_SQD].opt_param[MLX]  	= IB_QP_EN_SQD_ASYNC_NOTIFY;
+	t[IBQPS_SQD].trans						= MTHCA_TRANS_RTS2SQD;
+	t[IBQPS_SQD].opt_param[UD]		= IB_QP_EN_SQD_ASYNC_NOTIFY;
+	t[IBQPS_SQD].opt_param[UC]		= IB_QP_EN_SQD_ASYNC_NOTIFY;
+	t[IBQPS_SQD].opt_param[RC]		=	IB_QP_EN_SQD_ASYNC_NOTIFY;
+	t[IBQPS_SQD].opt_param[MLX] 	= IB_QP_EN_SQD_ASYNC_NOTIFY;
 
-	/* IBQPS_SQD */	
+	/* IBQPS_SQD */
 	t = &state_table[IBQPS_SQD][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
-	t[IBQPS_RTS].trans 						= MTHCA_TRANS_SQD2RTS;
-	t[IBQPS_RTS].opt_param[UD]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
-	t[IBQPS_RTS].opt_param[UC]  	= IB_QP_CUR_STATE |
+	t[IBQPS_RTS].trans						= MTHCA_TRANS_SQD2RTS;
+	t[IBQPS_RTS].opt_param[UD]		= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[UC]		= IB_QP_CUR_STATE |
 		IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_RTS].opt_param[RC]  	= 	IB_QP_CUR_STATE |IB_QP_ALT_PATH |
+	t[IBQPS_RTS].opt_param[RC]		=	IB_QP_CUR_STATE |IB_QP_ALT_PATH |
 		IB_QP_ACCESS_FLAGS |IB_QP_MIN_RNR_TIMER |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_RTS].opt_param[MLX]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[MLX] 	= IB_QP_CUR_STATE |IB_QP_QKEY;
 
-	t[IBQPS_SQD].trans 						= MTHCA_TRANS_SQD2SQD;
-	t[IBQPS_SQD].opt_param[UD]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
-	t[IBQPS_SQD].opt_param[UC]  	= IB_QP_AV |	IB_QP_CUR_STATE |
+	t[IBQPS_SQD].trans						= MTHCA_TRANS_SQD2SQD;
+	t[IBQPS_SQD].opt_param[UD]		= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_SQD].opt_param[UC]		= IB_QP_AV |	IB_QP_CUR_STATE |
 		IB_QP_ALT_PATH |IB_QP_ACCESS_FLAGS |IB_QP_PKEY_INDEX |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_SQD].opt_param[RC]  	= 	IB_QP_AV |IB_QP_TIMEOUT |IB_QP_RETRY_CNT |IB_QP_RNR_RETRY |
+	t[IBQPS_SQD].opt_param[RC]		=	IB_QP_AV |IB_QP_TIMEOUT |IB_QP_RETRY_CNT |IB_QP_RNR_RETRY |
 		IB_QP_MAX_QP_RD_ATOMIC |IB_QP_MAX_DEST_RD_ATOMIC |IB_QP_CUR_STATE |IB_QP_ALT_PATH |
 		IB_QP_ACCESS_FLAGS |IB_QP_PKEY_INDEX |IB_QP_MIN_RNR_TIMER |IB_QP_PATH_MIG_STATE;
-	t[IBQPS_SQD].opt_param[MLX]  	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
+	t[IBQPS_SQD].opt_param[MLX] 	= IB_QP_PKEY_INDEX |IB_QP_QKEY;
 
-	/* IBQPS_SQE */	
+	/* IBQPS_SQE */
 	t = &state_table[IBQPS_SQE][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
-	t[IBQPS_RTS].trans 						= MTHCA_TRANS_SQERR2RTS;
-	t[IBQPS_RTS].opt_param[UD]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
-	t[IBQPS_RTS].opt_param[UC]  	= IB_QP_CUR_STATE | IB_QP_ACCESS_FLAGS;
-//	t[IBQPS_RTS].opt_param[RC]  	= 	IB_QP_CUR_STATE |IB_QP_MIN_RNR_TIMER;
-	t[IBQPS_RTS].opt_param[MLX]  	= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].trans						= MTHCA_TRANS_SQERR2RTS;
+	t[IBQPS_RTS].opt_param[UD]		= IB_QP_CUR_STATE |IB_QP_QKEY;
+	t[IBQPS_RTS].opt_param[UC]		= IB_QP_CUR_STATE | IB_QP_ACCESS_FLAGS;
+//	t[IBQPS_RTS].opt_param[RC]		=	IB_QP_CUR_STATE |IB_QP_MIN_RNR_TIMER;
+	t[IBQPS_RTS].opt_param[MLX] 	= IB_QP_CUR_STATE |IB_QP_QKEY;
 
-	/* IBQPS_ERR */	
+	/* IBQPS_ERR */
 	t = &state_table[IBQPS_ERR][0];
-	t[IBQPS_RESET].trans 					= MTHCA_TRANS_ANY2RST;
-	t[IBQPS_ERR].trans 						= MTHCA_TRANS_ANY2ERR;
+	t[IBQPS_RESET].trans					= MTHCA_TRANS_ANY2RST;
+	t[IBQPS_ERR].trans						= MTHCA_TRANS_ANY2ERR;
 
 };
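[Editor's sketch, not part of the patch: fill_state_table() above maps (current state, next state) to a firmware transition opcode plus per-transport required/optional attribute masks, which mthca_modify_qp() consults before issuing the command. A minimal, self-contained illustration of that lookup, with hypothetical names and masks rather than the driver's real types:]

```c
#include <assert.h>

/* Hypothetical miniature of the driver's state table: each entry holds a
 * firmware transition opcode (0 = transition not allowed) and, per
 * transport, the attribute bits the caller must supply. */
enum { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, NUM_QPS };
enum { XP_RC, XP_UD, NUM_XP };

#define QP_PKEY_INDEX (1u << 0)
#define QP_PORT       (1u << 1)
#define QP_QKEY       (1u << 2)

struct entry {
	int trans;
	unsigned req_param[NUM_XP];
};

static struct entry tbl[NUM_QPS][NUM_QPS];

static void fill_table(void)
{
	tbl[QPS_RESET][QPS_INIT].trans = 1; /* e.g. RST2INIT */
	tbl[QPS_RESET][QPS_INIT].req_param[XP_UD] =
		QP_PKEY_INDEX | QP_PORT | QP_QKEY;
}

/* The modify-QP path checks the table before talking to firmware:
 * unknown transition or missing required attribute => reject. */
static int transition_valid(int cur, int next, int xp, unsigned attr_mask)
{
	const struct entry *e = &tbl[cur][next];
	return e->trans &&
	       (attr_mask & e->req_param[xp]) == e->req_param[xp];
}
```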
 
@@ -337,7 +337,7 @@ static void dump_wqe(u32 print_lvl, u32 
 	UNUSED_PARAM_WOWPP(qp_ptr);
 	UNUSED_PARAM_WOWPP(print_lvl);
 
-	(void) wqe;	/* avoid warning if mthca_dbg compiled away... */
+	(void) wqe; /* avoid warning if mthca_dbg compiled away... */
 	HCA_PRINT(print_lvl,HCA_DBG_QP,("WQE contents  QPN 0x%06x \n",qp_ptr->qpn));
 	HCA_PRINT(print_lvl,HCA_DBG_QP,("WQE contents [%02x] %08x %08x %08x %08x \n",0
 		, cl_ntoh32(wqe[0]), cl_ntoh32(wqe[1]), cl_ntoh32(wqe[2]), cl_ntoh32(wqe[3])));
@@ -367,7 +367,7 @@ static void *get_send_wqe(struct mthca_q
 			(n << qp->sq.wqe_shift);
 	else
 		return (u8*)qp->queue.page_list[(qp->send_wqe_offset +
-					    (n << qp->sq.wqe_shift)) >>
+						(n << qp->sq.wqe_shift)) >>
 					   PAGE_SHIFT].page +
 			((qp->send_wqe_offset + (n << qp->sq.wqe_shift)) &
 			 (PAGE_SIZE - 1));
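[Editor's sketch, not part of the patch: the indirect branch of get_send_wqe() above splits a WQE's byte offset into a page_list index and an intra-page offset when the queue buffer is not physically contiguous. The arithmetic, with illustrative constants rather than the driver's PAGE_SHIFT:]

```c
#include <assert.h>

#define PG_SHIFT 12            /* illustrative 4 KB pages */
#define PG_SIZE  (1u << PG_SHIFT)

struct wqe_addr {
	unsigned page;   /* index into the queue's page list */
	unsigned within; /* byte offset inside that page */
};

/* Mirror of the indirect case in get_send_wqe(): the n-th send WQE lives
 * at send_wqe_offset + (n << wqe_shift) bytes into the queue buffer. */
static struct wqe_addr locate_send_wqe(unsigned send_wqe_offset,
				       unsigned wqe_shift, unsigned n)
{
	unsigned off = send_wqe_offset + (n << wqe_shift);
	struct wqe_addr a = { off >> PG_SHIFT, off & (PG_SIZE - 1) };
	return a;
}
```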
@@ -378,12 +378,12 @@ static void mthca_wq_init(struct mthca_w
 	spin_lock_init(&wq->lock);	
 	wq->next_ind  = 0;	
 	wq->last_comp = wq->max - 1;	
-	wq->head      = 0;	
-	wq->tail      = 0;	
+	wq->head	  = 0;	
+	wq->tail	  = 0;	
 }
 
 void mthca_qp_event(struct mthca_dev *dev, u32 qpn,
-		    enum ib_event_type event_type, u8 vendor_code)
+			enum ib_event_type event_type, u8 vendor_code)
 {
 	struct mthca_qp *qp;
 	ib_event_rec_t event;
@@ -403,7 +403,7 @@ void mthca_qp_event(struct mthca_dev *de
 	event.type = event_type;
 	event.context = qp->ibqp.qp_context;
 	event.vendor_specific = vendor_code;
-	HCA_PRINT(TRACE_LEVEL_WARNING,HCA_DBG_QP,("QP %06x Async event  event_type 0x%x vendor_code 0x%x\n",
+	HCA_PRINT(TRACE_LEVEL_WARNING,HCA_DBG_QP,("QP %06x Async event  event_type 0x%x vendor_code 0x%x\n",
 		qpn,event_type,vendor_code));
 	qp->ibqp.event_handler(&event);
 
@@ -421,7 +421,7 @@ static int to_mthca_state(enum ib_qp_sta
 	case IBQPS_SQD:   return MTHCA_QP_STATE_SQD;
 	case IBQPS_SQE:   return MTHCA_QP_STATE_SQE;
 	case IBQPS_ERR:   return MTHCA_QP_STATE_ERR;
-	default:                return -1;
+	default:				return -1;
 	}
 }
 
@@ -445,10 +445,10 @@ static inline enum ib_qp_state to_ib_qp_
 	case MTHCA_QP_STATE_RTR:   return IBQPS_RTR;
 	case MTHCA_QP_STATE_RTS:   return IBQPS_RTS;
 	case MTHCA_QP_STATE_SQD:   return IBQPS_SQD;
-	case MTHCA_QP_STATE_DRAINING:   return IBQPS_SQD;
+	case MTHCA_QP_STATE_DRAINING:	return IBQPS_SQD;
 	case MTHCA_QP_STATE_SQE:   return IBQPS_SQE;
 	case MTHCA_QP_STATE_ERR:   return IBQPS_ERR;
-	default:                return -1;
+	default:				return -1;
 	}
 }
 
@@ -480,17 +480,17 @@ static void to_ib_ah_attr(struct mthca_d
 				struct mthca_qp_path *path)
 {
 	memset(ib_ah_attr, 0, sizeof *ib_ah_attr);
-	ib_ah_attr->port_num 	  = (u8)((cl_ntoh32(path->port_pkey) >> 24) & 0x3);
+	ib_ah_attr->port_num	  = (u8)((cl_ntoh32(path->port_pkey) >> 24) & 0x3);
 
 	if (ib_ah_attr->port_num == 0 || ib_ah_attr->port_num > dev->limits.num_ports)
 		return;
 
-	ib_ah_attr->dlid     	  = cl_ntoh16(path->rlid);
-	ib_ah_attr->sl       	  = (u8)(cl_ntoh32(path->sl_tclass_flowlabel) >> 28);
+	ib_ah_attr->dlid		  = cl_ntoh16(path->rlid);
+	ib_ah_attr->sl			  = (u8)(cl_ntoh32(path->sl_tclass_flowlabel) >> 28);
 	ib_ah_attr->src_path_bits = path->g_mylmc & 0x7f;
 	//TODO: work around: set always full speed	- really, it's much more complicate
 	ib_ah_attr->static_rate   = 0;
-	ib_ah_attr->ah_flags      = (path->g_mylmc & (1 << 7)) ? IB_AH_GRH : 0;
+	ib_ah_attr->ah_flags	  = (path->g_mylmc & (1 << 7)) ? IB_AH_GRH : 0;
 	if (ib_ah_attr->ah_flags) {
 		ib_ah_attr->grh.sgid_index = (u8)(path->mgid_index & (dev->limits.gid_table_len - 1));
 		ib_ah_attr->grh.hop_limit  = path->hop_limit;
@@ -540,20 +540,20 @@ int mthca_query_qp(struct ib_qp *ibqp, s
 		goto out_mailbox;
 	}
 
-	qp_param    = mailbox->buf;
-	context     = &qp_param->context;
+	qp_param	= mailbox->buf;
+	context 	= &qp_param->context;
 	mthca_state = cl_ntoh32(context->flags) >> 28;
 
-	qp->state		     = to_ib_qp_state(mthca_state);
-	qp_attr->qp_state	     = qp->state;
-	qp_attr->path_mtu 	     = context->mtu_msgmax >> 5;
-	qp_attr->path_mig_state      =
+	qp->state			 = to_ib_qp_state(mthca_state);
+	qp_attr->qp_state		 = qp->state;
+	qp_attr->path_mtu		 = context->mtu_msgmax >> 5;
+	qp_attr->path_mig_state 	 =
 		to_ib_mig_state((cl_ntoh32(context->flags) >> 11) & 0x3);
-	qp_attr->qkey 		     = cl_ntoh32(context->qkey);
-	qp_attr->rq_psn 	     = cl_ntoh32(context->rnr_nextrecvpsn) & 0xffffff;
-	qp_attr->sq_psn 	     = cl_ntoh32(context->next_send_psn) & 0xffffff;
-	qp_attr->dest_qp_num 	     = cl_ntoh32(context->remote_qpn) & 0xffffff;
-	qp_attr->qp_access_flags     =
+	qp_attr->qkey			 = cl_ntoh32(context->qkey);
+	qp_attr->rq_psn 		 = cl_ntoh32(context->rnr_nextrecvpsn) & 0xffffff;
+	qp_attr->sq_psn 		 = cl_ntoh32(context->next_send_psn) & 0xffffff;
+	qp_attr->dest_qp_num		 = cl_ntoh32(context->remote_qpn) & 0xffffff;
+	qp_attr->qp_access_flags	 =
 		to_ib_qp_access_flags(cl_ntoh32(context->params2));
 
 	if (qp->transport == RC || qp->transport == UC) {
@@ -561,11 +561,11 @@ int mthca_query_qp(struct ib_qp *ibqp, s
 		to_ib_ah_attr(dev, &qp_attr->alt_ah_attr, &context->alt_path);
 		qp_attr->alt_pkey_index =
 			(u16)(cl_ntoh32(context->alt_path.port_pkey) & 0x7f);
-		qp_attr->alt_port_num 	= qp_attr->alt_ah_attr.port_num;
+		qp_attr->alt_port_num	= qp_attr->alt_ah_attr.port_num;
 	}
 
 	qp_attr->pkey_index = (u16)(cl_ntoh32(context->pri_path.port_pkey) & 0x7f);
-	qp_attr->port_num   =
+	qp_attr->port_num	=
 		(u8)((cl_ntoh32(context->pri_path.port_pkey) >> 24) & 0x3);
 
 	/* qp_attr->en_sqd_async_notify is only applicable in modify qp */
@@ -575,22 +575,23 @@ int mthca_query_qp(struct ib_qp *ibqp, s
 
 	qp_attr->max_dest_rd_atomic =
 		(u8)(1 << ((cl_ntoh32(context->params2) >> 21) & 0x7));
-	qp_attr->min_rnr_timer 	    =
+	qp_attr->min_rnr_timer		=
 		(u8)((cl_ntoh32(context->rnr_nextrecvpsn) >> 24) & 0x1f);
-	qp_attr->timeout 	    = context->pri_path.ackto >> 3;
-	qp_attr->retry_cnt 	    = (u8)((cl_ntoh32(context->params1) >> 16) & 0x7);
-	qp_attr->rnr_retry 	    = context->pri_path.rnr_retry >> 5;
-	qp_attr->alt_timeout 	    = context->alt_path.ackto >> 3;
+	qp_attr->timeout		= context->pri_path.ackto >> 3;
+	qp_attr->retry_cnt		= (u8)((cl_ntoh32(context->params1) >> 16) & 0x7);
+	qp_attr->rnr_retry		= context->pri_path.rnr_retry >> 5;
+	qp_attr->alt_timeout		= context->alt_path.ackto >> 3;
 
 done:
-	qp_attr->cur_qp_state	     = qp_attr->qp_state;
-	qp_attr->cap.max_send_wr     = qp->sq.max;
-	qp_attr->cap.max_recv_wr     = qp->rq.max;
-	qp_attr->cap.max_send_sge    = qp->sq.max_gs;
-	qp_attr->cap.max_recv_sge    = qp->rq.max_gs;
+	qp_attr->cur_qp_state		 = qp_attr->qp_state;
+	qp_attr->cap.max_send_wr	 = qp->sq.max;
+	qp_attr->cap.max_recv_wr	 = qp->rq.max;
+	qp_attr->cap.max_send_sge	 = qp->sq.max_gs;
+	qp_attr->cap.max_recv_sge	 = qp->rq.max_gs;
 	qp_attr->cap.max_inline_data = qp->max_inline_data;
 
-	qp_init_attr->cap	     = qp_attr->cap;
+	qp_init_attr->cap			 = qp_attr->cap;
+	qp_init_attr->sq_sig_type	 = qp->sq_policy;
 
 out_mailbox:
 	mthca_free_mailbox(dev, mailbox);
@@ -619,11 +620,11 @@ static void init_port(struct mthca_dev *
 
 	RtlZeroMemory(&param, sizeof param);
 
-	param.port_width    = dev->limits.port_width_cap;
-	param.vl_cap    = dev->limits.vl_cap;
-	param.mtu_cap   = dev->limits.mtu_cap;
-	param.gid_cap   = (u16)dev->limits.gid_table_len;
-	param.pkey_cap  = (u16)dev->limits.pkey_table_len;
+	param.port_width	= dev->limits.port_width_cap;
+	param.vl_cap	= dev->limits.vl_cap;
+	param.mtu_cap	= dev->limits.mtu_cap;
+	param.gid_cap	= (u16)dev->limits.gid_table_len;
+	param.pkey_cap	= (u16)dev->limits.pkey_table_len;
 
 	err = mthca_INIT_IB(dev, &param, port, &status);
 	if (err)
@@ -753,7 +754,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	}
 
 	if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC &&
-	    attr->max_dest_rd_atomic > 1 << dev->qp_table.rdb_shift) {
+		attr->max_dest_rd_atomic > 1 << dev->qp_table.rdb_shift) {
 		HCA_PRINT(TRACE_LEVEL_ERROR ,HCA_DBG_QP,("Max rdma_atomic as responder %u too large (max %d)\n",
 			  attr->max_dest_rd_atomic, 1 << dev->qp_table.rdb_shift));
 		goto out;
@@ -768,9 +769,9 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	qp_context = &qp_param->context;
 	RtlZeroMemory(qp_param, sizeof *qp_param);
 
-	qp_context->flags      = cl_hton32((to_mthca_state(new_state) << 28) |
-					     (to_mthca_st(qp->transport) << 16));
-	qp_context->flags     |= cl_hton32(MTHCA_QP_BIT_DE);
+	qp_context->flags	   = cl_hton32((to_mthca_state(new_state) << 28) |
+						 (to_mthca_st(qp->transport) << 16));
+	qp_context->flags	  |= cl_hton32(MTHCA_QP_BIT_DE);
 	if (!(attr_mask & IB_QP_PATH_MIG_STATE))
 		qp_context->flags |= cl_hton32(MTHCA_QP_PM_MIGRATED << 11);
 	else {
@@ -846,20 +847,20 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	}
 
 	if (attr_mask & IB_QP_AV) {
-		qp_context->pri_path.g_mylmc     = attr->ah_attr.src_path_bits & 0x7f;
-		qp_context->pri_path.rlid        = cl_hton16(attr->ah_attr.dlid);
-		//TODO: work around: set always full speed  - really, it's much more complicate
+		qp_context->pri_path.g_mylmc	 = attr->ah_attr.src_path_bits & 0x7f;
+		qp_context->pri_path.rlid		 = cl_hton16(attr->ah_attr.dlid);
+		//TODO: workaround: always set full speed - really, it's much more complicated
 		qp_context->pri_path.static_rate = 0;
 		if (attr->ah_attr.ah_flags & IB_AH_GRH) {
 			qp_context->pri_path.g_mylmc |= 1 << 7;
 			qp_context->pri_path.mgid_index = attr->ah_attr.grh.sgid_index;
 			qp_context->pri_path.hop_limit = attr->ah_attr.grh.hop_limit;
 			qp_context->pri_path.sl_tclass_flowlabel =
-				cl_hton32((attr->ah_attr.sl << 28)                |
-					    (attr->ah_attr.grh.traffic_class << 20) |
-					    (attr->ah_attr.grh.flow_label));
+				cl_hton32((attr->ah_attr.sl << 28)				  |
+						(attr->ah_attr.grh.traffic_class << 20) |
+						(attr->ah_attr.grh.flow_label));
 			memcpy(qp_context->pri_path.rgid,
-			       attr->ah_attr.grh.dgid.raw, 16);
+				   attr->ah_attr.grh.dgid.raw, 16);
 		} else {
 			qp_context->pri_path.sl_tclass_flowlabel =
 				cl_hton32(attr->ah_attr.sl << 28);
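[Editor's sketch, not part of the patch: the GRH branch above packs service level, traffic class, and flow label into the single sl_tclass_flowlabel word (SL in bits 31:28, traffic class in 27:20, flow label in 19:0), and to_ib_ah_attr() recovers the SL from the top nibble. The bit layout, with the cl_hton32() byte swap left out for clarity:]

```c
#include <assert.h>

/* Pack as in mthca_modify_qp():
 * (sl << 28) | (traffic_class << 20) | flow_label. */
static unsigned pack_sl_tclass_flow(unsigned sl, unsigned tclass,
				    unsigned flow)
{
	return (sl << 28) | (tclass << 20) | (flow & 0xfffff);
}

/* Unpack as in to_ib_ah_attr(): SL is the top nibble. */
static unsigned unpack_sl(unsigned word)   { return word >> 28; }
static unsigned unpack_tc(unsigned word)   { return (word >> 20) & 0xff; }
static unsigned unpack_flow(unsigned word) { return word & 0xfffff; }
```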
@@ -875,7 +876,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	/* XXX alt_path */
 
 	/* leave rdd as 0 */
-	qp_context->pd         = cl_hton32(to_mpd(ibqp->pd)->pd_num);
+	qp_context->pd		   = cl_hton32(to_mpd(ibqp->pd)->pd_num);
 	/* leave wqe_base as 0 (we always create an MR based at 0 for WQs) */
 	qp_context->wqe_lkey   = cl_hton32(qp->mr.ibmr.lkey);
 	qp_context->params1    = cl_hton32((unsigned long)(
@@ -893,7 +894,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 		if (attr->max_rd_atomic) {
 			qp_context->params1 |=
 				cl_hton32(MTHCA_QP_BIT_SRE |
-					    MTHCA_QP_BIT_SAE);
+						MTHCA_QP_BIT_SAE);
 			qp_context->params1 |=
 				cl_hton32(fls(attr->max_rd_atomic - 1) << 21);
 		}
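[Editor's sketch, not part of the patch: the hunk above encodes the requested RDMA-read/atomic depth as fls(max_rd_atomic - 1), i.e. log2 of the depth rounded up to a power of two; mthca_query_qp() decodes the same scheme back with 1 << field (for both params1 and params2). A self-contained round-trip, where fls_ is a stand-in for the kernel's fls():]

```c
#include <assert.h>

/* Stand-in for the kernel's fls(): 1-based position of the highest set
 * bit; fls_(0) == 0. */
static int fls_(unsigned x)
{
	int r = 0;
	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Encode as in mthca_modify_qp(): log2 of the depth, rounded up. */
static int encode_rd_atomic(int depth)
{
	return depth ? fls_((unsigned)(depth - 1)) : 0;
}

/* Decode as in mthca_query_qp(): 1 << field. */
static int decode_rd_atomic(int field)
{
	return 1 << field;
}
```

Note the round-up: a requested depth of 5 comes back as 8, the next power of two.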
@@ -920,7 +921,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	}
 
 	if (attr_mask & (IB_QP_ACCESS_FLAGS | IB_QP_MAX_DEST_RD_ATOMIC)) {
-		qp_context->params2      |= get_hw_access_flags(qp, attr, attr_mask);
+		qp_context->params2 	 |= get_hw_access_flags(qp, attr, attr_mask);
 		qp_param->opt_param_mask |= cl_hton32(MTHCA_QP_OPTPAR_RWE |
 							MTHCA_QP_OPTPAR_RRE |
 							MTHCA_QP_OPTPAR_RAE);
@@ -940,8 +941,8 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 
 	qp_context->ra_buff_indx =
 		cl_hton32(dev->qp_table.rdb_base +
-			    ((qp->qpn & (dev->limits.num_qps - 1)) * MTHCA_RDB_ENTRY_SIZE <<
-			     dev->qp_table.rdb_shift));
+				((qp->qpn & (dev->limits.num_qps - 1)) * MTHCA_RDB_ENTRY_SIZE <<
+				 dev->qp_table.rdb_shift));
 
 	qp_context->cqn_rcv = cl_hton32(to_mcq(ibqp->recv_cq)->cqn);
 
@@ -955,25 +956,25 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 
 	if (ibqp->srq)
 		qp_context->srqn = cl_hton32(1 << 24 |
-					       to_msrq(ibqp->srq)->srqn);
+						   to_msrq(ibqp->srq)->srqn);
 
 	if (cur_state == IBQPS_RTS && new_state == IBQPS_SQD	&&
-	    attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY		&&
-	    attr->en_sqd_async_notify)
+		attr_mask & IB_QP_EN_SQD_ASYNC_NOTIFY		&&
+		attr->en_sqd_async_notify)
 		sqd_event = (u32)(1 << 31);
 
 	err = mthca_MODIFY_QP(dev, state_table[cur_state][new_state].trans,
-			      qp->qpn, 0, mailbox, sqd_event, &status);
+				  qp->qpn, 0, mailbox, sqd_event, &status);
 	if (err) {
 		HCA_PRINT(TRACE_LEVEL_ERROR ,HCA_DBG_QP,("mthca_MODIFY_QP returned error (qp-num = 0x%x) returned status %02x "
-			"cur_state  = %d  new_state = %d attr_mask = %d req_param = %d opt_param = %d\n",
+			"cur_state	= %d  new_state = %d attr_mask = %d req_param = %d opt_param = %d\n",
 			ibqp->qp_num, status, cur_state, new_state, 
-			attr_mask, req_param, opt_param));        
+			attr_mask, req_param, opt_param));
 		goto out_mailbox;
 	}
 	if (status) {
 		HCA_PRINT(TRACE_LEVEL_ERROR ,HCA_DBG_QP,("mthca_MODIFY_QP bad status(qp-num = 0x%x) returned status %02x "
-			"cur_state  = %d  new_state = %d attr_mask = %d req_param = %d opt_param = %d\n",
+			"cur_state	= %d  new_state = %d attr_mask = %d req_param = %d opt_param = %d\n",
 			ibqp->qp_num, status, cur_state, new_state, 
 			attr_mask, req_param, opt_param));
 		err = -EINVAL;
@@ -1011,10 +1012,10 @@ int mthca_modify_qp(struct ib_qp *ibqp, 
 	 */
 	if (new_state == IBQPS_RESET && !qp->ibqp.ucontext) {
 		mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn,
-			       qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
+				   qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
 		if (qp->ibqp.send_cq != qp->ibqp.recv_cq)
 			mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn,
-				       qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
+					   qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
 
 		mthca_wq_init(&qp->sq);
 		qp->sq.last = get_send_wqe(qp, qp->sq.max - 1);
@@ -1080,7 +1081,7 @@ static void mthca_adjust_qp_caps(struct 
 		(int)(max_data_size / sizeof (struct mthca_data_seg)));
 	qp->rq.max_gs = min(dev->limits.max_sg,
 		(int)((min(dev->limits.max_desc_sz, 1 << qp->rq.wqe_shift) -
-		sizeof (struct mthca_next_seg)) / sizeof (struct mthca_data_seg)));	
+		sizeof (struct mthca_next_seg)) / sizeof (struct mthca_data_seg)));
 }
 
 /*
@@ -1091,8 +1092,8 @@ static void mthca_adjust_qp_caps(struct 
  * queue)
  */
 static int mthca_alloc_wqe_buf(struct mthca_dev *dev,
-			       struct mthca_pd *pd,
-			       struct mthca_qp *qp)
+				   struct mthca_pd *pd,
+				   struct mthca_qp *qp)
 {
 	int size;
 	int err = -ENOMEM;
@@ -1105,7 +1106,7 @@ static int mthca_alloc_wqe_buf(struct mt
 		return -EINVAL;
 
 	for (qp->rq.wqe_shift = 6; 1 << qp->rq.wqe_shift < size;
-	     qp->rq.wqe_shift++)
+		 qp->rq.wqe_shift++)
 		; /* nothing */
 
 	size = qp->sq.max_gs * sizeof (struct mthca_data_seg);
@@ -1149,11 +1150,11 @@ static int mthca_alloc_wqe_buf(struct mt
 		return -EINVAL;
 
 	for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
-	     qp->sq.wqe_shift++)
+		 qp->sq.wqe_shift++)
 		; /* nothing */
 
 	qp->send_wqe_offset = ALIGN(qp->rq.max << qp->rq.wqe_shift,
-				    1 << qp->sq.wqe_shift);
+					1 << qp->sq.wqe_shift);
 
 	/*
 	 * If this is a userspace QP, we don't actually have to
@@ -1172,7 +1173,7 @@ static int mthca_alloc_wqe_buf(struct mt
 		goto err_out;
 
 	err = mthca_buf_alloc(dev, size, MTHCA_MAX_DIRECT_QP_SIZE,
-			      &qp->queue, &qp->is_direct, pd, 0, &qp->mr);
+				  &qp->queue, &qp->is_direct, pd, 0, &qp->mr);
 	if (err)
 		goto err_out;
 	
@@ -1185,16 +1186,16 @@ err_out:
 }
 
 static void mthca_free_wqe_buf(struct mthca_dev *dev,
-			       struct mthca_qp *qp)
+				   struct mthca_qp *qp)
 {
 	mthca_buf_free(dev, (int)(LONG_PTR)NEXT_PAGE_ALIGN(qp->send_wqe_offset +
-				       (qp->sq.max << qp->sq.wqe_shift)),
-		       &qp->queue, qp->is_direct, &qp->mr);
+					   (qp->sq.max << qp->sq.wqe_shift)),
+			   &qp->queue, qp->is_direct, &qp->mr);
 	kfree(qp->wrid);
 }
 
 static int mthca_map_memfree(struct mthca_dev *dev,
-			     struct mthca_qp *qp)
+				 struct mthca_qp *qp)
 {
 	int ret;
 
@@ -1207,10 +1208,10 @@ static int mthca_map_memfree(struct mthc
 		if (ret)
 			goto err_qpc;
 
- 		ret = mthca_table_get(dev, dev->qp_table.rdb_table,
- 				      qp->qpn << dev->qp_table.rdb_shift);
- 		if (ret)
- 			goto err_eqpc;
+		ret = mthca_table_get(dev, dev->qp_table.rdb_table,
+					  qp->qpn << dev->qp_table.rdb_shift);
+		if (ret)
+			goto err_eqpc;
 
 	}
 
@@ -1235,7 +1236,7 @@ static void mthca_unmap_memfree(struct m
 }
 
 static int mthca_alloc_memfree(struct mthca_dev *dev,
-			       struct mthca_qp *qp)
+				   struct mthca_qp *qp)
 {
 	int ret = 0;
 
@@ -1258,7 +1259,7 @@ static int mthca_alloc_memfree(struct mt
 }
 
 static void mthca_free_memfree(struct mthca_dev *dev,
-			       struct mthca_qp *qp)
+				   struct mthca_qp *qp)
 {
 	if (mthca_is_memfree(dev)) {
 		mthca_free_db(dev, MTHCA_DB_TYPE_SQ, qp->sq.db_index);
@@ -1280,10 +1281,10 @@ static int mthca_alloc_qp_common(struct 
 	init_waitqueue_head(&qp->wait);
 	KeInitializeMutex(&qp->mutex, 0);
 
-	qp->state    	 = IBQPS_RESET;
+	qp->state		 = IBQPS_RESET;
 	qp->atomic_rd_en = 0;
-	qp->resp_depth   = 0;
-	qp->sq_policy    = send_policy;
+	qp->resp_depth	 = 0;
+	qp->sq_policy	 = send_policy;
 	mthca_wq_init(&qp->sq);
 	mthca_wq_init(&qp->rq);
 
@@ -1321,7 +1322,7 @@ static int mthca_alloc_qp_common(struct 
 		struct mthca_next_seg *next;
 		struct mthca_data_seg *scatter;
 		int size = (sizeof (struct mthca_next_seg) +
-			    qp->rq.max_gs * sizeof (struct mthca_data_seg)) / 16;
+				qp->rq.max_gs * sizeof (struct mthca_data_seg)) / 16;
 
 		for (i = 0; i < qp->rq.max; ++i) {
 			next = get_recv_wqe(qp, i);
@@ -1330,15 +1331,15 @@ static int mthca_alloc_qp_common(struct 
 			next->ee_nds = cl_hton32(size);
 
 			for (scatter = (void *) (next + 1);
-			     (void *) scatter < (void *) ((u8*)next + (u32)(1 << qp->rq.wqe_shift));
-			     ++scatter)
+				 (void *) scatter < (void *) ((u8*)next + (u32)(1 << qp->rq.wqe_shift));
+				 ++scatter)
 				scatter->lkey = cl_hton32(MTHCA_INVAL_LKEY);
 		}
 
 		for (i = 0; i < qp->sq.max; ++i) {
 			next = get_send_wqe(qp, i);
 			next->nda_op = cl_hton32((((i + 1) & (qp->sq.max - 1)) <<
-						    qp->sq.wqe_shift) +
+							qp->sq.wqe_shift) +
 						   qp->send_wqe_offset);
 		}
 	}
@@ -1355,11 +1356,11 @@ static int mthca_set_qp_size(struct mthc
 	int max_data_size = mthca_max_data_size(dev, qp, dev->limits.max_desc_sz);
 
 	/* Sanity check QP size before proceeding */
-	if (cap->max_send_wr  	 > (u32)dev->limits.max_wqes ||
-	    cap->max_recv_wr  	 > (u32)dev->limits.max_wqes ||
-	    cap->max_send_sge 	 > (u32)dev->limits.max_sg   ||
-	    cap->max_recv_sge 	 > (u32)dev->limits.max_sg   ||
-	    cap->max_inline_data > (u32)mthca_max_inline_data(max_data_size))
+	if (cap->max_send_wr	 > (u32)dev->limits.max_wqes ||
+		cap->max_recv_wr	 > (u32)dev->limits.max_wqes ||
+		cap->max_send_sge	 > (u32)dev->limits.max_sg	 ||
+		cap->max_recv_sge	 > (u32)dev->limits.max_sg	 ||
+		cap->max_inline_data > (u32)mthca_max_inline_data(max_data_size))
 		return -EINVAL;
 
 	/*
@@ -1387,9 +1388,9 @@ static int mthca_set_qp_size(struct mthc
 
 	qp->rq.max_gs = cap->max_recv_sge;
 	qp->sq.max_gs = MAX(cap->max_send_sge,
-			      ALIGN(cap->max_inline_data + MTHCA_INLINE_HEADER_SIZE,
-				    MTHCA_INLINE_CHUNK_SIZE) /
-			      (int)sizeof (struct mthca_data_seg));
+				  ALIGN(cap->max_inline_data + MTHCA_INLINE_HEADER_SIZE,
+					MTHCA_INLINE_CHUNK_SIZE) /
+				  (int)sizeof (struct mthca_data_seg));
 
 	return 0;
 }
@@ -1422,7 +1423,7 @@ int mthca_alloc_qp(struct mthca_dev *dev
 		return -ENOMEM;
 
 	err = mthca_alloc_qp_common(dev, pd, send_cq, recv_cq,
-				    send_policy, qp);
+					send_policy, qp);
 	if (err) {
 		mthca_free(&dev->qp_table.alloc, qp->qpn);
 		return err;
@@ -1437,14 +1438,14 @@ int mthca_alloc_qp(struct mthca_dev *dev
 }
 
 int mthca_alloc_sqp(struct mthca_dev *dev,
-		    struct mthca_pd *pd,
-		    struct mthca_cq *send_cq,
-		    struct mthca_cq *recv_cq,
-		    enum ib_sig_type send_policy,
-		    struct ib_qp_cap *cap,
-		    int qpn,
-		    int port,
-		    struct mthca_sqp *sqp)
+			struct mthca_pd *pd,
+			struct mthca_cq *send_cq,
+			struct mthca_cq *recv_cq,
+			enum ib_sig_type send_policy,
+			struct ib_qp_cap *cap,
+			int qpn,
+			int port,
+			struct mthca_sqp *sqp)
 {
 	u32 mqpn = qpn * 2 + dev->qp_table.sqp_start + port - 1;
 	int err;
@@ -1474,11 +1475,11 @@ int mthca_alloc_sqp(struct mthca_dev *de
 		goto err_out;
 
 	sqp->port = port;
-	sqp->qp.qpn       = mqpn;
+	sqp->qp.qpn 	  = mqpn;
 	sqp->qp.transport = MLX;
 
 	err = mthca_alloc_qp_common(dev, pd, send_cq, recv_cq,
-				    send_policy, &sqp->qp);
+					send_policy, &sqp->qp);
 	if (err)
 		goto err_out_free;
 
@@ -1558,10 +1559,10 @@ void mthca_free_qp(struct mthca_dev *dev
 	 */
 	if (!qp->ibqp.ucontext) {
 		mthca_cq_clean(dev, to_mcq(qp->ibqp.send_cq)->cqn, qp->qpn,
-			       qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
+				   qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
 		if (qp->ibqp.send_cq != qp->ibqp.recv_cq)
 			mthca_cq_clean(dev, to_mcq(qp->ibqp.recv_cq)->cqn, qp->qpn,
-				       qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
+					   qp->ibqp.srq ? to_msrq(qp->ibqp.srq) : NULL);
 
 		mthca_free_memfree(dev, qp);
 		mthca_free_wqe_buf(dev, qp);
@@ -1587,12 +1588,12 @@ static enum mthca_wr_opcode conv_ibal_wr
 		case WR_SEND: 
 			opcode = (wr->send_opt & IB_SEND_OPT_IMMEDIATE) ? MTHCA_OPCODE_SEND_IMM : MTHCA_OPCODE_SEND;
 			break;
-		case WR_RDMA_WRITE:	
+		case WR_RDMA_WRITE: 
 			opcode = (wr->send_opt & IB_SEND_OPT_IMMEDIATE) ? MTHCA_OPCODE_RDMA_WRITE_IMM : MTHCA_OPCODE_RDMA_WRITE;
 			break;
-		case WR_RDMA_READ: 		opcode = MTHCA_OPCODE_RDMA_READ; break;
-		case WR_COMPARE_SWAP: 		opcode = MTHCA_OPCODE_ATOMIC_CS; break;
-		case WR_FETCH_ADD: 			opcode = MTHCA_OPCODE_ATOMIC_FA; break;
+		case WR_RDMA_READ:		opcode = MTHCA_OPCODE_RDMA_READ; break;
+		case WR_COMPARE_SWAP:		opcode = MTHCA_OPCODE_ATOMIC_CS; break;
+		case WR_FETCH_ADD:			opcode = MTHCA_OPCODE_ATOMIC_FA; break;
 		default:						opcode = MTHCA_OPCODE_INVALID;break;
 	}
 	return opcode;
@@ -1600,9 +1601,9 @@ static enum mthca_wr_opcode conv_ibal_wr
 
 /* Create UD header for an MLX send and build a data segment for it */
 static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp,
-			    int ind, struct _ib_send_wr *wr,
-			    struct mthca_mlx_seg *mlx,
-			    struct mthca_data_seg *data)
+				int ind, struct _ib_send_wr *wr,
+				struct mthca_mlx_seg *mlx,
+				struct mthca_data_seg *data)
 {
 	enum ib_wr_opcode opcode = conv_ibal_wr_opcode(wr);
 	int header_size;
@@ -1618,7 +1619,7 @@ static int build_mlx_header(struct mthca
 		
 	ib_ud_header_init(256, /* assume a MAD */
 		mthca_ah_grh_present(to_mah((struct ib_ah *)wr->dgrm.ud.h_av)),
-	  	&sqp->ud_header);
+		&sqp->ud_header);
 
 	err = mthca_read_ah(dev, to_mah((struct ib_ah *)wr->dgrm.ud.h_av), &sqp->ud_header);
 	if (err){
@@ -1662,7 +1663,7 @@ static int build_mlx_header(struct mthca
 	sqp->ud_header.bth.destination_qpn = wr->dgrm.ud.remote_qp;
 	sqp->ud_header.bth.psn = cl_hton32((sqp->send_psn++) & ((1 << 24) - 1));
 	sqp->ud_header.deth.qkey = wr->dgrm.ud.remote_qkey & 0x00000080 ?
-					       cl_hton32(sqp->qkey) : wr->dgrm.ud.remote_qkey;
+						   cl_hton32(sqp->qkey) : wr->dgrm.ud.remote_qkey;
 	sqp->ud_header.deth.source_qpn = cl_hton32(sqp->qp.ibqp.qp_num);
 
 	header_size = ib_ud_header_pack(&sqp->ud_header,
@@ -1670,15 +1671,15 @@ static int build_mlx_header(struct mthca
 					ind * MTHCA_UD_HEADER_SIZE);
 
 	data->byte_count = cl_hton32(header_size);
-	data->lkey       = cl_hton32(to_mpd(sqp->qp.ibqp.pd)->ntmr.ibmr.lkey);
-	data->addr       = CPU_2_BE64(sqp->sg.dma_address +
-				       ind * MTHCA_UD_HEADER_SIZE);
+	data->lkey		 = cl_hton32(to_mpd(sqp->qp.ibqp.pd)->ntmr.ibmr.lkey);
+	data->addr		 = CPU_2_BE64(sqp->sg.dma_address +
+					   ind * MTHCA_UD_HEADER_SIZE);
 
 	return 0;
 }
 
 static inline int mthca_wq_overflow(struct mthca_wq *wq, int nreq,
-				    struct ib_cq *ib_cq)
+					struct ib_cq *ib_cq)
 {
 	unsigned cur;
 	struct mthca_cq *cq;
@@ -1715,7 +1716,7 @@ int mthca_tavor_post_send(struct ib_qp *
 	SPIN_LOCK_PREP(lh);   
 
 	spin_lock_irqsave(&qp->sq.lock, &lh);
-    
+	
 	/* XXX check that state is OK to post send */
 
 	ind = qp->sq.next_ind;
@@ -1746,7 +1747,7 @@ int mthca_tavor_post_send(struct ib_qp *
 			 cl_hton32(MTHCA_NEXT_SOLICIT) : 0)   |
 			cl_hton32(1);
 		if (opcode == MTHCA_OPCODE_SEND_IMM||
-		    opcode == MTHCA_OPCODE_RDMA_WRITE_IMM)
+			opcode == MTHCA_OPCODE_RDMA_WRITE_IMM)
 			((struct mthca_next_seg *) wqe)->imm = wr->immediate_data;
 
 		wqe += sizeof (struct mthca_next_seg);
@@ -1834,8 +1835,8 @@ int mthca_tavor_post_send(struct ib_qp *
 
 		case MLX:
 			err = build_mlx_header(dev, to_msqp(qp), ind, wr,
-					       (void*)(wqe - sizeof (struct mthca_next_seg)),
-					       (void*)wqe);
+						   (void*)(wqe - sizeof (struct mthca_next_seg)),
+						   (void*)wqe);
 			if (err) {
 				if (bad_wr)
 					*bad_wr = wr;
@@ -1872,7 +1873,7 @@ int mthca_tavor_post_send(struct ib_qp *
 					}
 
 					memcpy(wqe, (void *) (ULONG_PTR) sge->vaddr,
-					       sge->length);
+						   sge->length);
 					wqe += sge->length;
 				}
 
@@ -1880,20 +1881,20 @@ int mthca_tavor_post_send(struct ib_qp *
 				size += align(s + sizeof *seg, 16) / 16;
 			}
 		} else {
-    			
-    		for (i = 0; i < (int)wr->num_ds; ++i) {
-    			((struct mthca_data_seg *) wqe)->byte_count =
-    				cl_hton32(wr->ds_array[i].length);
-    			((struct mthca_data_seg *) wqe)->lkey =
-    				cl_hton32(wr->ds_array[i].lkey);
-    			((struct mthca_data_seg *) wqe)->addr =
-    				cl_hton64(wr->ds_array[i].vaddr);
-    			wqe += sizeof (struct mthca_data_seg);
-    			size += sizeof (struct mthca_data_seg) / 16;
-    			HCA_PRINT(TRACE_LEVEL_VERBOSE ,HCA_DBG_QP ,("SQ %06x [%02x]  lkey 0x%08x vaddr 0x%I64x 0x%x\n",qp->qpn,i,
-    				(wr->ds_array[i].lkey),(wr->ds_array[i].vaddr),wr->ds_array[i].length));
-    		}
-    	}
+				
+			for (i = 0; i < (int)wr->num_ds; ++i) {
+				((struct mthca_data_seg *) wqe)->byte_count =
+					cl_hton32(wr->ds_array[i].length);
+				((struct mthca_data_seg *) wqe)->lkey =
+					cl_hton32(wr->ds_array[i].lkey);
+				((struct mthca_data_seg *) wqe)->addr =
+					cl_hton64(wr->ds_array[i].vaddr);
+				wqe += sizeof (struct mthca_data_seg);
+				size += sizeof (struct mthca_data_seg) / 16;
+				HCA_PRINT(TRACE_LEVEL_VERBOSE ,HCA_DBG_QP ,("SQ %06x [%02x]  lkey 0x%08x vaddr 0x%I64x 0x%x\n",qp->qpn,i,
+					(wr->ds_array[i].lkey),(wr->ds_array[i].vaddr),wr->ds_array[i].length));
+			}
+		}
 
 		/* Add one more inline data segment for ICRC */
 		if (qp->transport == MLX) {
@@ -1946,19 +1947,19 @@ out:
 		wmb();
 
 		mthca_write64(doorbell,
-			      dev->kar + MTHCA_SEND_DOORBELL,
-			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+				  dev->kar + MTHCA_SEND_DOORBELL,
+				  MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
 	qp->sq.next_ind = ind;
 	qp->sq.head    += nreq;
 	
-    spin_unlock_irqrestore(&lh);   
+	spin_unlock_irqrestore(&lh);   
 	return err;
 }
 
 int mthca_tavor_post_recv(struct ib_qp *ibqp, struct _ib_recv_wr *wr,
-			     struct _ib_recv_wr **bad_wr)
+				 struct _ib_recv_wr **bad_wr)
 {
 	struct mthca_dev *dev = to_mdev(ibqp->device);
 	struct mthca_qp *qp = to_mqp(ibqp);
@@ -1989,7 +1990,7 @@ int mthca_tavor_post_recv(struct ib_qp *
 			wmb();
 
 			mthca_write64(doorbell, dev->kar + MTHCA_RECV_DOORBELL,
-		      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+			  MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 
 			qp->rq.head += MTHCA_TAVOR_MAX_WQES_PER_RECV_DB;
 			size0 = 0;
@@ -2034,7 +2035,7 @@ int mthca_tavor_post_recv(struct ib_qp *
 				cl_hton64(wr->ds_array[i].vaddr);
 			wqe += sizeof (struct mthca_data_seg);
 			size += sizeof (struct mthca_data_seg) / 16;
-//			HCA_PRINT(TRACE_LEVEL_ERROR  ,HCA_DBG_QP ,("RQ %06x [%02x]  lkey 0x%08x vaddr 0x%I64x 0x %x 0x%08x\n",i,qp->qpn,
+//			HCA_PRINT(TRACE_LEVEL_ERROR  ,HCA_DBG_QP ,("RQ %06x [%02x]	lkey 0x%08x vaddr 0x%I64x 0x %x 0x%08x\n",i,qp->qpn,
 //				(wr->ds_array[i].lkey),(wr->ds_array[i].vaddr),wr->ds_array[i].length, wr->wr_id));
 		}
 
@@ -2064,7 +2065,7 @@ out:
 		wmb();
 
 		mthca_write64(doorbell, dev->kar + MTHCA_RECV_DOORBELL,
-	      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+		  MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
 	qp->rq.next_ind = ind;
@@ -2228,7 +2229,7 @@ int mthca_arbel_post_send(struct ib_qp *
 
 		case UD:
 			memcpy(((struct mthca_arbel_ud_seg *) wqe)->av,
-			       to_mah((struct ib_ah *)wr->dgrm.ud.h_av)->av, MTHCA_AV_SIZE);
+				   to_mah((struct ib_ah *)wr->dgrm.ud.h_av)->av, MTHCA_AV_SIZE);
 			((struct mthca_arbel_ud_seg *) wqe)->dqpn = wr->dgrm.ud.remote_qp;
 			((struct mthca_arbel_ud_seg *) wqe)->qkey = wr->dgrm.ud.remote_qkey;
 
@@ -2238,8 +2239,8 @@ int mthca_arbel_post_send(struct ib_qp *
 
 		case MLX:
 			err = build_mlx_header(dev, to_msqp(qp), ind, wr,
-					       (void*)(wqe - sizeof (struct mthca_next_seg)),
-					       (void*)wqe);
+						   (void*)(wqe - sizeof (struct mthca_next_seg)),
+						   (void*)wqe);
 			if (err) {
 				if (bad_wr)
 					*bad_wr = wr;
@@ -2257,7 +2258,7 @@ int mthca_arbel_post_send(struct ib_qp *
 				*bad_wr = wr;
 			goto out;
 		}
-        if (wr->send_opt & IB_SEND_OPT_INLINE) {
+		if (wr->send_opt & IB_SEND_OPT_INLINE) {
 			if (wr->num_ds) {
 				struct mthca_inline_seg *seg = (struct mthca_inline_seg *)wqe;
 				uint32_t s = 0;
@@ -2276,7 +2277,7 @@ int mthca_arbel_post_send(struct ib_qp *
 					}
 
 					memcpy(wqe, (void *) (uintptr_t) sge->vaddr,
-					       sge->length);
+						   sge->length);
 					wqe += sge->length;
 				}
 
@@ -2284,17 +2285,17 @@ int mthca_arbel_post_send(struct ib_qp *
 				size += align(s + sizeof *seg, 16) / 16;
 			}
 		} else {
-    		for (i = 0; i < (int)wr->num_ds; ++i) {
-    			((struct mthca_data_seg *) wqe)->byte_count =
-    				cl_hton32(wr->ds_array[i].length);
-    			((struct mthca_data_seg *) wqe)->lkey =
-    				cl_hton32(wr->ds_array[i].lkey);
-    			((struct mthca_data_seg *) wqe)->addr =
-    				cl_hton64(wr->ds_array[i].vaddr);
-    			wqe += sizeof (struct mthca_data_seg);
-    			size += sizeof (struct mthca_data_seg) / 16;
-    		}
-    	}
+			for (i = 0; i < (int)wr->num_ds; ++i) {
+				((struct mthca_data_seg *) wqe)->byte_count =
+					cl_hton32(wr->ds_array[i].length);
+				((struct mthca_data_seg *) wqe)->lkey =
+					cl_hton32(wr->ds_array[i].lkey);
+				((struct mthca_data_seg *) wqe)->addr =
+					cl_hton64(wr->ds_array[i].vaddr);
+				wqe += sizeof (struct mthca_data_seg);
+				size += sizeof (struct mthca_data_seg) / 16;
+			}
+		}
 
 		/* Add one more inline data segment for ICRC */
 		if (qp->transport == MLX) {
@@ -2354,8 +2355,8 @@ out:
 		 */
 		wmb();
 		mthca_write64(doorbell,
-			      dev->kar + MTHCA_SEND_DOORBELL,
-			      MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
+				  dev->kar + MTHCA_SEND_DOORBELL,
+				  MTHCA_GET_DOORBELL_LOCK(&dev->doorbell_lock));
 	}
 
 	spin_unlock_irqrestore(&lh);
@@ -2363,7 +2364,7 @@ out:
 }
 
 int mthca_arbel_post_recv(struct ib_qp *ibqp, struct _ib_recv_wr *wr,
-			     struct _ib_recv_wr **bad_wr)
+				 struct _ib_recv_wr **bad_wr)
 {
 	struct mthca_qp *qp = to_mqp(ibqp);
 	int err = 0;
@@ -2373,7 +2374,7 @@ int mthca_arbel_post_recv(struct ib_qp *
 	u8 *wqe;
 	SPIN_LOCK_PREP(lh);
 
- 	spin_lock_irqsave(&qp->rq.lock, &lh);
+	spin_lock_irqsave(&qp->rq.lock, &lh);
 
 	/* XXX check that state is OK to post receive */
 
@@ -2444,7 +2445,7 @@ out:
 }
 
 void mthca_free_err_wqe(struct mthca_dev *dev, struct mthca_qp *qp, int is_send,
-		       int index, int *dbd, __be32 *new_wqe)
+			   int index, int *dbd, __be32 *new_wqe)
 {
 	struct mthca_next_seg *next;
 
@@ -2487,15 +2488,15 @@ int mthca_init_qp_table(struct mthca_dev
 	 */
 	dev->qp_table.sqp_start = (dev->limits.reserved_qps + 1) & ~1UL;
 	err = mthca_alloc_init(&dev->qp_table.alloc,
-			       dev->limits.num_qps,
-			       (1 << 24) - 1,
-			       dev->qp_table.sqp_start +
-			       MTHCA_MAX_PORTS * 2);
+				   dev->limits.num_qps,
+				   (1 << 24) - 1,
+				   dev->qp_table.sqp_start +
+				   MTHCA_MAX_PORTS * 2);
 	if (err)
 		return err;
 
 	err = mthca_array_init(&dev->qp_table.qp,
-			       dev->limits.num_qps);
+				   dev->limits.num_qps);
 	if (err) {
 		mthca_alloc_cleanup(&dev->qp_table.alloc);
 		return err;
@@ -2503,8 +2504,8 @@ int mthca_init_qp_table(struct mthca_dev
 
 	for (i = 0; i < 2; ++i) {
 		err = mthca_CONF_SPECIAL_QP(dev, i ? IB_QPT_QP1 : IB_QPT_QP0,
-					    dev->qp_table.sqp_start + i * 2,
-					    &status);
+						dev->qp_table.sqp_start + i * 2,
+						&status);
 		if (err)
 			goto err_out;
 		if (status) {
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/mt_utils.h branches\mlx4/hw/mthca/mt_utils.h
--- trunk/hw/mthca/mt_utils.h	2011-09-13 09:16:20.549343600 -0700
+++ branches\mlx4/hw/mthca/mt_utils.h	2011-10-10 16:59:49.555734600 -0700
@@ -111,7 +111,7 @@ static __inline int _ffs(const unsigned 
 }
 
 
-#define ffs(val)	_ffs((const unsigned long *)&val)
+#define ffs(val)	_ffs((const unsigned long *)&(val))
 
 /**
 * _ffz_raw - find the first zero bit in a word
@@ -202,21 +202,22 @@ static __inline int find_first_zero_bit(
 static __inline int find_next_zero_bit(const unsigned long *addr, int bits_size, int offset)
 {	
 	int res;
-	int ix = offset % BITS_PER_LONG;
-	int w_offset = offset / BITS_PER_LONG;
+	int ix = offset & 31;
+	int set = offset & ~31;
+	const unsigned long *p = addr + (set >> 5);
 
 	// search in the first word while we are in the middle
 	if (ix) {
-		res = _ffz_raw(addr + w_offset, ix);
+		res = _ffz_raw(p, ix);
 		if (res)
-			return res - 1;
-		++addr;
-		bits_size -= BITS_PER_LONG;
-		ix = BITS_PER_LONG;
+			return set + res - 1;
+		++p;
+		set += BITS_PER_LONG;
 	}
 
-	res = find_first_zero_bit( addr, bits_size );
-	return res + ix;
+	// search the rest of the bitmap
+	res = find_first_zero_bit(p, bits_size - (unsigned)(32 * (p - addr)));
+	return res + set;
 }
 
 void fill_bit_tbls();
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/user/mlnx_ual_main.c branches\mlx4/hw/mthca/user/mlnx_ual_main.c
--- trunk/hw/mthca/user/mlnx_ual_main.c	2011-09-13 09:16:20.315320200 -0700
+++ branches\mlx4/hw/mthca/user/mlnx_ual_main.c	2011-10-10 16:59:49.172696300 -0700
@@ -44,13 +44,6 @@ uint32_t	mlnx_dbg_lvl = 0; // MLNX_TRACE
 
 static void uvp_init();
 
-extern BOOL APIENTRY
-_DllMainCRTStartupForGS(
-	IN				HINSTANCE					h_module,
-	IN				DWORD						ul_reason_for_call, 
-	IN				LPVOID						lp_reserved );
-
-
 BOOL APIENTRY
 DllMain(
 	IN				HINSTANCE					h_module,
@@ -61,14 +54,8 @@ DllMain(
 	{
 	case DLL_PROCESS_ATTACH:
 #if defined(EVENT_TRACING)
-		WPP_INIT_TRACING(L"mthcau.dll");
+    WPP_INIT_TRACING(L"mthcau.dll");
 #endif
-		if( !_DllMainCRTStartupForGS(
-			h_module, ul_reason_for_call, lp_reserved ) )
-		{
-			return FALSE;
-		}
-
 		fill_bit_tbls();
 		uvp_init();
 		break;
@@ -86,8 +73,7 @@ DllMain(
 #endif
 
 	default:
-		return _DllMainCRTStartupForGS(
-			h_module, ul_reason_for_call, lp_reserved );
+		return TRUE;
 	}
 	return TRUE;
 }
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/user/mlnx_uvp.h branches\mlx4/hw/mthca/user/mlnx_uvp.h
--- trunk/hw/mthca/user/mlnx_uvp.h	2011-09-13 09:16:20.144303100 -0700
+++ branches\mlx4/hw/mthca/user/mlnx_uvp.h	2011-10-10 16:59:49.049684000 -0700
@@ -131,7 +131,7 @@ struct mthca_cq {
 	int                arm_db_index;
 	uint32_t          *arm_db;
 	int                u_arm_db_index;
-	uint32_t          *p_u_arm_sn;
+	volatile uint32_t *p_u_arm_sn;
 };
 
 struct mthca_srq {
@@ -257,7 +257,7 @@ static inline int mthca_is_memfree(struc
 }
 
 int mthca_alloc_db(struct mthca_db_table *db_tab, enum mthca_db_type type,
-			  uint32_t **db);
+			  volatile uint32_t **db);
 void mthca_set_db_qn(uint32_t *db, enum mthca_db_type type, uint32_t qn);
 void mthca_free_db(struct mthca_db_table *db_tab, enum mthca_db_type type, int db_index);
 struct mthca_db_table *mthca_alloc_db_tab(int uarc_size);
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/user/mlnx_uvp.rc branches\mlx4/hw/mthca/user/mlnx_uvp.rc
--- trunk/hw/mthca/user/mlnx_uvp.rc	2011-09-13 09:16:20.414330100 -0700
+++ branches\mlx4/hw/mthca/user/mlnx_uvp.rc	2011-10-10 16:59:49.198698900 -0700
@@ -37,8 +37,8 @@
 
 #ifdef DBG
 #define VER_FILEDESCRIPTION_STR     "HCA User Mode Verb Provider (checked)"
-#define VER_INTERNALNAME_STR		"mthcaud.dll"
-#define VER_ORIGINALFILENAME_STR	"mthcaud.dll"
+#define VER_INTERNALNAME_STR		"mthcau.dll"
+#define VER_ORIGINALFILENAME_STR	"mthcau.dll"
 #else
 #define VER_FILEDESCRIPTION_STR     "HCA User Mode Verb Provider"
 #define VER_INTERNALNAME_STR		"mthcau.dll"
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/user/mlnx_uvp_memfree.c branches\mlx4/hw/mthca/user/mlnx_uvp_memfree.c
--- trunk/hw/mthca/user/mlnx_uvp_memfree.c	2011-09-13 09:16:20.088297500 -0700
+++ branches\mlx4/hw/mthca/user/mlnx_uvp_memfree.c	2011-10-10 16:59:49.368715900 -0700
@@ -52,7 +52,7 @@ struct mthca_db_table {
 };
 
 int mthca_alloc_db(struct mthca_db_table *db_tab, enum mthca_db_type type,
-		   uint32_t **db)
+		  volatile uint32_t **db)
 {
 	int i, j, k;
 	int group, start, end, dir;
@@ -128,7 +128,7 @@ found:
 		j = MTHCA_DB_REC_PER_PAGE - 1 - j;
 
 	ret = i * MTHCA_DB_REC_PER_PAGE + j;
-	*db = (uint32_t *) &db_tab->page[i].db_rec[j];
+	*db = (volatile uint32_t *) &db_tab->page[i].db_rec[j];
 	
 out:
 	ReleaseMutex( db_tab->mutex );
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/hw/mthca/user/SOURCES branches\mlx4/hw/mthca/user/SOURCES
--- trunk/hw/mthca/user/SOURCES	2011-09-13 09:16:19.951283800 -0700
+++ branches\mlx4/hw/mthca/user/SOURCES	2011-10-10 16:59:49.282707300 -0700
@@ -2,15 +2,20 @@ TRUNK=..\..\..
 
 TARGETNAME=mthcau
 
+
 TARGETPATH=$(TRUNK)\bin\user\obj$(BUILD_ALT_DIR)
 TARGETTYPE=DYNLINK
-
+!if $(_NT_TOOLS_VERSION) == 0x700
+# DDK
+DLLDEF=$O\mlnx_uvp.def
+!else
 # WDK
 DLLDEF=$(OBJ_PATH)\$O\mlnx_uvp.def
-
+!endif
 #USE_NTDLL=1
 USE_MSVCRT=1
 DLLENTRY=DllMain
+NTTARGETFILES=Custom_target
 
 !if $(FREEBUILD)
 ENABLE_EVENT_TRACING=1
@@ -56,8 +61,13 @@ TARGETLIBS=\
 	$(SDK_LIB_PATH)\user32.lib \
 	$(SDK_LIB_PATH)\kernel32.lib \
 	$(SDK_LIB_PATH)\Advapi32.lib \
-	$(TARGETPATH)\*\complib.lib \
-	$(TARGETPATH)\*\ibal.lib
+        $(TARGETPATH)\*\complib.lib \
+        $(TARGETPATH)\*\ibal.lib
+
+
+!if !$(FREEBUILD)
+C_DEFINES=$(C_DEFINES) -D_DEBUG -DDEBUG -DDBG
+!endif
 
 #LINKER_FLAGS=/MAP /MAPINFO:LINES


SH: That's a huge number of changes to a driver that does NOT support IBoE.  Are all of those really needed?
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/complib/cl_memory.h branches\mlx4/inc/complib/cl_memory.h
--- trunk/inc/complib/cl_memory.h	2011-09-13 09:15:47.799068900 -0700
+++ branches\mlx4/inc/complib/cl_memory.h	2011-10-10 16:59:17.406520000 -0700
@@ -189,6 +189,55 @@ __cl_malloc_trk(
 *	Memory Management, __cl_malloc_ntrk, __cl_zalloc_trk, __cl_free_trk
 **********/
 
+/****i* Component Library: Memory Management/__cl_malloc_trk_ex
+* NAME
+*	__cl_malloc_trk_ex
+*
+* DESCRIPTION
+*	The __cl_malloc_trk_ex function allocates and tracks a block of memory.
+*
+* SYNOPSIS
+*/
+CL_EXPORT void* CL_API
+__cl_malloc_trk_ex(
+	IN	const char* const	p_file_name,
+	IN	const int32_t		line_num,
+	IN	const size_t		size,
+	IN	const boolean_t		pageable,
+	IN	const char*			tag );
+/*
+* PARAMETERS
+*	p_file_name
+*		[in] Name of the source file initiating the allocation.
+*
+*	line_num
+*		[in] Line number in the specified file where the allocation is
+*		initiated
+*
+*	size
+*		[in] Size of the requested allocation.
+*
+*	pageable
+*		[in] On operating systems that support pageable vs. non pageable
+*		memory in the kernel, set to TRUE to allocate memory from paged pool.
+*	tag
+*		[in] An optional ASCII string describing the name of the memory block (or its owner).
+*
+* RETURN VALUES
+*	Pointer to allocated memory if successful.
+*
+*	NULL otherwise.
+*
+* NOTES
+*	Allocated memory follows alignment rules specific to the different
+*	environments.
+*	This function should not be called directly.  The cl_malloc_ex macro will
+*	redirect users to this function when memory tracking is enabled.
+*
+* SEE ALSO
+*	Memory Management, __cl_zalloc_ntrk, __cl_malloc_trk, __cl_free_trk
+**********/
 
 /****i* Component Library: Memory Management/__cl_zalloc_trk
 * NAME
@@ -237,6 +286,57 @@ __cl_zalloc_trk(
 *	Memory Management, __cl_zalloc_ntrk, __cl_malloc_trk, __cl_free_trk
 **********/
 
+/****i* Component Library: Memory Management/__cl_zalloc_trk_ex
+* NAME
+*	__cl_zalloc_trk_ex
+*
+* DESCRIPTION
+*	The __cl_zalloc_trk_ex function allocates and tracks a block of memory
+*	initialized to zero.
+*
+* SYNOPSIS
+*/
+CL_EXPORT void* CL_API
+__cl_zalloc_trk_ex(
+	IN	const char* const	p_file_name,
+	IN	const int32_t		line_num,
+	IN	const size_t		size,
+	IN	const boolean_t		pageable,
+	IN	const char*			tag );
+/*
+* PARAMETERS
+*	p_file_name
+*		[in] Name of the source file initiating the allocation.
+*
+*	line_num
+*		[in] Line number in the specified file where the allocation is
+*		initiated
+*
+*	size
+*		[in] Size of the requested allocation.
+*
+*	pageable
+*		[in] On operating systems that support pageable vs. non pageable
+*		memory in the kernel, set to TRUE to allocate memory from paged pool.
+*	tag
+*		[in] An optional ASCII string describing the name of the memory block (or its owner).
+*
+* RETURN VALUES
+*	Pointer to allocated memory if successful.
+*
+*	NULL otherwise.
+*
+* NOTES
+*	Allocated memory follows alignment rules specific to the different
+*	environments.
+*	This function should not be called directly.  The cl_zalloc_ex macro will
+*	redirect users to this function when memory tracking is enabled.
+*
+* SEE ALSO
+*	Memory Management, __cl_zalloc_ntrk, __cl_malloc_trk, __cl_free_trk
+**********/
+
+
 
 /****i* Component Library: Memory Management/__cl_malloc_ntrk
 * NAME
@@ -933,6 +1033,18 @@ cl_copy_from_user(
 #define cl_pzalloc( a )	\
 	__cl_zalloc_trk( __FILE__, __LINE__, a, TRUE )
 
+#define cl_malloc_ex( a, tag )	\
+	__cl_malloc_trk_ex( __FILE__, __LINE__, a, FALSE, tag )
+
+#define cl_zalloc_ex( a, tag )	\
+	__cl_zalloc_trk_ex( __FILE__, __LINE__, a, FALSE, tag )
+
+#define cl_palloc_ex( a, tag )	\
+	__cl_malloc_trk_ex( __FILE__, __LINE__, a, TRUE, tag )
+
+#define cl_pzalloc_ex( a, tag )	\
+	__cl_zalloc_trk_ex( __FILE__, __LINE__, a, TRUE, tag )
+
 #define cl_free( a )	\
 	__cl_free_trk( a )
 
@@ -949,6 +1061,14 @@ cl_copy_from_user(
 
 #define cl_pzalloc( a )	\
 	__cl_zalloc_ntrk( a, TRUE )
+	
+#define cl_malloc_ex( a, tag )	cl_malloc( a )
+	
+#define cl_zalloc_ex( a, tag )	cl_zalloc( a )
+	
+#define cl_palloc_ex( a, tag )	cl_palloc( a )
+	
+#define cl_pzalloc_ex( a, tag )	cl_pzalloc( a )
 
 #define cl_free( a )	\
 	__cl_free_ntrk( a )

SH: Do not extend complib even more.  It needs to go away and be replaced with native calls (which any Windows developer would recognize), not added to.

diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/iba/ib_al_ioctl.h branches\mlx4/inc/iba/ib_al_ioctl.h
--- trunk/inc/iba/ib_al_ioctl.h	2011-09-13 09:15:48.686157600 -0700
+++ branches\mlx4/inc/iba/ib_al_ioctl.h	2011-10-10 16:59:19.579737300 -0700
@@ -3479,7 +3479,7 @@ typedef struct _ual_ndi_req_cm_ioctl_in
 	uint64_t					h_qp;
 	net64_t						guid;
 	net32_t						cid;
-	uint16_t					dst_port;
+	net16_t						dst_port;
 	uint8_t						resp_res;
 	uint8_t						init_depth;
 	uint8_t						prot;
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/iba/ib_ci.h branches\mlx4/inc/iba/ib_ci.h
--- trunk/inc/iba/ib_ci.h	2011-09-13 09:15:48.704159400 -0700
+++ branches\mlx4/inc/iba/ib_ci.h	2011-10-10 16:59:19.596739000 -0700
@@ -74,7 +74,9 @@ extern "C"
  * definition.
  */
 #define VERBS_MAJOR_VER			(0x0002)
-#define VERBS_MINOR_VER			(0x0002)
+#define VERBS_MINOR_VER			(0x0004)
+#define VERBS_EX_MAJOR_VER		(0x0001)
+#define VERBS_EX_MINOR_VER		(0x0000)
 
 #define VERBS_VERSION			(((VERBS_MAJOR_VER) << 16) | (VERBS_MINOR_VER))
 #define MK_VERBS_VERSION(maj,min)	((((maj) & 0xFFFF) << 16) | \

SH: Can't this be done without breaking every application?  What is the plan for supporting existing applications?

@@ -164,7 +166,7 @@ typedef void
 * NOTES
 *	The consumer only gets the cq_context. It is the client
 *	responsibility to store the cq_handle in the context after the creation
-*	time. So it can call ci_poll_cq() after the arrival of the notification.
+*	time. So it can call ci_poll_cq() or ci_poll_cq_array() after the arrival of the notification.
 * SEE ALSO
 *	ci_create_cq
 ******
@@ -916,7 +918,7 @@ typedef ib_api_status_t
 	IN		const	ib_pd_handle_t				h_pd,
 	IN		const	void						*qp_context,
 	IN		const	ci_async_event_cb_t			pfn_async_event_cb,
-	IN		const	ib_qp_create_t				*p_create_attr,
+	IN	OUT			ib_qp_create_t				*p_create_attr,
 		OUT			ib_qp_attr_t				*p_qp_attr,
 		OUT			ib_qp_handle_t				*ph_qp,
 	IN	OUT			ci_umv_buf_t				*p_umv_buf OPTIONAL );
@@ -934,7 +936,7 @@ typedef ib_api_status_t
 *	pfn_async_event_cb
 *		[in] Asynchronous event handler.
 *	p_create_attr
-*		[in] Initial attributes with which the qp must be created.
+*		[in,out] Initial attributes with which the qp must be created.
 *	p_qp_attr
 *		[out] Attributes of the newly created queue pair.
 *	ph_qp
@@ -961,6 +963,7 @@ typedef ib_api_status_t
 *		Unreliable datagram not supported
 *	IB_INVALID_PARAMETER
 *		The parameter p_create_attr is invalid.
+*		An erroneous value in p_create_attr is adjusted to the maximum possible value.
 * NOTES
 *	If any of the initial parameters is not valid, the queue pair is not
 *	created. If the routine call is not successful then the contents of
@@ -982,7 +985,7 @@ typedef ib_api_status_t
 	IN		const	uint8_t				port_num,
 	IN		const	void				*qp_context,
 	IN		const	ci_async_event_cb_t			pfn_async_event_cb,
-	IN		const	ib_qp_create_t		*p_create_attr,
+	IN	OUT			ib_qp_create_t		*p_create_attr,
 		OUT			ib_qp_attr_t		*p_qp_attr,
 		OUT			ib_qp_handle_t		*ph_qp );
 /*
@@ -1001,8 +1004,9 @@ typedef ib_api_status_t
 *	pfn_async_event_cb
 *		[in] Asynchronous event handler.
 *	p_create_attr
-*		[in] Initial set of attributes with which the queue pair is to be
-*		created.
+*		[in,out] Initial set of attributes with which the queue pair is to be created.
+*			     Upon an invalid parameter, the function returns IB_INVALID_SETTING
+*			     and replaces the parameter with the maximum allowable value.
 *	p_qp_attr
 *		[out] QP attributes after the qp is successfully created.
 *
@@ -1230,7 +1234,7 @@ typedef ib_api_status_t
 *	pending callbacks returning back to the verbs provider driver.
 *
 *	If the CQ associated with this QP is still not destroyed, the completions
-*	on behalf of this QP can still be pulled via the ci_poll_cq() call. Any
+*	on behalf of this QP can still be pulled via the ci_poll_cq() or ci_poll_cq_array() call. Any
 *	resources allocated by the Channel Interface must be deallocated as part
 *	of this call.
 * SEE ALSO
@@ -1289,7 +1293,83 @@ typedef ib_api_status_t
 *		one of the parameters was NULL.
 * NOTES
 *	The consumer would need a way to retrieve the cq_handle associated with
-*	context being returned, so it can perform ci_poll_cq() to retrieve
+*	context being returned, so it can perform ci_poll_cq() or ci_poll_cq_array() to retrieve
+*	completion queue entries. The handle as such is not being passed, since
+*	there is no information in the handle that is visible to the consumer.
+*	Passing a context directly would help avoid any reverse lookup that the
+*	consumer would need to perform in order to identify it's own internal
+*	data-structures	needed to process this completion completely.
+* SEE ALSO
+*	ci_destroy_cq, ci_query_cq, ci_resize_cq
+******
+*/
+
+
+/****** struct ib_group_affinity_t
+* mask - Specifies the affinity mask. The bits in the affinity mask identify a set of processors within the group identified by the 'group' field.
+* group - Specifies the group number.
+*/
+typedef struct _ib_group_affinity
+{
+	uint64_t				mask;
+	uint16_t				group;
+}	ib_group_affinity_t;
+
+/****f* Verbs/ci_create_cq_ex
+* NAME
+*	ci_create_cq_ex -- Create a completion queue (CQ) on the specified HCA with the specified affinity.
+* SYNOPSIS
+*/
+
+typedef ib_api_status_t
+(*ci_create_cq_ex) (
+	IN		const	ib_ca_handle_t				h_ca,
+	IN		const	void						*cq_context,
+	IN		const	ci_async_event_cb_t			pfn_async_event_cb,
+	IN				ci_completion_cb_t			completion_cb,
+	IN				ib_group_affinity_t         *affinity,
+	IN	OUT			uint32_t* const				p_size,
+		OUT			ib_cq_handle_t				*ph_cq,
+	IN	OUT			ci_umv_buf_t				*p_umv_buf OPTIONAL );

SH: If we need a new call to create CQs, we should just pass in a structure, so that it can be more easily extended later.

+/*
+* DESCRIPTION
+*	The consumer must specify the minimum number of entries in the CQ. The
+*	exact number of entries the Channel Interface created is returned to the
+*	client. If the requested number of entries is larger than what this
+*	HCA can support, an error is returned.
+* PARAMETERS
+*	h_ca
+*		[in] A handle to the open HCA
+*	cq_context
+*		[in] The context that is passed during the completion callbacks.
+*	pfn_async_event_cb
+*		[in] Asynchronous event handler.
+*	completion_cb
+*		[in] Callback for completion events
+*	affinity
+*		[in] CQ affinity
+*	p_size
+*		[in out] Points to a variable containing the number of CQ entries
+*		requested by the consumer. On completion points to the size of the
+*		CQ that was created by the provider.
+*	ph_cq
+*		[out] Handle to the newly created CQ on successful creation.
+*	p_umv_buf
+*		[in out] Vendor specific parameter to support user mode IO.
+* RETURN VALUE
+*	IB_SUCCESS
+*		The operation was successful.
+*	IB_INVALID_CA_HANDLE
+*		The h_ca passed is invalid.
+*	IB_INSUFFICIENT_RESOURCES
+*		Insufficient resources to complete request.
+*	IB_INVALID_CQ_SIZE
+*		Requested CQ Size is not supported.
+*	IB_INVALID_PARAMETER
+*		one of the parameters was NULL.
+* NOTES
+*	The consumer would need a way to retrieve the cq_handle associated with
+*	context being returned, so it can perform ci_poll_cq() or ci_poll_cq_array() to retrieve
 *	completion queue entries. The handle as such is not being passed, since
 *	there is no information in the handle that is visible to the consumer.
 *	Passing a context directly would help avoid any reverse lookup that the
@@ -1353,6 +1433,41 @@ typedef ib_api_status_t
 ******
 */
 
+typedef ib_api_status_t
+(*ci_modify_cq) (
+	IN		const	ib_cq_handle_t				h_cq,
+	IN 		uint16_t 							moder_cnt,
+	IN      uint16_t 							moder_time,
+	IN	OUT			ci_umv_buf_t				*p_umv_buf OPTIONAL );

SH: This needs a different name than 'modify CQ', but I really question whether this functionality is something that should be exposed above verbs.  This seems more like a feature of how the CQ is armed, versus its existence.

+/*
+* DESCRIPTION
+*	This routine allows the caller to modify CQ interrupt moderation.
+* PARAMETERS
+*	h_cq
+*		[in] Completion Queue handle
+*	moder_cnt
+*		[in] This parameter indicates the requested interrupt moderation count.
+*	moder_time
+*		[in] This parameter indicates the requested time interval between indicated interrupts.
+*	p_umv_buf
+*		[in out] Vendor specific parameter to support user mode IO.
+* RETURN VALUE
+*	IB_SUCCESS
+*		The modify operation was successful.
+*	IB_INVALID_CQ_HANDLE
+*		The CQ handle is invalid.
+*	IB_INSUFFICIENT_RESOURCES
+*		Insufficient resources to complete request.
+*	IB_INVALID_PARAMETER
+*		one of the parameters was NULL.
+*
+* NOTES
+*
+* SEE ALSO
+*	ci_create_cq
+******
+*/
+
 /****f* Verbs/ci_query_cq
 * NAME
 *	ci_query_cq -- Query the number of entries configured for the CQ.
@@ -2273,7 +2388,7 @@ typedef ib_api_status_t
 *	on different types of queue pairs, and the different modifiers
 *	acceptable for the work request for different QP service types.
 * SEE ALSO
-*	ci_post_recv, ci_poll_cq
+*	ci_post_recv, ci_poll_cq, ci_poll_cq_array
 ******
 */
 
@@ -2365,7 +2480,7 @@ typedef ib_api_status_t
 *		QP was in reset or init state.
 *		(TBD: there may be an errata that allows posting in init state)
 * SEE ALSO
-*	ci_post_send, ci_poll_cq.
+*	ci_post_send, ci_poll_cq, ci_poll_cq_array
 ******
 */
 
@@ -2406,7 +2521,7 @@ typedef ib_api_status_t
 *	is optional by a channel adapter vendor.
 *
 * SEE ALSO
-*	ci_create_cq, ci_poll_cq, ci_enable_cq_notify, ci_enable_ncomp_cq_notify
+*	ci_create_cq, ci_poll_cq, ci_poll_cq_array, ci_enable_cq_notify, ci_enable_ncomp_cq_notify
 *****/
 
 /****f* Verbs/ci_poll_cq
@@ -2450,6 +2565,57 @@ typedef ib_api_status_t
 ******
 */
 
+
+/****f* Verbs/ci_poll_cq_array
+* NAME
+*	ci_poll_cq_array -- Retrieve work completion records from a completion queue
+* SYNOPSIS
+*/
+
+typedef ib_api_status_t
+(*ci_poll_cq_array) (
+	IN		const	ib_cq_handle_t				h_cq,
+	IN	OUT			int*						p_num_entries,
+		OUT			ib_wc_t*	const			wc );
+/*
+* DESCRIPTION
+*	This routine retrieves work completion entries from the specified
+*	completion queue. The contents of the data returned in a work completion
+*	are specified in ib_wc_t.
+*
+* PARAMETERS
+*	h_cq
+*		[in] Handle to the completion queue being polled.
+*	p_num_entries
+*		[in out] Pointer to a variable containing the number of entries in the array.
+*		On successful return it contains the number of entries actually filled.
+*	wc
+*		[out] An array of work completions retrieved from the completion queue
+*		and successfully processed.
+*
+* RETURN VALUE
+*	IB_SUCCESS
+*		Poll completed successfully and found N>0 entries. 
+*		The wc array then contains N entries filled and *p_num_entries is equal to N.
+*	IB_INVALID_CQ_HANDLE
+*		The cq_handle supplied is not valid.
+*	IB_NOT_FOUND
+*		There were no completion entries found in the specified CQ.
+*
+* NOTES
+*	This function returns qp_context in the first field of the WC structure.
+*	That first field (p_next) is intended to link WCs in a list and is not
+*	used when WCs are returned in an array.
+*	qp_context is the value defined by the user at create_qp time.
+*	This function is intended for use with SRQ, where the returned
+*	qp_context value identifies the QP related to the completion.
+*
+* SEE ALSO
+*	ci_create_cq, ci_post_send, ci_post_recv, ci_bind_mw
+******
+*/
+
+
 /****f* Verbs/ci_enable_cq_notify
 * NAME
 *	ci_enable_cq_notify -- Invoke the Completion handler, on next entry added.
@@ -2482,12 +2648,12 @@ typedef ib_api_status_t
 *	The consumer cannot call a request for notification without emptying
 *	entries from the CQ. i.e if a consumer registers for a notification
 *	request in the completion callback before pulling entries from the
-*	CQ via ci_poll_cq, the notification is not generated for completions
+*	CQ via ci_poll_cq or ci_poll_cq_array, the notification is not generated for completions
 *	already in the CQ. For e.g. in the example below, if there are no calls
-*   to ci_poll_cq()	after the ci_enable_cq_notify(). For any CQ entries added
+*   to ci_poll_cq() or ci_poll_cq_array() after the ci_enable_cq_notify(). For any CQ entries added
 *	before calling this ci_enable_cq_notify() call, the consumer does not
 *	get a completion notification callback. In order to comply with the verb
-*	spec, consumer is supposed to perform a ci_poll_cq() after the
+*	spec, consumer is supposed to perform a ci_poll_cq() or ci_poll_cq_array() after the
 *	ci_enable_cq_notify() is made to retrive any entries that might have
 *	been added to the CQ before the CI registers the notification enable.
 *
@@ -2548,7 +2714,7 @@ typedef ib_api_status_t
 *	vendor.
 *
 * SEE ALSO
-*	ci_create_cq, ci_peek_cq, ci_poll_cq, ci_enable_cq_notify
+*	ci_create_cq, ci_peek_cq, ci_poll_cq, ci_poll_cq_array, ci_enable_cq_notify
 ******
 */
 
@@ -2774,6 +2940,116 @@ typedef ib_api_status_t
 *	ci_register_smr, ci_create_mw, ib_ci_op_t
 *****/
 
+
+/****f* Verbs/ci_alloc_fast_reg_mr
+* NAME
+*	ci_alloc_fast_reg_mr -- Allocate MR for fast registration
+*
+* SYNOPSIS
+*/
+
+typedef ib_api_status_t
+(*ci_alloc_fast_reg_mr) (
+	IN		const	ib_pd_handle_t				h_pd,
+	IN		const	int							max_page_list_len,
+	OUT			net32_t* const					p_lkey,
+	OUT			net32_t* const					p_rkey,
+	OUT 		ib_mr_handle_t					*ph_mr );
+/*
+* DESCRIPTION
+*	This routine allocates an MR for use in a fast registration send request
+*
+* PARAMETERS
+*	h_pd
+*		[in] Handle to PD
+*
+*	max_page_list_len
+*		[in] Maximum length of the page list, in pages
+*
+*	p_lkey
+*		[out] Local Key Attributes of the registered memory region
+*
+*	p_rkey
+*		[out] Remote key of the registered memory region. The verbs provider
+*		is required to give this in the expected ordering on the wire. When
+*		rkey's are exchanged between remote nodes, no swapping of this data
+*		will be performed.
+*
+*	ph_mr
+*		[out] handle to the allocated region
+*	
+* RETURN VALUE
+*	IB_SUCCESS
+*		MR was successfully allocated.
+*
+* SEE ALSO
+*	
+******
+*/
+
+/****f* Verbs/ci_alloc_fast_reg_page_list
+* NAME
+*	ci_alloc_fast_reg_page_list -- Allocate page list for fast registration
+*
+* SYNOPSIS
+*/
+
+typedef ib_api_status_t
+(*ci_alloc_fast_reg_page_list) (
+	IN		const	ib_ca_handle_t				h_ca,
+	IN		const	int							max_page_list_len,
+	OUT 		ib_fast_reg_page_list_t			**page_list );
+/*
+* DESCRIPTION
+*	This routine allocates a page list for a fast registration send request
+*
+* PARAMETERS
+*	h_ca
+*		[in] Handle to HCA
+*
+*	max_page_list_len
+*		[in] Maximum length of the page list, in pages
+*
+*	page_list
+*		[out] pointer to the allocated page list
+*	
+* RETURN VALUE
+*	IB_SUCCESS
+*		page list was successfully allocated.
+*
+* SEE ALSO
+*	
+******
+*/
+
+
+
+/****f* Verbs/ci_free_fast_reg_page_list
+* NAME
+*	ci_free_fast_reg_page_list -- Release page list
+* SYNOPSIS
+*/
+
+typedef void
+(*ci_free_fast_reg_page_list) (
+	IN				ib_fast_reg_page_list_t		*h_fr );
+/*
+* DESCRIPTION
+*	This routine releases a page list allocated for fast memory registration
+*
+* PARAMETERS
+*	h_fr
+*		[in] Handle to the page list being released
+*	
+* RETURN VALUE
+*
+* SEE ALSO
+*
+******
+*/
+
+
+
 typedef enum rdma_transport_type
 (*ci_rdma_port_get_transport) (
 	IN			const	ib_ca_handle_t		h_ca,
@@ -2795,6 +3071,31 @@ typedef enum rdma_transport_type
 ******
 */
 
+typedef uint8_t
+(*ci_get_sl_for_ip_port) (
+	IN			const	ib_ca_handle_t		h_ca,
+	IN			const 	uint8_t				adapter_port_num,
+	IN			const 	uint16_t			ip_port_num );
+
+/*
+* DESCRIPTION
+*	This routine retrieves the service level for a given IP port
+*
+* PARAMETERS
+*	h_ca
+*		[in] Handle to HCA
+*	adapter_port_num
+*		[in] Adapter port number
+*	ip_port_num
+*		[in] IP port number
+*	
+* RETURN VALUE
+*		Service level to use for the given IP port
+* SEE ALSO
+*
+******
+*/
+
 
SH: This isn't a verbs function.  IP addressing resides above the HCA driver.  And port numbers are even above IP.  There's something wrong in the design of the stack to have this pushed so far down.

 #define	MAX_LIB_NAME		32
 
@@ -2887,6 +3188,7 @@ typedef struct _ci_interface
 	 */
 	ci_create_cq		create_cq;
 	ci_resize_cq		resize_cq;
+	ci_modify_cq		modify_cq;
 	ci_query_cq			query_cq;
 	ci_destroy_cq		destroy_cq;
 
@@ -2946,9 +3248,22 @@ typedef struct _ci_interface
 	 */
 	ci_local_mad		local_mad;
 
+	/* 2.1 verbs */
+	ci_poll_cq_array			poll_cq_array;
+
+	/* fast memory registration support */
+	ci_alloc_fast_reg_mr		alloc_fast_reg_mr;
+	ci_alloc_fast_reg_page_list	alloc_fast_reg_page_list;
+	ci_free_fast_reg_page_list	free_fast_reg_page_list;
+
 	/* 2.2 verbs */
 	ci_rdma_port_get_transport rdma_port_get_transport;
 
+	/* 2.3 verbs */
+	ci_get_sl_for_ip_port 	   get_sl_for_ip_port;	
+
+	/* 2.4 verbs */
+	ci_create_cq_ex     	   create_cq_ex;
 } ci_interface_t;
 /********/
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/iba/ib_types.h branches\mlx4/inc/iba/ib_types.h
--- trunk/inc/iba/ib_types.h	2011-10-10 16:57:01.228903600 -0700
+++ branches\mlx4/inc/iba/ib_types.h	2011-10-10 16:59:19.559735300 -0700
@@ -9996,6 +9996,8 @@ typedef struct _ib_port_attr {
 	uint8_t					subnet_timeout;
 	uint8_t					active_speed;
 	uint8_t					phys_state;
+	uint8_t					ext_active_speed;
+	uint8_t					link_encoding;
 
 	ib_port_cap_t			cap;
 	uint16_t				pkey_ctr;
@@ -10074,6 +10076,9 @@ typedef struct _ib_ca_attr {
 	uint32_t				max_srq;
 	uint32_t				max_srq_wrs;
 	uint32_t				max_srq_sges;
+	uint32_t 				bf_reg_size;
+	uint32_t 				bf_regs_per_page;
+	uint32_t				max_sq_desc_sz;
 
 	/*
 	 * local_ack_delay:
@@ -10623,6 +10628,29 @@ typedef struct _ib_qp_create
 	boolean_t				sq_signaled;
 
 }	ib_qp_create_t;
+
+/****s* Access Layer/ib_qp_create_ex_t
+* NAME
+*	ib_qp_create_ex_t
+*
+* DESCRIPTION
+*	Extended attributes used to initialize a queue pair at creation time.
+*
+* SYNOPSIS
+*/
+	
+typedef enum _ib_qp_create_flags_t
+{
+	IB_QP_CREATE_FLAG_SQ_ACCESS_CLIENT_SYNC = 1, // access to SQ is synchronous, no need for lock at lower levels 
+	IB_QP_CREATE_FLAG_RQ_ACCESS_CLIENT_SYNC = 1 << 1, // access to RQ is synchronous, no need for lock at lower levels 	
+} ib_qp_create_flags_t;
+
+typedef struct _ib_qp_create_ex
+{
+	ib_qp_create_t qp_create;
+	int create_flags;
+}	ib_qp_create_ex_t;
+
 /*
 * FIELDS
 *	type
@@ -10875,6 +10903,8 @@ typedef enum _ib_wr_type_t
 	WR_FETCH_ADD,
 	WR_LSO,
 	WR_NOP,
+	WR_LOCAL_INV,
+	WR_FAST_REG_MR,
 	WR_UNKNOWN
 
 }	ib_wr_type_t;
@@ -10994,6 +11024,38 @@ typedef uint32_t ib_send_opt_t;
 
 /****s* Access Layer/ib_send_wr_t
 * NAME
+*	ib_fast_reg_page_list_t
+*
+* DESCRIPTION
+*	Information used to submit a work request for fast registration
+*
+*
+* SYNOPSIS
+*/
+typedef struct _ib_fast_reg_page_list 
+{
+	void		       	*device;
+	uint64_t		    *page_list;
+	unsigned int		max_page_list_len;
+} ib_fast_reg_page_list_t;
+/*
+* FIELDS
+*	device
+*		This field is reserved for low-level driver and should not be changed by caller
+*
+*	page_list
+*		Pointer to the array of physical page addresses
+*
+*	max_page_list_len
+*		Number of elements in the array
+*
+* SEE ALSO
+*	ib_send_wr_t
+*****/
+
+
+/****s* Access Layer/ib_send_wr_t
+* NAME
 *	ib_send_wr_t
 *
 * DESCRIPTION
@@ -11010,7 +11072,10 @@ typedef struct _ib_send_wr
 	uint32_t					num_ds;
 	ib_wr_type_t				wr_type;
 	ib_send_opt_t				send_opt;
+	union {
 	ib_net32_t					immediate_data;
+		net32_t					invalidate_rkey;
+	};
 
 	union
 	{
@@ -11065,6 +11130,19 @@ typedef struct _ib_send_wr
 			ib_net64_t			atomic2;
 
 		}	remote_ops;
+
+		struct _fast_reg
+		{
+			uint64_t 			iova_start;
+			ib_fast_reg_page_list_t   *page_list;
+			unsigned int		page_shift;
+			unsigned int		page_list_len;
+			uint32_t 			length;
+			int 				access_flags;
+			net32_t 			rkey;
+			uint32_t			fbo;
+			
+		} 	fast_reg;
 	};
 }	ib_send_wr_t;
 /*
@@ -12833,7 +12911,8 @@ typedef struct _ib_time_stamp {
 *	ib_cc_mad_t
 *********/
 
-#define IB_REQ_CM_RDMA_SID_PREFIX			0x0000000001000000
+#define IB_REQ_CM_RDMA_SID_PREFIX			CL_NTOH64( 0x0000000001000000I64 )
+#define IB_REQ_CM_RDMA_SID_PREFIX_MASK		CL_NTOH64( 0xFFFFFFFFFF000000I64 )

SH: These should be defined as HTON, not NTOH.

 #define IB_REQ_CM_RDMA_PDATA_SIZE			56
 #define IB_REQ_CM_RDMA_MAJOR_VERSION		0
 #define IB_REQ_CM_RDMA_MINOR_VERSION		0
@@ -12880,6 +12959,28 @@ typedef struct _ib_cm_rdma_req
 *
 *****/
 
+AL_INLINE net64_t AL_API
+ib_cm_rdma_sid(
+	IN				uint8_t						protocol,
+	IN				net16_t						port )
+{
+	return IB_REQ_CM_RDMA_SID_PREFIX | ((UINT64)protocol) << 40 | ((UINT64)port) << 48;
+}
+
+AL_INLINE net16_t AL_API
+ib_cm_rdma_sid_port(
+	IN				net64_t						sid )
+{
+	return (net16_t)(sid >> 48);
+}
+
+AL_INLINE uint8_t AL_API
+ib_cm_rdma_sid_protocol(
+	IN				net64_t						sid )
+{
+	return (uint8_t)(sid >> 40);
+}

SH: Use ntoh and hton on these routines to clarify the byte ordering, rather than assuming the endianess of the system.

+
 /****f* IBA Base: Types/ib_port_info_get_sm_sl
 * NAME
 *	ib_port_info_get_sm_sl
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/kernel/iba/ib_cm_ifc.h branches\mlx4/inc/kernel/iba/ib_cm_ifc.h
--- trunk/inc/kernel/iba/ib_cm_ifc.h	2011-09-13 09:15:45.849874000 -0700
+++ branches\mlx4/inc/kernel/iba/ib_cm_ifc.h	2011-10-10 16:59:15.449324300 -0700
@@ -309,6 +309,7 @@ typedef struct _INFINIBAND_INTERFACE_CM
 
 }	INFINIBAND_INTERFACE_CM;
 
+
 #endif // _ib_cm_ifc_h_
 
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/kernel/ip_packet.h branches\mlx4/inc/kernel/ip_packet.h
--- trunk/inc/kernel/ip_packet.h	2011-10-10 16:57:01.335914300 -0700
+++ branches\mlx4/inc/kernel/ip_packet.h	2011-10-10 16:59:15.642343600 -0700
@@ -547,7 +547,7 @@ static const uint8_t coIPoIB_CID_TotalLe
 static const uint8_t coIBDefaultDHCPPrefix[] = {
 	coIPoIB_HwTypeIB, 0x0, 0x0, 0x0,
 	0x0, 0x0, 0x2, 0x0,
-	0x0, 0x2, 0xc, 0x0
+	0x0, 0x2, 0xc9, 0x0
 };
 
 /* The CID will contain of: 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/kernel/rdma/verbs.h branches\mlx4/inc/kernel/rdma/verbs.h
--- trunk/inc/kernel/rdma/verbs.h	2011-09-13 09:15:44.045693600 -0700
+++ branches\mlx4/inc/kernel/rdma/verbs.h	2011-10-10 16:59:13.735152900 -0700
@@ -26,10 +26,12 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
+#pragma once
 
 #ifndef _VERBS_H_
 #define _VERBS_H_
 
+#include <initguid.h>
 #include <iba/ib_ci.h>
 
 static inline USHORT VerbsVersion(UINT8 Major, UINT8 Minor)
@@ -47,6 +49,9 @@ static inline UINT8 VerbsVersionMinor(US
 	return (UINT8) Version;
 }
 
+DEFINE_GUID(GUID_RDMA_INTERFACE_VERBS, 0xf0ebae86, 0xedb5, 0x4b40,
+			0xa1, 0xa, 0x44, 0xd5, 0xdb, 0x3b, 0x96, 0x4e);
+
 typedef struct _RDMA_INTERFACE_VERBS
 {
 	INTERFACE		InterfaceHeader;
@@ -54,9 +59,46 @@ typedef struct _RDMA_INTERFACE_VERBS
 
 }	RDMA_INTERFACE_VERBS;
 
+typedef NTSTATUS (*AddRemovalRelations)(
+	IN	PDEVICE_OBJECT	p_fdo,		// mlx4_hca FDO
+	IN	PDEVICE_OBJECT	p_pdo		// mlx4_hca client PDO
+);
+
+typedef void (*RemoveRemovalRelations)(
+	IN	PDEVICE_OBJECT	p_fdo,		// mlx4_hca FDO
+	IN	PDEVICE_OBJECT	p_pdo		// mlx4_hca client PDO
+);

SH: Remove the comments.  The calls and their usage should not be tied to mlx4.

+
+// {D6C1C27E-765C-4c6b-92C2-B4066FEA1992}
+DEFINE_GUID(GUID_RDMA_INTERFACE_VERBS_EX, 0xd6c1c27e, 0x765c, 0x4c6b,
+			0x92, 0xc2, 0xb4, 0x6, 0x6f, 0xea, 0x19, 0x92);
+
+typedef struct _RDMA_INTERFACE_VERBS_EX
+{
+	INTERFACE				InterfaceHeader;
+	PDEVICE_OBJECT			p_fdo;			// mlx4_hca FDO
+	AddRemovalRelations		add;			// add PDO to removal relations
+	RemoveRemovalRelations	rmv;			// remove PDO from removal relations
+	
+}	RDMA_INTERFACE_VERBS_EX;

SH: Unlike user space, I don't think there's a compelling reason why the kernel drivers need to support older interfaces.  If we need to extend the RDMA_INTERFACE_VERBS structure, just add to it and update the GUID.

+
+
+//
+// Interface, intended for notifications
+//
+
+// {A027188D-564D-4d4e-825A-6AEC19774BAB}
+DEFINE_GUID(MLX4_BUS_NOTIFY_GUID, 
+0xa027188d, 0x564d, 0x4d4e, 0x82, 0x5a, 0x6a, 0xec, 0x19, 0x77, 0x4b, 0xab);
+
+
+typedef VOID (*MLX4_NOTIFY) (PVOID ifc_ctx, ULONG type, PVOID p_data, PCHAR str);
+
+typedef struct _MLX4_BUS_NOTIFY_INTERFACE{
+	INTERFACE i;
+	MLX4_NOTIFY				notify;
+	
+} MLX4_BUS_NOTIFY_INTERFACE, *PMLX4_BUS_NOTIFY_INTERFACE;
+
 #endif // _VERBS_H_
 
-#ifdef DEFINE_GUID
-DEFINE_GUID(GUID_RDMA_INTERFACE_VERBS, 0xf0ebae86, 0xedb5, 0x4b40,
-			0xa1, 0xa, 0x44, 0xd5, 0xdb, 0x3b, 0x96, 0x4e);
-#endif // DEFINE_GUID
Only in branches\mlx4/inc/user/iba: ibat_ex.h
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/inc/user/rdma/winverbs.h branches\mlx4/inc/user/rdma/winverbs.h
--- trunk/inc/user/rdma/winverbs.h	2011-09-13 09:15:50.674356400 -0700
+++ branches\mlx4/inc/user/rdma/winverbs.h	2011-10-10 16:59:20.728852200 -0700
@@ -134,6 +134,14 @@ typedef enum _WV_PORT_STATE
 
 }	WV_PORT_STATE;
 
+typedef enum _WV_PORT_TRANSPORT
+{
+	WvPortTransportIb,
+	WvPortTransportIwarp,
+	WvPortTransportRdmaoe
+
+}	WV_PORT_TRANSPORT;
+

SH: Okay - this should handle older kernels which set the field to 0.  I vote to remove the use of 'port' from the name of the enum and the values.

 typedef struct _WV_PORT_ATTRIBUTES
 {
 	DWORD			PortCabilityFlags;
@@ -155,6 +163,8 @@ typedef struct _WV_PORT_ATTRIBUTES
 	UINT8			ActiveWidth;
 	UINT8			ActiveSpeed;
 	UINT8			PhysicalState;
+	UINT8			Transport;
+	UINT8			Reserved[1];
 
 }	WV_PORT_ATTRIBUTES;
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/rdma_bw/rdma_bw.c branches\mlx4/tests/perftest/rdma_bw/rdma_bw.c
--- trunk/tests/perftest/rdma_bw/rdma_bw.c	2011-09-13 09:15:21.537443000 -0700
+++ branches\mlx4/tests/perftest/rdma_bw/rdma_bw.c	2011-10-10 16:58:49.607740400 -0700
@@ -56,6 +56,7 @@ struct pingpong_context {
 	int                 tx_depth;
 	struct ibv_sge      list;
 	struct ibv_send_wr  wr;
+	uint8_t				transport;
 };
 
 struct pingpong_dest {
@@ -64,6 +65,7 @@ struct pingpong_dest {
 	int psn;
 	unsigned rkey;
 	unsigned long long vaddr;
+	union ibv_gid gid;
 };
 
 struct pp_data {
@@ -79,6 +81,7 @@ struct pp_data {
 	struct ibv_device			*ib_dev;
 	struct rdma_event_channel 	*cm_channel;
 	struct rdma_cm_id 			*cm_id;
+	int							gid_index;
 };
 
 static void pp_post_recv(struct pingpong_context *);
@@ -434,7 +437,7 @@ static struct pingpong_context *pp_init_
 {
 	struct pingpong_context *ctx;
 	struct ibv_device *ib_dev;
-	struct rdma_cm_id *cm_id;
+	struct rdma_cm_id *cm_id = NULL;
 	struct ibv_qp_init_attr attr;
 
 	ctx = malloc(sizeof *ctx);
@@ -565,6 +568,14 @@ static int pp_connect_ctx(struct pingpon
 	attr.ah_attr.sl         = 0;
 	attr.ah_attr.src_path_bits = 0;
 	attr.ah_attr.port_num   = (uint8_t) data.ib_port;
+	if(ctx->transport == WvPortTransportRdmaoe) 
+		printf(" Using grh is forced due to the use of RoCE\n");
+	if (data.gid_index>=0) {
+		attr.ah_attr.is_global  = 1;
+		attr.ah_attr.grh.dgid   = data.rem_dest->gid;
+		attr.ah_attr.grh.sgid_index = (uint8_t)data.gid_index;
+		attr.ah_attr.grh.hop_limit = 1;
+	}
 	if (ibv_modify_qp(ctx->qp, &attr,
 			  IBV_QP_STATE              |
 			  IBV_QP_AV                 |
@@ -751,6 +762,7 @@ static void usage(const char *argv0)
 	printf("  -t <dep>      size of tx queue (default 100)\n");
 	printf("  -n <iters>    number of exchanges (at least 2, default 1000)\n");
 	printf("  -b            measure bidirectional bandwidth (default unidirectional)\n");
+	printf("  -x <index>    test uses GID with GID index taken from command line (for RDMAoE index should be 0)\n");
 	printf("  -c 		    use RDMA CM\n");
 }
 
@@ -816,6 +828,7 @@ int __cdecl main(int argc, char *argv[])
 	WORD					 version;
 	WSADATA					 wsdata;
 	int						 err;
+	struct ibv_port_attr 	port_attr;
 
 	srand((unsigned int) time(NULL));
 	version = MAKEWORD(2, 2);
@@ -834,12 +847,13 @@ int __cdecl main(int argc, char *argv[])
 	data.ib_dev     = NULL;
 	data.cm_channel = NULL;
 	data.cm_id      = NULL;
+	data.gid_index	= -1;
 
 	/* Parameter parsing. */
 	while (1) {
 		int c;
 
-		c = getopt(argc, argv, "h:p:d:i:s:n:t:bc");
+		c = getopt(argc, argv, "h:p:d:i:s:n:t:x:bc");
 		if (c == -1)
 			break;
 
@@ -868,6 +882,10 @@ int __cdecl main(int argc, char *argv[])
 			data.size = strtol(optarg, NULL, 0);
 			break;
 
+		case 'x':
+			data.gid_index = strtol(optarg, NULL, 0);
+			break;
+
 		case 't':
 			data.tx_depth = strtol(optarg, NULL, 0);
 			if (data.tx_depth < 1) {
@@ -958,8 +976,18 @@ int __cdecl main(int argc, char *argv[])
 			if (!ctx) 
 				return 1;
 		}
+
+		if (ibv_query_port(ctx->context,(uint8_t)data.ib_port,&port_attr)) {
+			fprintf(stderr, "Failed to query port props");
+			return 1;
+		}
+		ctx->transport = port_attr.transport;
+		
+		if (data.gid_index < 0 && ctx->transport == WvPortTransportRdmaoe) 
+				data.gid_index = 0;
+
 		data.my_dest.lid = pp_get_local_lid(ctx, data.ib_port);
-		if (!data.my_dest.lid) {
+		if (ctx->transport == WvPortTransportIb && !data.my_dest.lid) {
 			fprintf(stderr, "Local lid 0x0 detected. Is an SM running?\n");
 			return 1;
 		}
@@ -967,6 +995,12 @@ int __cdecl main(int argc, char *argv[])
 		data.my_dest.psn = rand() & 0xffffff;
 		data.my_dest.rkey = ctx->mr->rkey;
 		data.my_dest.vaddr = (uintptr_t) ctx->buf + ctx->size;
+
+		if (data.gid_index != -1) {
+			if (ibv_query_gid(ctx->context,(uint8_t)data.ib_port,data.gid_index,&data.my_dest.gid)) {
+				return -1;
+			}
+		}
 	
 		/* Create connection between client and server.
 		* We do it by exchanging data over a TCP socket connection. */
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/rdma_lat/rdma_lat.c branches\mlx4/tests/perftest/rdma_lat/rdma_lat.c
--- trunk/tests/perftest/rdma_lat/rdma_lat.c	2011-09-13 09:15:19.914280700 -0700
+++ branches\mlx4/tests/perftest/rdma_lat/rdma_lat.c	2011-10-10 16:58:48.130592700 -0700
@@ -68,6 +68,7 @@ struct pingpong_context {
 	int                 tx_depth;
 	struct 				ibv_sge list;
 	struct 				ibv_send_wr wr;
+	uint8_t				transport;
 };
 
 struct pingpong_dest {
@@ -76,6 +77,7 @@ struct pingpong_dest {
 	int psn;
 	unsigned rkey;
 	unsigned long long vaddr;
+	union ibv_gid gid;
 };
 
 struct pp_data {
@@ -454,7 +456,7 @@ static struct pingpong_context *pp_init_
 {
 	struct pingpong_context *ctx;
 	struct ibv_device *ib_dev;
-	struct rdma_cm_id *cm_id;
+	struct rdma_cm_id *cm_id = NULL;
 	struct ibv_qp_init_attr attr;
 
 	ctx = malloc(sizeof *ctx);
@@ -626,7 +628,7 @@ static int pp_open_port(struct pingpong_
 	data->my_dest.lid = pp_get_local_lid(ctx, data->ib_port);
 	data->my_dest.qpn = ctx->qp->qp_num;
 	data->my_dest.psn = rand() & 0xffffff;
-	if (!data->my_dest.lid) {
+	if (ctx->transport == WvPortTransportIb && !data->my_dest.lid) {
 		fprintf(stderr, "Local lid 0x0 detected. Is an SM running?\n");
 		return -1;
 	}
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/read_bw/read_bw.c branches\mlx4/tests/perftest/read_bw/read_bw.c
--- trunk/tests/perftest/read_bw/read_bw.c	2011-09-13 09:15:21.198409100 -0700
+++ branches\mlx4/tests/perftest/read_bw/read_bw.c	2011-10-10 16:58:49.352714900 -0700
@@ -68,6 +68,7 @@ struct pingpong_context {
 	int                 tx_depth;
 	struct ibv_sge      list;
 	struct ibv_send_wr  wr;
+	uint8_t				transport;
 };
 
 /****************************************************************************** 
@@ -92,7 +93,7 @@ static int set_up_connection(struct ping
 
 	// We do not fail test upon lid above RoCE.
 	if (user_parm->gid_index == -1) {
-		if (!my_dest->lid) {
+		if (ctx->transport == WvPortTransportIb && !my_dest->lid) {
 			fprintf(stderr,"Local lid 0x0 detected,without any use of gid. Is SM running?\n");
 			return -1;
 		}
@@ -288,6 +289,8 @@ static int pp_connect_ctx(struct pingpon
 		attr.path_mtu               = IBV_MTU_4096;
 		break;
 	}
+	if(ctx->transport == WvPortTransportRdmaoe) 
+		printf(" Using grh is forced due to the use of RoCE\n");
 	printf(" Mtu : %d\n", user_parm->mtu);
 	attr.dest_qp_num 	= dest->qpn;
 	attr.rq_psn 		= dest->psn;
@@ -496,6 +499,7 @@ int __cdecl main(int argc, char *argv[])
 	unsigned			       size = 65536;
 	int                        i = 0;
 	int                        no_cpu_freq_fail = 0;
+	struct ibv_port_attr       port_attr;
 
 	int all = 0;
 	const char *servername = NULL;
@@ -683,6 +687,15 @@ int __cdecl main(int argc, char *argv[])
 	if (!ctx)
 		return 1;
 
+	if (ibv_query_port(ctx->context,user_param.ib_port,&port_attr)) {
+		fprintf(stderr, "Failed to query port props");
+		return 1;
+	}
+	ctx->transport = port_attr.transport;
+
+	if (user_param.gid_index < 0 && ctx->transport == WvPortTransportRdmaoe) 
+			user_param.gid_index = 0;
+	
 	// Set up the Connection.
 	if (set_up_connection(ctx,&user_param,&my_dest)) {
 		fprintf(stderr," Unable to set up socket connection\n");
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/read_lat/read_lat.c branches\mlx4/tests/perftest/read_lat/read_lat.c
--- trunk/tests/perftest/read_lat/read_lat.c	2011-09-13 09:15:21.731462400 -0700
+++ branches\mlx4/tests/perftest/read_lat/read_lat.c	2011-10-10 16:58:49.883768000 -0700
@@ -74,6 +74,7 @@ struct pingpong_context {
 	int tx_depth;
 	struct ibv_sge list;
 	struct ibv_send_wr wr;
+	uint8_t transport;
 };
 
 /*
@@ -100,7 +101,7 @@ static int set_up_connection(struct ping
 
 	// We do not fail test upon lid in RDMAoE/Eth conf.
 	if (use_i < 0) {
-		if (!my_dest->lid) {
+		if (ctx->transport == WvPortTransportIb && !my_dest->lid) {
 			fprintf(stderr,"Local lid 0x0 detected. Is an SM running? \n");
 			fprintf(stderr,"If you're running RMDAoE you must use GIDs\n");
 			return -1;
@@ -302,6 +303,8 @@ static int pp_connect_ctx(struct pingpon
 		attr.path_mtu               = IBV_MTU_4096;
 		break;
 	}
+	if(ctx->transport == WvPortTransportRdmaoe) 
+		printf(" Using grh is forced due to the use of RoCE\n");
 	printf("Mtu : %d\n", user_parm->mtu);
 	attr.dest_qp_num            = dest->qpn;
 	attr.rq_psn                 = dest->psn;
@@ -523,6 +526,7 @@ int __cdecl main(int argc, char *argv[])
 	struct perftest_parameters  user_param;
 	int                      no_cpu_freq_fail = 0;
 	struct pingpong_dest	 my_dest,rem_dest;
+	struct ibv_port_attr port_attr;
 
 	int all = 0;
 	const char *servername = NULL;
@@ -713,6 +717,15 @@ int __cdecl main(int argc, char *argv[])
 	if (!ctx)
 		return 8;
 
+	if (ibv_query_port(ctx->context,user_param.ib_port,&port_attr)) {
+		fprintf(stderr, "Failed to query port props");
+		return 1;
+	}
+	ctx->transport = port_attr.transport;
+
+	if (user_param.gid_index < 0 && ctx->transport == WvPortTransportRdmaoe) 
+			user_param.gid_index = 0;
+	
 	// Set up the Connection.
 	if (set_up_connection(ctx,&user_param,&my_dest)) {
 		fprintf(stderr," Unable to set up socket connection\n");
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/send_bw/send_bw.c branches\mlx4/tests/perftest/send_bw/send_bw.c
--- trunk/tests/perftest/send_bw/send_bw.c	2011-09-13 09:15:19.403229600 -0700
+++ branches\mlx4/tests/perftest/send_bw/send_bw.c	2011-10-10 16:58:47.412520900 -0700
@@ -74,6 +74,7 @@ struct pingpong_context {
 	void               		*buf;
 	unsigned            	size;
 	uint64_t				*my_addr;
+	uint8_t					transport;
 };
 
 /****************************************************************************** 
@@ -184,7 +185,7 @@ static int set_up_connection(struct ping
 	// We do not fail test upon lid above RoCE.
 
 	if (user_parm->gid_index < 0) {
-		if (!my_dest->lid) {
+		if (ctx->transport == WvPortTransportIb && !my_dest->lid) {
 			fprintf(stderr,"Local lid 0x0 detected,without any use of gid. Is SM running?\n");
 			return -1;
 		}
@@ -414,6 +415,9 @@ static int pp_connect_ctx(struct pingpon
 		attr.path_mtu               = IBV_MTU_4096;
 		break;
 	}
+
+	if(ctx->transport == WvPortTransportRdmaoe) 
+		printf(" Using grh is forced due to the use of RoCE\n");
 	printf(" Mtu : %d\n", user_parm->mtu);
     attr.dest_qp_num   = dest->qpn;
 	attr.rq_psn        = dest->psn;
@@ -919,6 +923,7 @@ int __cdecl main(int argc, char *argv[])
 	int 						all = 0;
 	int							size_of_arr;
 	const char 					*servername = NULL;
+	struct ibv_port_attr 		port_attr;
 
 	// Pointer to The relevent function of run_iter according to machine type.
 	int (*ptr_to_run_iter_uni)(struct pingpong_context*,struct perftest_parameters*,int);
@@ -1194,6 +1199,15 @@ int __cdecl main(int argc, char *argv[])
 	if (!ctx)
 		return 1;
 
+	if (ibv_query_port(context,user_param.ib_port,&port_attr)) {
+		fprintf(stderr, "Failed to query port props");
+		return 1;
+	}
+	ctx->transport = port_attr.transport;
+
+	if (user_param.gid_index < 0 && ctx->transport == WvPortTransportRdmaoe) 
+			user_param.gid_index = 0;
+	
 	// Set up the Connection.
 	if (set_up_connection(ctx,&user_param,&my_dest,&mcg_params)) {
 		fprintf(stderr," Unable to set up socket connection\n");
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/send_lat/send_lat.c branches\mlx4/tests/perftest/send_lat/send_lat.c
--- trunk/tests/perftest/send_lat/send_lat.c	2011-09-13 09:15:20.620351300 -0700
+++ branches\mlx4/tests/perftest/send_lat/send_lat.c	2011-10-10 16:58:48.774657100 -0700
@@ -77,6 +77,7 @@ struct pingpong_context {
 	struct ibv_ah	    *ah;
 	void                *buf;
 	int                 size;	
+	uint8_t				transport;
 };
 
 /****************************************************************************** 
@@ -159,7 +160,7 @@ static int set_up_connection(struct ping
 
 	// We do not fail test upon lid above RoCE.
 	if (user_parm->gid_index < 0) {
-		if (!my_dest->lid) {
+		if (ctx->transport == WvPortTransportIb && !my_dest->lid) {
 			fprintf(stderr,"Local lid 0x0 detected,without any use of gid. Is SM running?\n");
 			return -1;
 		}
@@ -344,6 +345,8 @@ static int pp_connect_ctx(struct pingpon
 	attr.dest_qp_num      = dest->qpn;
 	attr.rq_psn           = dest->psn;
 	attr.ah_attr.dlid     = dest->lid;
+	if(ctx->transport == WvPortTransportRdmaoe) 
+		printf(" Using grh is forced due to the use of RoCE\n");
 	printf("Mtu : %d\n", user_parm->mtu);
 	if (user_parm->connection_type==RC) {
 		attr.max_dest_rd_atomic     = 1;
@@ -692,6 +695,7 @@ int __cdecl main(int argc, char *argv[])
 	struct ibv_device          *ib_dev = NULL;
 	struct perftest_parameters user_param;
 	int                        no_cpu_freq_fail = 0;
+	struct ibv_port_attr       port_attr;
 
 	int all = 0;
 	const char *servername = NULL;
@@ -935,6 +939,15 @@ int __cdecl main(int argc, char *argv[])
 	if (!ctx)
 		return 8;
 
+	if (ibv_query_port(ctx->context,user_param.ib_port,&port_attr)) {
+		fprintf(stderr, "Failed to query port props");
+		return 1;
+	}
+	ctx->transport = port_attr.transport;
+
+	if (user_param.gid_index < 0 && ctx->transport == WvPortTransportRdmaoe) 
+			user_param.gid_index = 0;
+	
 	// Set up the Connection.
 	if (set_up_connection(ctx,&user_param,&my_dest,&mcg_params)) {
 		fprintf(stderr," Unable to set up socket connection\n");
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/tests/perftest/send_lat/SOURCES branches\mlx4/tests/perftest/send_lat/SOURCES
--- trunk/tests/perftest/send_lat/SOURCES	2011-09-13 09:15:20.612350500 -0700
+++ branches\mlx4/tests/perftest/send_lat/SOURCES	2011-10-10 16:58:48.765656200 -0700
@@ -24,4 +24,4 @@ TARGETLIBS =						\
 	$(SDK_LIB_PATH)\uuid.lib		\
 	$(TARGETPATH)\*\libibverbs.lib		\
 	$(TARGETPATH)\*\libibumad.lib		\
-	$(TARGETPATH)\*\complib.lib		\
+	$(TARGETPATH)\*\complib.lib
Only in trunk/tools/infiniband-diags/include: ibdiag_version.h.in
Only in trunk/ulp/libibmad: .cproject
Only in trunk/ulp/libibmad: .project
Only in trunk/ulp/libibmad: AUTHORS
Only in trunk/ulp/libibmad: autogen.sh
Only in trunk/ulp/libibmad: ChangeLog
Only in trunk/ulp/libibmad: configure.in
Only in trunk/ulp/libibmad: COPYING
Only in trunk/ulp/libibmad: gen_chlog.sh
Only in trunk/ulp/libibmad: libibmad.spec.in
Only in trunk/ulp/libibmad: libibmad.ver
Only in trunk/ulp/libibmad: Makefile.am
Only in trunk/ulp/libibmad: README
Only in trunk/ulp/libibnetdisc: libibnetdisc.ver
Only in trunk/ulp/libibnetdisc: Makefile.am
Only in trunk/ulp/libibnetdisc: man
Only in trunk/ulp/libibnetdisc/src: libibnetdisc.map
Only in trunk/ulp/libibnetdisc: test
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/libibverbs/examples/devinfo/devinfo.c branches\mlx4/ulp/libibverbs/examples/devinfo/devinfo.c
--- trunk/ulp/libibverbs/examples/devinfo/devinfo.c	2011-09-13 09:12:05.600851300 -0700
+++ branches\mlx4/ulp/libibverbs/examples/devinfo/devinfo.c	2011-10-10 16:57:16.905471100 -0700
@@ -252,6 +252,8 @@ static int print_hca_cap(struct ibv_devi
 		printf("\t\t\tsm_lid:\t\t\t%d\n", port_attr.sm_lid);
 		printf("\t\t\tport_lid:\t\t%d\n", port_attr.lid);
 		printf("\t\t\tport_lmc:\t\t0x%02x\n", port_attr.lmc);
+		printf("\t\t\ttransport:\t\t%s\n", port_attr.transport == WvPortTransportRdmaoe ? "RoCE" : 
+			port_attr.transport == WvPortTransportIb ? "IB" : "IWarp");
 
 		if (verbose) {
 			printf("\t\t\tmax_msg_sz:\t\t0x%x\n", port_attr.max_msg_sz);
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/libibverbs/include/infiniband/verbs.h branches\mlx4/ulp/libibverbs/include/infiniband/verbs.h
--- trunk/ulp/libibverbs/include/infiniband/verbs.h	2011-09-13 09:12:00.614352700 -0700
+++ branches\mlx4/ulp/libibverbs/include/infiniband/verbs.h	2011-10-10 16:57:14.417222300 -0700
@@ -189,6 +189,7 @@ struct ibv_port_attr
 	uint8_t				active_width;
 	uint8_t				active_speed;
 	uint8_t				phys_state;
+	uint8_t				transport;
 };
 

SH: verbs.h must be kept in sync with Roland's upstream verbs.h file.

 // Only device/port level events are currently supported.
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/libibverbs/src/device.cpp branches\mlx4/ulp/libibverbs/src/device.cpp
--- trunk/ulp/libibverbs/src/device.cpp	2011-09-13 09:12:01.250416300 -0700
+++ branches\mlx4/ulp/libibverbs/src/device.cpp	2011-10-10 16:57:14.864267000 -0700
@@ -131,7 +131,7 @@ struct ibv_device **ibv_get_device_list(
 	WV_DEVICE_ATTRIBUTES attr;
 	struct verbs_device *dev_array;
 	struct ibv_device **pdev_array;
-	NET64 *guid;
+	NET64 *guid = NULL;
 	SIZE_T size, cnt;
 	HRESULT hr;
 
SH: Is this needed?

diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/libibverbs/src/verbs.cpp branches\mlx4/ulp/libibverbs/src/verbs.cpp
--- trunk/ulp/libibverbs/src/verbs.cpp	2011-09-13 09:12:01.312422500 -0700
+++ branches\mlx4/ulp/libibverbs/src/verbs.cpp	2011-10-10 16:57:14.907271300 -0700
@@ -230,6 +230,7 @@ int ibv_query_port(struct ibv_context *c
 	port_attr->active_width = attr.ActiveWidth;
 	port_attr->active_speed = attr.ActiveSpeed;
 	port_attr->phys_state = attr.PhysicalState;
+	port_attr->transport = attr.Transport;
 
 	return 0;
 }
@@ -734,7 +735,7 @@ int ibv_post_send(struct ibv_qp *qp, str
 {
 	struct ibv_send_wr *cur_wr;
 	HRESULT hr = 0;
-	struct ibv_ah *ah;
+	struct ibv_ah *ah = NULL;
 
 	if ((qp->qp_type == IBV_QPT_UD) && (wr->next != NULL))
 		return ibvw_wv_errno(WV_NOT_SUPPORTED);

SH: Is this needed?

diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/nd/user/NdListen.cpp branches\mlx4/ulp/nd/user/NdListen.cpp
--- trunk/ulp/nd/user/NdListen.cpp	2011-09-13 09:14:19.866276500 -0700
+++ branches\mlx4/ulp/nd/user/NdListen.cpp	2011-10-10 16:57:26.106391100 -0700
@@ -187,8 +187,7 @@ HRESULT GetPdataForActive(
         ual_cep_listen_ioctl_t listen;
         listen.cid = 0;
 
-        listen.cep_listen.svc_id = 
-            0x0000000001000000 | Protocol << 16 | Port;
+        listen.cep_listen.svc_id = ib_cm_rdma_sid( (uint8_t) Protocol, Port );
 
         listen.cep_listen.port_guid = m_pParent->m_PortGuid;
 
@@ -244,7 +243,9 @@ HRESULT GetPdataForActive(
         ND_PRINT( TRACE_LEVEL_INFORMATION, ND_DBG_NDI,
             ("Created listen CEP with cid %d \n", m_cid ) );
 
-        // TODO: Come up with something better for port number.
+        //
+        // The following port calculation must match what is done in the kernel.
+        //
         if( Port == 0 )
             Port = (USHORT)m_cid | (USHORT)(m_cid >> 16);
 
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/netdirect/user/nd_provider.cpp branches\mlx4/ulp/netdirect/user/nd_provider.cpp
--- trunk/ulp/netdirect/user/nd_provider.cpp	2011-10-11 08:08:00.958344500 -0700
+++ branches\mlx4/ulp/netdirect/user/nd_provider.cpp	2011-10-10 16:57:13.954176000 -0700
@@ -102,7 +102,7 @@ QueryAddressList(SOCKET_ADDRESS_LIST* pA
 	for (cnt = 0, ai = res; ai; ai = ai->ai_next) {
 		if (SUCCEEDED(ai->ai_flags)) {
 			pAddressList->Address[cnt].iSockaddrLength = ai->ai_addrlen;
-			pAddressList->Address[cnt++].lpSockaddr = (LPSOCKADDR) offset;
+			pAddressList->Address[cnt].lpSockaddr = (LPSOCKADDR) offset;
 			RtlCopyMemory(offset, ai->ai_addr, ai->ai_addrlen);
 			offset += ai->ai_addrlen;
 		}
diff -up -r -X \users\mshefty\scm\ofw\gen1\trunk\docs\dontdiff.txt -I '\$Id' trunk/ulp/netdirect2/user/nd_provider.cpp branches\mlx4/ulp/netdirect2/user/nd_provider.cpp
--- trunk/ulp/netdirect2/user/nd_provider.cpp	2011-10-11 08:08:10.597308300 -0700
+++ branches\mlx4/ulp/netdirect2/user/nd_provider.cpp	2011-10-10 16:58:33.574137200 -0700
@@ -135,7 +135,7 @@ QueryAdapterAddressList(SOCKET_ADDRESS_L
 	for (cnt = 0, ai = res; ai; ai = ai->ai_next) {
 		if (SUCCEEDED(ai->ai_flags)) {
 			pAddressList->Address[cnt].iSockaddrLength = ai->ai_addrlen;
-			pAddressList->Address[cnt++].lpSockaddr = (LPSOCKADDR) offset;
+			pAddressList->Address[cnt].lpSockaddr = (LPSOCKADDR) offset;
 			RtlCopyMemory(offset, ai->ai_addr, ai->ai_addrlen);
 			offset += ai->ai_addrlen;
 		}

SH: I committed these last two changes separately.
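For reference, the two hunks above drop the post-increment from the `lpSockaddr` assignment; presumably the increment was moved to after the copy (the surrounding context is not shown in the hunk), since both fields of each entry must be written at the same index and `cnt` must still advance once per accepted address. A minimal model of that packing pattern, with illustrative types rather than the Windows `SOCKET_ADDRESS_LIST` definitions:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for one SOCKET_ADDRESS_LIST entry: a length
 * plus a pointer into a flat byte buffer.  Illustrative only. */
struct addr_entry {
	int   len;
	char *ptr;
};

/* Pack n addresses into list[]/buf, returning the entry count.
 * Both fields of an entry use the same index; the index advances
 * once per entry, after the address bytes are copied. */
static int pack(struct addr_entry *list, char *buf,
                const char **addrs, const int *lens, int n)
{
	int cnt = 0;
	char *offset = buf;

	for (int i = 0; i < n; i++) {
		list[cnt].len = lens[i];
		list[cnt].ptr = offset;   /* same index as the line above */
		memcpy(offset, addrs[i], (size_t)lens[i]);
		offset += lens[i];
		cnt++;                    /* advance once per entry */
	}
	return cnt;
}
```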

Only in trunk/ulp/opensm/user/include/iba: ib_types.h_linux_3.3.11
Only in trunk/ulp: opensm-git
Only in trunk/ulp: osm.diff

_______________________________________________
ofw mailing list
ofw at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
