[ofw] crash in mlx4 driver

Leonid Keller leonid at mellanox.co.il
Sun Mar 15 04:50:47 PDT 2009


See inline 

> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org 
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Friday, March 13, 2009 9:47 PM
> To: Hefty, Sean; ofw at lists.openfabrics.org
> Subject: RE: [ofw] crash in mlx4 driver
> 
> >static ib_api_status_t
> >mlnx_um_open(
> >	IN		const	ib_ca_handle_t			h_ca,
> >	IN	OUT			ci_umv_buf_t* const		p_umv_buf,
> >		OUT			ib_ca_handle_t* const	ph_um_ca
> >)
> >{
> >	ib_api_status_t		status;
> >	mlnx_hca_t			*p_hca = (mlnx_hca_t *)h_ca;
> >	PFDO_DEVICE_DATA p_fdo = hca2fdo(p_hca);
> >	struct ib_device *p_ibdev = hca2ibdev(p_hca);
> >	struct ib_ucontext *p_uctx;
> >	struct ibv_get_context_resp *p_uresp;
> >
> >	HCA_ENTER(HCA_DBG_SHIM);
> >
> >	// sanity check
> >	ASSERT( p_umv_buf );
> >	if( !p_umv_buf->command )
> >	{ // no User Verb Provider
> >		p_uctx = cl_zalloc( sizeof(struct ib_ucontext) );
> >		if( !p_uctx )
> >		{
> >			status = IB_INSUFFICIENT_MEMORY;
> >			goto err_alloc_ucontext;
> >		}
> >		/* Copy the dev info. */
> >		p_uctx->device = p_ibdev;
> >		p_umv_buf->output_size = 0;
> >		status = IB_SUCCESS;
> >		goto done;
> >	}
> >
> >	// sanity check
> >	if ( p_umv_buf->output_size < sizeof(struct ibv_get_context_resp) ||
> >		!p_umv_buf->p_inout_buf) {
> >		status = IB_INVALID_PARAMETER;
> >		goto err_inval_params;
> >	}
> >
> >	status = ibv_um_open( p_ibdev, p_umv_buf, &p_uctx );
> >	if (!NT_SUCCESS(status)) {
> 
> This check leads to the crash in the mlx4 driver.  
> ibv_um_open() returns ib_api_status_t.  In this case, 
> ibv_um_open is returning IB_ERROR (0x2b).
> NT_SUCCESS(0x2b) is true, which leads to the code executing 
> beyond the if statement and p_uctx is invalid.

A good catch. Thank you.
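
For anyone following the thread, here is a minimal standalone sketch of why the check passes on an error. It assumes the usual WDK definition of NT_SUCCESS (any non-negative NTSTATUS counts as success) and takes IB_ERROR = 0x2b from the report above; IB_SUCCESS = 0 and the two helper functions are assumptions made for illustration, not code from the driver, and the second helper is only one possible shape of the check, not necessarily the fix that will be committed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch builds outside the WDK. */
typedef int32_t NTSTATUS;
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* IB_ERROR = 0x2b is the value quoted in the report.
 * IB_SUCCESS = 0 is an assumption for this sketch. */
typedef enum {
	IB_SUCCESS = 0,
	IB_ERROR   = 0x2b
} ib_api_status_t;

/* Hypothetical helper: the failure test as written in mlnx_um_open(). */
static int failed_per_nt_success( ib_api_status_t status )
{
	return !NT_SUCCESS( status );
}

/* Hypothetical helper: a failure test that compares against IB_SUCCESS. */
static int failed_per_ib_status( ib_api_status_t status )
{
	return status != IB_SUCCESS;
}

/* Hypothetical self-check showing the difference between the two tests. */
static void demo_status_checks( void )
{
	/* 0x2b is positive, so NT_SUCCESS() treats it as success. */
	assert( failed_per_nt_success( IB_ERROR ) == 0 );
	/* Comparing against IB_SUCCESS catches the error. */
	assert( failed_per_ib_status( IB_ERROR ) == 1 );
	/* Both tests agree on the success path. */
	assert( failed_per_nt_success( IB_SUCCESS ) == 0 );
	assert( failed_per_ib_status( IB_SUCCESS ) == 0 );
}
```

The point is that ib_api_status_t error codes are small positive numbers, while NT_SUCCESS() only reports failure for negative values, so the two status types should not share a success macro.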

> 
> I will add a fix for this.  The problem is now moved back to 
> determining the earlier failure - either in the CQ overflow 
> or ipoib's error handling.
> 
> Note that the test leading to this crash is using sockets in 
> a way that other test applications may not have been.  It 
> uses select() with nonblocking sockets and a larger FD set.  
> I'm not sure if that's a related piece of data or not.
> 
> - Sean
> 