[ofw] crash in mlx4 driver

Sean Hefty sean.hefty at intel.com
Fri Mar 13 12:47:15 PDT 2009


>static ib_api_status_t
>mlnx_um_open(
>	IN		const	ib_ca_handle_t				h_ca,
>	IN	OUT			ci_umv_buf_t* const			p_umv_buf,
>		OUT			ib_ca_handle_t* const		ph_um_ca
>)
>{
>	ib_api_status_t		status;
>	mlnx_hca_t			*p_hca = (mlnx_hca_t *)h_ca;
>	PFDO_DEVICE_DATA p_fdo = hca2fdo(p_hca);
>	struct ib_device *p_ibdev = hca2ibdev(p_hca);
>	struct ib_ucontext *p_uctx;
>	struct ibv_get_context_resp *p_uresp;
>
>	HCA_ENTER(HCA_DBG_SHIM);
>
>	// sanity check
>	ASSERT( p_umv_buf );
>	if( !p_umv_buf->command )
>	{ // no User Verb Provider
>		p_uctx = cl_zalloc( sizeof(struct ib_ucontext) );
>		if( !p_uctx )
>		{
>			status = IB_INSUFFICIENT_MEMORY;
>			goto err_alloc_ucontext;
>		}
>		/* Copy the dev info. */
>		p_uctx->device = p_ibdev;
>		p_umv_buf->output_size = 0;
>		status = IB_SUCCESS;
>		goto done;
>	}
>
>	// sanity check
>	if ( p_umv_buf->output_size < sizeof(struct ibv_get_context_resp) ||
>		!p_umv_buf->p_inout_buf) {
>		status = IB_INVALID_PARAMETER;
>		goto err_inval_params;
>	}
>
>	status = ibv_um_open( p_ibdev, p_umv_buf, &p_uctx );
>	if (!NT_SUCCESS(status)) {

This check leads to the crash in the mlx4 driver.  ibv_um_open() returns
ib_api_status_t, not NTSTATUS.  In this case, ibv_um_open() is returning
IB_ERROR (0x2b).  NT_SUCCESS() treats any non-negative value as success, so
NT_SUCCESS(0x2b) is true.  The error branch is skipped, and the code after the
if statement runs with an invalid p_uctx.
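
To make the mismatch concrete, here is a minimal standalone sketch.  It is not
the driver code and not the actual patch.  NTSTATUS and NT_SUCCESS() are
redefined locally so it builds without the DDK headers, the enum is a
trimmed-down stand-in for ib_api_status_t, IB_SUCCESS = 0 is an assumption,
and the two helper function names are made up for illustration.

```c
#include <stdint.h>

/* Local stand-ins so the sketch builds without the Windows DDK headers.
 * NTSTATUS is a signed 32-bit value; NT_SUCCESS() tests for >= 0. */
typedef int32_t NTSTATUS;
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* Trimmed-down stand-in for ib_api_status_t.  IB_ERROR = 0x2b is the value
 * seen in this crash; IB_SUCCESS = 0 is an assumption. */
typedef enum {
	IB_SUCCESS = 0,
	IB_ERROR   = 0x2b
} ib_api_status_t;

/* Hypothetical helper: the check as quoted.  It reports "no failure" for
 * IB_ERROR because 0x2b is a positive number. */
static int open_failed_nt_check(ib_api_status_t status)
{
	return !NT_SUCCESS(status);
}

/* Hypothetical helper: one possible replacement that compares against
 * IB_SUCCESS directly. */
static int open_failed_ib_check(ib_api_status_t status)
{
	return status != IB_SUCCESS;
}
```

With status == IB_ERROR, open_failed_nt_check() returns 0 (failure missed) and
open_failed_ib_check() returns 1 (failure caught).  Both return 0 for
IB_SUCCESS.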

I will add a fix for this.  That moves the problem back to finding the cause of
the earlier failure - either the CQ overflow or ipoib's error handling.

Note that the test leading to this crash is using sockets in a way that other
test applications may not have been.  It uses select() with nonblocking sockets
and a larger FD set.  I'm not sure if that's a related piece of data or not.

- Sean



