[ofw] crash in mlx4 driver
Sean Hefty
sean.hefty at intel.com
Fri Mar 13 12:47:15 PDT 2009
>static ib_api_status_t
>mlnx_um_open(
> IN const ib_ca_handle_t h_ca,
> IN OUT ci_umv_buf_t* const p_umv_buf,
> OUT ib_ca_handle_t* const ph_um_ca
>)
>{
> ib_api_status_t status;
> mlnx_hca_t *p_hca = (mlnx_hca_t *)h_ca;
> PFDO_DEVICE_DATA p_fdo = hca2fdo(p_hca);
> struct ib_device *p_ibdev = hca2ibdev(p_hca);
> struct ib_ucontext *p_uctx;
> struct ibv_get_context_resp *p_uresp;
>
> HCA_ENTER(HCA_DBG_SHIM);
>
> // sanity check
> ASSERT( p_umv_buf );
> if( !p_umv_buf->command )
> { // no User Verb Provider
> p_uctx = cl_zalloc( sizeof(struct ib_ucontext) );
> if( !p_uctx )
> {
> status = IB_INSUFFICIENT_MEMORY;
> goto err_alloc_ucontext;
> }
> /* Copy the dev info. */
> p_uctx->device = p_ibdev;
> p_umv_buf->output_size = 0;
> status = IB_SUCCESS;
> goto done;
> }
>
> // sanity check
> if ( p_umv_buf->output_size < sizeof(struct ibv_get_context_resp) ||
> !p_umv_buf->p_inout_buf) {
> status = IB_INVALID_PARAMETER;
> goto err_inval_params;
> }
>
> status = ibv_um_open( p_ibdev, p_umv_buf, &p_uctx );
> if (!NT_SUCCESS(status)) {
This check is what leads to the crash in the mlx4 driver. ibv_um_open() returns
ib_api_status_t, not NTSTATUS. In this case, ibv_um_open() is returning IB_ERROR
(0x2b). NT_SUCCESS(0x2b) is true, because the macro treats any non-negative
value as success, so the error branch is skipped and execution continues past
the if statement with an invalid p_uctx.
I will add a fix for this. That moves the problem back to determining the
earlier failure, which is either in the CQ overflow or in ipoib's error handling.
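To illustrate the mismatch, here is a minimal standalone sketch, not the actual
patch. NTSTATUS and NT_SUCCESS are redefined locally so it builds outside the
WDK, the enum is a hypothetical two-value subset of ib_api_status_t (only
IB_ERROR == 0x2b comes from the debugging above; IB_SUCCESS == 0 is assumed),
and the two helper functions are made up for the example.

```c
#include <stdint.h>

/* Local stand-ins for the Windows definitions (assumption: the real
 * macro has this shape, i.e. any non-negative status is success). */
typedef int32_t NTSTATUS;
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* Hypothetical subset of ib_api_status_t. IB_ERROR == 0x2b is the value
 * seen in the crash; IB_SUCCESS == 0 is assumed for this sketch. */
typedef enum {
    IB_SUCCESS = 0,
    IB_ERROR   = 0x2b
} ib_api_status_t;

/* The mismatched test: applies the NTSTATUS macro to an IB status.
 * Returns 1 when the caller would treat the call as successful. */
int passes_nt_check(ib_api_status_t status)
{
    return NT_SUCCESS(status) ? 1 : 0;
}

/* A test matched to the return type: only IB_SUCCESS counts.
 * Returns 1 when the caller would treat the call as successful. */
int passes_ib_check(ib_api_status_t status)
{
    return status == IB_SUCCESS ? 1 : 0;
}
```

With these definitions, passes_nt_check(IB_ERROR) reports success while
passes_ib_check(IB_ERROR) does not. In mlnx_um_open() the equivalent change
would presumably be to test the result of ibv_um_open() against IB_SUCCESS
instead of using NT_SUCCESS().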
Note that the test leading to this crash is using sockets in a way that other
test applications may not have been. It uses select() with nonblocking sockets
and a larger FD set. I'm not sure if that's a related piece of data or not.
- Sean