[ofw] crash in mlx4 driver
Leonid Keller
leonid at mellanox.co.il
Sun Mar 15 04:50:47 PDT 2009
See inline
> -----Original Message-----
> From: ofw-bounces at lists.openfabrics.org
> [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Sean Hefty
> Sent: Friday, March 13, 2009 9:47 PM
> To: Hefty, Sean; ofw at lists.openfabrics.org
> Subject: RE: [ofw] crash in mlx4 driver
>
> >static ib_api_status_t
> >mlnx_um_open(
> > IN const ib_ca_handle_t h_ca,
> > IN OUT ci_umv_buf_t* const p_umv_buf,
> > OUT ib_ca_handle_t* const ph_um_ca
> >)
> >{
> > ib_api_status_t status;
> > mlnx_hca_t *p_hca = (mlnx_hca_t *)h_ca;
> > PFDO_DEVICE_DATA p_fdo = hca2fdo(p_hca);
> > struct ib_device *p_ibdev = hca2ibdev(p_hca);
> > struct ib_ucontext *p_uctx;
> > struct ibv_get_context_resp *p_uresp;
> >
> > HCA_ENTER(HCA_DBG_SHIM);
> >
> > // sanity check
> > ASSERT( p_umv_buf );
> > if( !p_umv_buf->command )
> > { // no User Verb Provider
> > p_uctx = cl_zalloc( sizeof(struct ib_ucontext) );
> > if( !p_uctx )
> > {
> > status = IB_INSUFFICIENT_MEMORY;
> > goto err_alloc_ucontext;
> > }
> > /* Copy the dev info. */
> > p_uctx->device = p_ibdev;
> > p_umv_buf->output_size = 0;
> > status = IB_SUCCESS;
> > goto done;
> > }
> >
> > // sanity check
> > if ( p_umv_buf->output_size < sizeof(struct ibv_get_context_resp) ||
> > !p_umv_buf->p_inout_buf) {
> > status = IB_INVALID_PARAMETER;
> > goto err_inval_params;
> > }
> >
> > status = ibv_um_open( p_ibdev, p_umv_buf, &p_uctx );
> > if (!NT_SUCCESS(status)) {
>
> This check leads to the crash in the mlx4 driver.
> ibv_um_open() returns ib_api_status_t. In this case,
> ibv_um_open is returning IB_ERROR (0x2b).
> NT_SUCCESS(0x2b) evaluates to true, so the error branch is skipped
> and execution continues past the if statement with an invalid p_uctx.
A good catch. Thank you.
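
For anyone following the thread, here is a minimal standalone sketch of
why the check passes. NT_SUCCESS() treats any non-negative value as
success, and IB_ERROR (0x2b) is a small positive number. The NTSTATUS
typedef, the sketch_* names, and the assumption that IB_SUCCESS is 0 are
stand-ins for illustration and are not copied from the driver headers.
The second helper shows one possible form of the corrected check, not
necessarily the fix that will be committed.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Windows NTSTATUS type: a signed 32-bit value. */
typedef int32_t NTSTATUS;

/* Same test as the WDK macro: any non-negative value counts as success. */
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

/* Hypothetical subset of ib_api_status_t. 0x2b is the value reported in
 * this thread; IB_SUCCESS == 0 is an assumption. */
typedef enum {
    SKETCH_IB_SUCCESS = 0,
    SKETCH_IB_ERROR   = 0x2b
} sketch_ib_status_t;

/* The check as written in mlnx_um_open(): nonzero means "take the
 * error path". */
static int fails_nt_check(sketch_ib_status_t status)
{
    return !NT_SUCCESS(status);
}

/* A check that compares against the IB success code instead. */
static int fails_ib_check(sketch_ib_status_t status)
{
    return status != SKETCH_IB_SUCCESS;
}

/* Small self-check that exercises both helpers. */
static void sketch_self_check(void)
{
    /* The NT-style check does not see IB_ERROR as a failure... */
    assert(!fails_nt_check(SKETCH_IB_ERROR));
    /* ...while the IB-style check does. */
    assert(fails_ib_check(SKETCH_IB_ERROR));
    /* Both agree on the success code. */
    assert(!fails_nt_check(SKETCH_IB_SUCCESS));
    assert(!fails_ib_check(SKETCH_IB_SUCCESS));
}
```

Calling sketch_self_check() from a test main() runs all four assertions.
The general point is that an ib_api_status_t result should be compared
against the IB status codes rather than passed through an NTSTATUS macro.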
>
> I will add a fix for this. The problem is now moved back to
> determining the earlier failure - either in the CQ overflow
> or ipoib's error handling.
>
> Note that the test leading to this crash is using sockets in
> a way that other test applications may not have been. It
> uses select() with nonblocking sockets and a larger FD set.
> I'm not sure if that's a related piece of data or not.
>
> - Sean