[openib-general] kernel oops
Hal Rosenstock
halr at voltaire.com
Tue Aug 30 16:36:52 PDT 2005
On Tue, 2005-08-30 at 12:04, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > Why would ib_at_paths_by_route be called if no route were obtained (from
> > ib_at_route_by_ip)? Isn't that a ucmpost issue? (I also agree it's not
> > good for UAT to crash.)
>
> The assumption that I made was that the call to ib_at_route_by_ip()
> would fail if given an invalid route.
That seems reasonable (I haven't tried this yet, but will once I get some
spare cycles).
> Also, since ucmpost is a simple
> test app designed more to test the CM than AT, I kept error testing to a
^^^^^^^
handling
> minimum.
>
> > It needs to be a valid route struct. I'm not sure how the kernel can
> > validate that this is the case. It does check for a NULL pointer, but
> > this is a bad pointer.
>
> Struct ib_at_ib_route should probably change the struct ibv_device
> *out_dev field. It looks like this field is actually set to a struct
> ib_device * that is a kernel pointer.
Ah, that's the kernel pointer you were referring to. [I missed that
before.]
> Can we just remove this field and
> use the sgid to locate the correct device structure in the kernel, or
> fail if it cannot be located?
That seems like a good idea.
> >>The AT code appears to be passing a kernel pointer up to the userspace app,
> >>and then requires that pointer to be passed back to the kernel. This
> >>needs to be changed to pass up some identifier that can be validated on
> >>the return to the kernel.
> >
> > Isn't it copying the ib_route structure to userspace?
>
> Yes - but that contains the kernel device pointer. And looking at it
> more, the ABI contains pointers in the data structures. This should
> cause problems with 32-bit apps running on 64-bit kernels.
>
> I'm not sure how desirable it is to fix these issues versus moving to
> whatever the new CM abstraction API is.
Won't AT still be needed for IB under the new CM abstraction? I guess
the answer is unclear. Still, it seems to me that it should be fixed
until something else takes its place. Do you concur?
-- Hal