[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
adi at hexapodia.org
Mon Apr 25 12:11:11 PDT 2005
On Sat, Apr 23, 2005 at 07:44:21PM -0700, Andrew Morton wrote:
> Timur Tabi <timur.tabi at ammasso.com> wrote:
> > As I said, the testcase only works with our hardware, and it's also
> > very large. It's one small test that's part of a huge test suite.
> > It takes a couple hours just to install the damn thing.
> > We want to produce a simpler test case that demonstrates the problem in an
> > easy-to-understand manner, but we don't have time to do that now.
> If your theory is correct then it should be able to demonstrate this
> problem without any special hardware at all: pin some user memory, then
> generate memory pressure then check the contents of those pinned pages.
> But if, for the DMA transfer, you're using the array of page*'s which were
> originally obtained from get_user_pages() then it's rather hard to see how
> the kernel could alter the page's contents.
> Then again, if mlock() fixes it then something's up. Very odd.
Libor Michalek posted a much more reasonable (to my limited
understanding) bug description in <20050412180447.E6958 at topspin.com>.
(And I'd love to provide a URL, but damned if I can figure out how to
find that message on gmane. Clue-bat applications gladly accepted.)
Libor Michalek wrote:
# The driver did use get_user_pages() to elevated the refcount on all the
# pages it was going to use for IO, as well as call set_page_dirty() since
# the pages were going to have data written to them from the device.
# The problem we were seeing is that the minor fault by the app resulted
# in a new physical page getting mapped for the application. The page that
# had the elevated refcount was still waiting for the data to be written
# to by the driver at the time that the app accessed the page causing the
# minor fault. Obviously since the app had a new mapping the data written
# by the driver was lost.
# It looks like code was added to try_to_unmap_one() to address this, so
# hopefully it's no longer an issue...
Which makes me think that Timur's bug is just an
insufficiently-understood version of Libor's.
More information about the general