[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org>
7eggert at gmx.de
Fri Apr 22 06:10:09 PDT 2005
Andy Isaacson <adi at hexapodia.org> wrote:
> On Wed, Apr 20, 2005 at 10:07:45PM -0500, Timur Tabi wrote:
>> I don't know if VM_REGISTERED is a good idea or not, but it should be
>> absolutely impossible for the kernel to reclaim "registered" (aka pinned)
>> memory, no matter what. For RDMA services (such as Infiniband, iWARP, etc),
>> it's normal for non-root processes to pin hundreds of megabytes of memory,
>> and that memory better be locked to those physical pages until the
>> application deregisters them.
>
> If you take the hardline position that "the app is the only thing that
> matters", your code is unlikely to get merged. Linux is a
> general-purpose OS.
All userspace hardware drivers with DMA will require pinned pages (and some
of them will require continuous memory). Since this memory may be scheduled
to be accessed by DMA, reclaiming those pages may (aka. will) result in
"random" memory corruption unless done by the driver itself.
You can't even set a time limit, the driver may have allocated all DMA
memory to queued transfers, and some media needs to get plugged in by
the lazy robot. As soon as the robot arrives - boom. (For the same reason,
this memory MUST NOT be freed if the application terminates abnormally,
e.g. killed by OOM).
In other words, you need to make this memory as unaccessible as the
framebuffer on a graphic card. If that causes a lockup, you better had
prevented that while allocating.
> In a Linux context, I doubt that fullblown SA is necessary or
> appropriate. Rather, I'd suggest two new signals, SIGMEMLOW and
> SIGMEMCRIT. The userland comms library registers handlers for both.
> When the kernel decides that it needs to reclaim some memory from the
> app, it sends SIGMEMLOW. The comms library then has the responsibility
> to un-reserve some memory in an orderly fashion. If a reasonable [1]
> time has expired since SIGMEMLOW and the kernel is still hungry, the
> kernel sends SIGMEMCRIT. At this point, the comms lib *must* unregister
> some memory [2] even if it has to drop state to do so; if it returns
> from the signal handler without having unregistered the memory, the
> kernel will SIGKILL.
Choosing Data loss vs. finitely stalled system may sometimes be a bad
decision.
If I designes an application that might get a "gimme memory or die",
I'd reserve an extra bunch of memory with the only purpose of being
released in this situation. If the kernel had done that instead, this
part of memory could have been used e.g. as a read-only disk cache in
the meantime (off cause provided somebody cared to implement that).
> [2] Is there a way for the kernel to pass down to userspace how many
> pages it wants, maybe in the sigcontext?
Then you'd need only one signal.
I think this interface is usefull, it would e.g. allow a picture viewer
to cache as many decoded and scaled pictures as the RAM permits, freeing
them if the RAM gets full and the swap would have to be used.
--
"When the pin is pulled, Mr. Grenade is not our friend.
-U.S. Marine Corps
More information about the general
mailing list