[openib-general] Re: [PATCH][RFC][0/4] InfiniBand userspace verbs implementation
Andy Isaacson
adi at hexapodia.org
Thu Apr 21 10:38:21 PDT 2005
On Wed, Apr 20, 2005 at 10:07:45PM -0500, Timur Tabi wrote:
> Troy Benjegerdes wrote:
> >Someone (aka Tospin, infinicon, and Amasso) should probably post a patch
> >adding '#define VM_REGISTERD 0x01000000', and some extensions to
> >something like 'madvise' to set pages to be registered.
> >
> >My preference is said patch will also allow a way for the kernel to
> >reclaim registered memory from an application under memory pressure.
>
> I don't know if VM_REGISTERED is a good idea or not, but it should be
> absolutely impossible for the kernel to reclaim "registered" (aka pinned)
> memory, no matter what. For RDMA services (such as Infiniband, iWARP, etc),
> it's normal for non-root processes to pin hundreds of megabytes of memory,
> and that memory better be locked to those physical pages until the
> application deregisters them.
If you take the hardline position that "the app is the only thing that
matters", your code is unlikely to get merged. Linux is a
general-purpose OS.
I don't think that Troy was suggesting the kernel should deregister
memory without notifying the application. Personally, I envision
something like the NetBSD Scheduler Activations (SA) work, where the
kernel can notify the app of changes to its state in a very efficient
manner. (According to the NetBSD design whitepaper, the kernel does an
upcall whenever the multithreaded app gains or loses a CPU!)
In a Linux context, I doubt that fullblown SA is necessary or
appropriate. Rather, I'd suggest two new signals, SIGMEMLOW and
SIGMEMCRIT. The userland comms library registers handlers for both.
When the kernel decides that it needs to reclaim some memory from the
app, it sends SIGMEMLOW. The comms library then has the responsibility
to un-reserve some memory in an orderly fashion. If a reasonable [1]
time has expired since SIGMEMLOW and the kernel is still hungry, the
kernel sends SIGMEMCRIT. At this point, the comms lib *must* unregister
some memory [2] even if it has to drop state to do so; if it returns
from the signal handler without having unregistered the memory, the
kernel will SIGKILL.
[1] Part of the interface spec should cover the expectation as to how
long the library is allowed to take; I'd guess that 2 timeslices
should suffice.
[2] Is there a way for the kernel to pass down to userspace how many
pages it wants, maybe in the sigcontext?
> If kernel really thinks it needs to unpin those pages, then at the very
> least it should kill the process, and the syslog better have a very clear
> message indicating why. That way, the application doesn't continue
> thinking that everything's still going to work. If those pages become
> unpinned, the applications are going to experience serious data corruption.
You might want to consider what happens with your communication system
in a machine running power-saving modes (in the limit, suspend-to-disk).
Of course most machines with Infiniband adapters aren't running swsusp,
but it's not inconceivable that blade servers might sleep to lower power
and cooling costs.
-andy
More information about the general
mailing list