User-mode implementation of execve()
User-mode exec is an implementation of execve() that replaces the process image by changing memory mappings from user space, rather than by calling the kernel's execve() system call. It therefore has to include its own ELF loader.
There is a prototype implementation in the scratch directory in SVN. User-mode exec is mainly useful for PtraceJail.
User-mode exec must do the following things:
- Check for #! scripts
- Check whether the executable is ELF format (the common case)
- Close file descriptors that are marked as close-on-exec
- Unmap existing pages from the address space, except for the caller's code and some stack space
  - The kernel's VDSO must also be left alone (need to get its address)
  - Find currently mapped pages using /proc/self/maps
- Open and map the dynamic linker (if the executable specifies one)
- Map the executable
- Set up argv, envp and auxv on the stack
- Set stack pointer and jump to entry point
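The first two checks in the list above can be sketched as follows. This is a minimal illustration, not the prototype's code: the function name classify_executable and its return shape are made up for this example, and the #! handling assumes the Linux convention that everything after the interpreter path is passed as a single argument.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def classify_executable(header):
    """Classify a file from its leading bytes (hypothetical helper).

    Returns ("script", interpreter, optional_arg), ("elf", None, None)
    or ("unknown", None, None).
    """
    if header.startswith(b"#!"):
        # Only the first line matters. Spaces and tabs around it are dropped.
        line = header[2:].split(b"\n", 1)[0].strip(b" \t")
        if not line:
            return ("unknown", None, None)
        # Assumption: the rest of the line is ONE argument, as on Linux.
        parts = line.split(None, 1)
        arg = parts[1] if len(parts) > 1 else None
        return ("script", parts[0], arg)
    if header.startswith(ELF_MAGIC):
        return ("elf", None, None)
    return ("unknown", None, None)


def classify_path(path, nbytes=256):
    """Read the first nbytes of a file and classify it.

    nbytes=256 is an arbitrary choice for this sketch.
    """
    with open(path, "rb") as f:
        return classify_executable(f.read(nbytes))
```

For a script, the loader would then restart the whole procedure with the interpreter as the executable and the script path appended to its arguments.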
It's not possible to unmap all the code used by the caller. At least a page containing the exec code must remain.
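As a rough sketch of how the unmapping step might choose its targets, the code below parses text in the /proc/self/maps format and lists every region except the one containing the exec code, the stack and the VDSO. The names parse_maps, regions_to_unmap and SAMPLE_MAPS are hypothetical, the sample addresses are invented, and keeping the whole "[stack]" region is a simplification of "some stack space".

```python
def parse_maps(maps_text):
    """Yield (start, end, perms, name) for each line of a maps listing."""
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        name = fields[5].strip() if len(fields) > 5 else ""
        yield int(start_s, 16), int(end_s, 16), fields[1], name


def regions_to_unmap(maps_text, loader_addr, keep_names=("[vdso]", "[stack]")):
    """Return (start, end) pairs that the exec code could unmap.

    loader_addr is any address inside the code performing the exec;
    the region containing it is preserved.
    """
    doomed = []
    for start, end, _perms, name in parse_maps(maps_text):
        if name in keep_names:
            continue  # kernel-provided or still in use
        if start <= loader_addr < end:
            continue  # the page(s) holding the exec code must remain
        doomed.append((start, end))
    return doomed


# Invented sample in the /proc/self/maps format, for illustration only.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /usr/bin/loader
00601000-00622000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f0000200000 r-xp 00000000 08:01 5678 /lib/libc.so.6
7f0000200000-7f0000201000 rw-p 00000000 00:00 0
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
7ffc00100000-7ffc00102000 r-xp 00000000 00:00 0 [vdso]
"""
```

On a live system the text would come from reading /proc/self/maps, and the keep list may need further kernel-provided entries depending on the kernel version.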
Is it possible to shrink the stack? Is the stack usually mapped with MAP_GROWSDOWN? Note that the heap and stack are marked in /proc/self/maps with "[heap]" and "[stack]" so the kernel is presumably treating them as special cases.
- It appears that madvise() with MADV_DONTNEED can free stack pages, so that they read back as zeros when next touched. However, madvise() is only advice that the kernel may ignore, so we can't rely on it.
- libpthread has to map threads' stacks itself. It maps 8 MB by default and sets the permissions on the bottom page to PROT_NONE in order to catch stack overflows. In contrast, the initial stack mapped by the kernel does not have a fixed limit. On some systems the stack and the (brk()-allocated) heap can grow until they meet in the middle. Under Linux, other mmap()'d regions can appear between the stack and the heap.
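The madvise() point above can be tried out with a small experiment. The sketch below uses a private anonymous page as a stand-in for a stack page (it is not a real stack), and assumes Python 3.8 or later on a platform that exposes MADV_DONTNEED. The function name dontneed_roundtrip is made up for this example.

```python
import mmap


def dontneed_roundtrip(payload=b"stack"):
    """Write payload to a private anonymous page, apply MADV_DONTNEED,
    and return what the page contains afterwards.

    Returns None where madvise() or MADV_DONTNEED is unavailable.
    """
    if not hasattr(mmap, "MADV_DONTNEED") or not hasattr(mmap.mmap, "madvise"):
        return None
    # MAP_PRIVATE matters: a shared mapping would keep its contents.
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        page[:len(payload)] = payload
        page.madvise(mmap.MADV_DONTNEED)  # advice covers the whole mapping
        return bytes(page[:len(payload)])
    finally:
        page.close()
```

If the kernel honours the advice, the result is all zero bytes. If it ignores it, the original payload comes back, which is the case the note above is worried about.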
Differences from kernel execve():
- Can't implement setuid executables, because a process cannot raise its own privileges from user space
One nice side effect of a user-mode execve() is that it removes any limit that the kernel would normally impose on the size of the command line arguments.
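To see the limit that would be sidestepped, the following sketch compares the space needed for a set of argument and environment strings with the value the system reports for ARG_MAX. The helper names are hypothetical, and the size calculation counts only the strings and their NUL terminators, ignoring the pointer arrays.

```python
import os


def exec_args_size(argv, env):
    """Bytes needed for the argv and envp strings, with NUL terminators."""
    total = sum(len(os.fsencode(a)) + 1 for a in argv)
    # Each environment entry is stored as "KEY=VALUE\0".
    total += sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2
                 for k, v in env.items())
    return total


def kernel_arg_limit():
    """Return ARG_MAX as reported by sysconf(), or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
```

A caller could compare exec_args_size(argv, env) with kernel_arg_limit() to decide whether the kernel's execve() would be likely to reject the call.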
Related work
QEMU includes an ELF loader for its "user mode emulation" (running individual emulated processes rather than whole machines). See http://cvs.savannah.nongnu.org/viewcvs/qemu/linux-user/?root=qemu.
grugq provides an implementation of a user-mode exec, described in "The Design and Implementation of Userland Exec".
rtldi implements an ELF loader for the purpose of chain loading another version of the dynamic linker (ld.so).
