Seccomp sandbox
The seccomp sandbox developed for Chromium could be used for Plash and has the potential to solve these problems:
- It removes the need to build a modified version of glibc (PlashGlibc). Instead, the seccomp sandbox patches glibc at runtime in order to intercept the syscalls it executes. The seccomp sandbox works at the level of the Linux kernel's syscall ABI (as in InterceptingSystemCalls).
- It does not require a setuid helper program (SetuidChrootJail), which requires root privileges to install.
- It can be more secure, since it allows individual syscalls to be disabled or intercepted. For example, we would be able to block network access, and we could prevent the zero page from being mapped in order to reduce exposure to kernel bugs such as NULL pointer dereferences.
- It can allow restricted access to /proc (e.g. /proc/self/maps), so we can support sandboxing programs that rely on this.
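As an illustration of intercepting a single syscall, the zero-page restriction mentioned above could be expressed as a check on mmap() arguments. This is only a sketch: the function name `mmap_allowed` and the threshold `MIN_MAP_ADDR` are assumptions made for the example, not part of the seccomp sandbox's code, and a complete policy would presumably also need to cover other calls that can place mappings, such as mremap().

```python
# Illustrative sketch only: names and threshold are assumptions,
# not taken from the seccomp sandbox's actual code.
MAP_FIXED = 0x10        # Linux value of MAP_FIXED on x86
MIN_MAP_ADDR = 0x10000  # hypothetical lowest address we allow (64 KiB)


def mmap_allowed(addr, flags):
    """Return True if an mmap() request may go ahead under this sketch policy."""
    if addr == 0 and not (flags & MAP_FIXED):
        # addr == 0 without MAP_FIXED means "kernel, pick an address".
        return True
    # Any explicit address (hint or fixed) below the threshold is refused.
    return addr >= MIN_MAP_ADDR
```

The check refuses low address hints as well as low MAP_FIXED requests, on the assumption that being conservative is cheaper than reasoning about when the kernel honours a hint.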
Down sides:
- Performance overhead: The seccomp sandbox requires most syscalls to be forwarded to a trusted thread to execute.
- Worse, some syscalls, including mmap(), must be checked by a trusted process.
- We could ameliorate this for sendmsg()+recvmsg() by combining this pair into a single virtual syscall that is sent to the trusted thread as a single message.
- Complexity: The seccomp sandbox's trusted thread is implemented as a big chunk of assembly code, since it cannot trust the contents of memory. It must keep its state in registers rather than relying on the stack, as compiled C code usually does.
Changes involved
The current seccomp sandbox is suitable for programs that partially distrust themselves. It allows them to drop authority by entering sandbox mode some time after startup. (This is similar to the model adopted in FreeBSD-Capsicum.) The startup code, which loads libraries and opens files, is unsandboxed. This is not Plash's model, however. We want to sandbox the glibc dynamic loader's startup too.
This means the seccomp sandbox has to be the first thing that runs. It cannot depend on glibc. It has to load ld.so, and it has to support the syscalls that ld.so executes on startup. It cannot patch an already-loaded ld.so and libc.so, and it cannot reliably detect when libc.so is loaded, so we must ensure that ld.so and libc.so are pre-patched.
We will have two new components, both of which can be sandboxed:
- The library patcher. This might be run on untrusted input (depending on how we arrange for the patching to happen), so we should sandbox it.
- The ELF loader. Again, this might be run on untrusted input if the sandboxed program is able to supply its own ld.so.
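Because the ELF loader may be handed an untrusted ld.so, its first step would be to validate the ELF header before mapping anything. The sketch below shows only that step, using the standard ELF header layout. The function name `parse_elf_header` is hypothetical, and Python is used purely for readability: the real loader would have to be freestanding code that does not depend on glibc.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS32, ELFCLASS64 = 1, 2
ELFDATA2LSB = 1  # little-endian


def parse_elf_header(data):
    """Validate an ELF header and return the fields a loader needs first.

    Hypothetical helper for illustration; handles little-endian files only.
    """
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class, encoding = data[4], data[5]
    if encoding != ELFDATA2LSB:
        raise ValueError("only little-endian ELF is handled in this sketch")
    if elf_class == ELFCLASS64:
        fmt = "<16sHHIQQQIHHHHHH"  # 64-byte header
    elif elf_class == ELFCLASS32:
        fmt = "<16sHHIIIIIHHHHHH"  # 52-byte header
    else:
        raise ValueError("unknown ELF class")
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError("truncated ELF header")
    (_, e_type, e_machine, _, e_entry, e_phoff, _, _, _,
     e_phentsize, e_phnum, _, _, _) = struct.unpack(fmt, data[:size])
    return {
        "class": elf_class,
        "type": e_type,
        "machine": e_machine,
        "entry": e_entry,
        "phoff": e_phoff,
        "phentsize": e_phentsize,
        "phnum": e_phnum,
    }
```

A loader would go on to walk the program headers at `phoff`, map each PT_LOAD segment, and jump to `entry`. Those steps are omitted here.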
Tasks
Seccomp sandbox maintenance:
- Add tests to seccomp sandbox
- Add tests for signal handling
- Write a reference implementation of the trusted thread in C, to make it easier to understand what the trusted thread does and to change it
- Speed up sendmsg()/recvmsg() by not sending them via the trusted process. On i386, this requires that the trusted thread check two registers instead of just one.
- Add ability to configure the sandbox at run time, e.g. enable/disable syscalls.
- Move the assembly code in trusted_thread.cc into .S files. Also split the i386/x86-64 versions into separate files.
- This means debugging (line number) info will work.
- This will make it easier to compare the i386 and x86-64 versions side-by-side, e.g. with Meld, especially if they use the same label names.
- This will make it possible to use preprocessor constants, e.g. for syscall numbers and TLS offsets.
- Gyp build:
- Move the Gyp description into the seccomp directory. Then it can be synced to the out-of-tree version.
- Add the tests to the Gyp build.
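The run-time configuration task above (enabling and disabling syscalls) amounts to replacing a fixed table with one that can be changed before the sandbox is entered. A minimal sketch of such a table follows. The class `SyscallPolicy`, its methods, and the three action names are all hypothetical and do not reflect the seccomp sandbox's existing interface.

```python
# Hypothetical action names for this sketch.
ALLOW = "allow"      # let the syscall run unchanged
DENY = "deny"        # fail the syscall
FORWARD = "forward"  # hand the syscall to the trusted thread


class SyscallPolicy:
    """A per-syscall action table that can be edited at run time."""

    def __init__(self, default=DENY):
        self._default = default
        self._rules = {}

    def set_rule(self, name, action):
        if action not in (ALLOW, DENY, FORWARD):
            raise ValueError("unknown action: %r" % (action,))
        self._rules[name] = action

    def lookup(self, name):
        # Syscalls without an explicit rule fall back to the default.
        return self._rules.get(name, self._default)


# Example: forward file opens to the trusted thread, and leave networking off.
policy = SyscallPolicy(default=DENY)
policy.set_rule("open", FORWARD)
policy.set_rule("getpid", ALLOW)
```

Defaulting to DENY means that a syscall nobody thought about stays disabled, which seems the safer choice for a sandbox.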
Changes for Plash/seccomp:
- Write code for creating patched libc.so and ld-linux.so files
- The initial version can just replace "int $0x80" with "int $0". This is easy because both instructions are two bytes long. It won't be efficient, because every syscall will then be routed through a signal handler.
- Later we will want to insert jumps (5 bytes) to code that is appended to the ELF object.
- Hook up the sandbox to an ELF loader
- Implement a handler for the set_thread_area() syscall. ld.so calls this on startup to set up TLS. Simply forwarding this to the trusted thread to execute is no good, because it will change the wrong thread's state. To get around this, the calling thread exits, and the trusted thread creates a new thread which executes set_thread_area() and then picks up where the calling thread left off (by restoring registers).
- Port Plash's libc (client) code to the seccomp sandbox
- Change Plash's server code to support seccomp-sandboxed processes
- Change the seccomp sandbox to support being statically linked and not use glibc at all
- This removes the need to read /proc/self/maps to discover existing mappings: a self-contained seccomp starter that does not rely on glibc will already know where its own code and other trusted mappings reside.
- This will mean that we could stack this seccomp-based sandbox with a setuid+chroot sandbox.
- Implement fork(), execve(), waitpid() and kill()
- fork(): Needs to spawn a new trusted thread and a new trusted process. Needs to remap the regions that are shared with the trusted process.
- execve(): Needs to invoke the sandbox helper process (e.g. /usr/bin/plash-seccomp-sandbox).
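The library-patching task above (first replacing "int $0x80" with "int $0", later inserting 5-byte jumps) can be sketched at the byte level as follows. The function names `patch_int80` and `encode_jmp_rel32` are hypothetical. The scan is deliberately naive: it treats every CD 80 byte pair as a syscall instruction, so it can misfire on data or on immediates that happen to contain those bytes, and a real patcher would need to work on instruction boundaries. It also covers only the i386 "int $0x80" form discussed above.

```python
import struct

INT_80 = b"\xcd\x80"  # int $0x80
INT_00 = b"\xcd\x00"  # int $0


def patch_int80(code):
    """Replace every CD 80 byte pair in `code`; return (patched, offsets)."""
    patched = bytearray(code)
    offsets = []
    pos = patched.find(INT_80)
    while pos != -1:
        patched[pos:pos + 2] = INT_00  # same length, so nothing moves
        offsets.append(pos)
        pos = patched.find(INT_80, pos + 2)
    return bytes(patched), offsets


def encode_jmp_rel32(src, dst):
    """Encode the 5-byte x86 `jmp rel32` placed at address `src` targeting `dst`."""
    rel = dst - (src + 5)  # displacement is relative to the next instruction
    return b"\xe9" + struct.pack("<i", rel)
```

Since the jump is three bytes longer than the instruction it replaces, the later version would also have to relocate neighbouring instructions into the appended code. That part is not shown.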
Infrastructure:
- Need a way to pull in a known-good revision of the seccomp-sandbox tree (like with gclient's DEPS files). Don't want to make a copy of seccomp-sandbox in the Plash tree, although including a copy will be fine for distributing a tarball.
- Need a way to run tests on both the PlashGlibc and seccomp versions of Plash. Need a way to disable tests that don't work on seccomp yet.
For later:
- Restrict network access by proxying access to connect()/bind()
- Implement resource accounting
- Limit number of processes
- Limit amount of memory mapped
