Ideas
A page for ideas that don't have a better place to go.
Dependency coverage
Code coverage tracks which lines of code are executed (and sometimes, which branches are taken). Dependency coverage would track which dependencies of a component are used. For example, Firefox depends on libgtk, which contains various library and resource files. We can track whether those files are ever opened when we run Firefox.
Code coverage tells us what source code is not covered by automated tests; dependency coverage can tell us what components are not covered. Code coverage can alert us to dead code; dependency coverage can inform us about unused dependencies.
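As a minimal sketch of the bookkeeping involved, assume we already have a list of the paths a program opened during a run (Plash's logging might be one way to collect it) and a mapping from each dependency to the files it ships. The function names, the mapping format and the example paths below are all hypothetical; they only illustrate the idea.

```python
import posixpath


def dependency_coverage(dependencies, opened_paths):
    """Report which files of each dependency were opened.

    dependencies: dict mapping a package name to the file paths it ships.
    opened_paths: iterable of paths the program opened during the run.
    Returns a dict mapping package name to (used_files, unused_files).
    """
    opened = {posixpath.normpath(p) for p in opened_paths}
    report = {}
    for package, files in dependencies.items():
        files = {posixpath.normpath(f) for f in files}
        used = files & opened
        report[package] = (sorted(used), sorted(files - used))
    return report


def unused_dependencies(report):
    """Packages none of whose files were ever opened."""
    return sorted(pkg for pkg, (used, _unused) in report.items() if not used)


if __name__ == "__main__":
    # Hypothetical package contents and access log.
    deps = {
        "libgtk": ["/usr/lib/libgtk.so", "/usr/share/gtk/theme.rc"],
        "libfoo": ["/usr/lib/libfoo.so"],
    }
    log = ["/usr/lib/libgtk.so", "/etc/passwd"]
    print(unused_dependencies(dependency_coverage(deps, log)))
```

A package with no opened files is only a candidate for removal: a single run, like a single test, may simply not have exercised it.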
QEMU images
Could provide QEMU images for demonstrating Plash. Parts of Plash require root authority to install, such as ChrootSetuidJail; kernel modifications would also require root authority. If we ever have more of a POLA desktop, it would be useful to have a minimal system that boots up into that.
Killing sandboxed processes
One of the Nix papers points out: "Ensure that there are no processes left running under the uid selected for the builder. On modern Unix systems this can be done by performing a kill(-1, SIGKILL) operation while executing under that uid, which has the effect of sending the KILL signal to all processes executing under uid."
With ChrootSetuidJail, we could use this to kill all processes running under a particular sandbox.
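A rough sketch of that operation, assuming the caller has enough privilege to switch to the sandbox's UID (for example, a setuid helper). The function name is hypothetical and this is not how ChrootSetuidJail is implemented. The work is done in a forked child so that the caller keeps its own UID.

```python
import os
import signal


def kill_all_under_uid(uid):
    """Send SIGKILL to every process running under `uid`.

    Returns True if the signal was sent (or nothing was left to kill).
    """
    if uid == 0:
        # kill(-1, ...) as root would signal almost every process.
        raise ValueError("refusing to run kill(-1, SIGKILL) as root")
    pid = os.fork()
    if pid == 0:
        exit_code = 1
        try:
            os.setuid(uid)  # needs privilege unless uid is our own
            try:
                os.kill(-1, signal.SIGKILL)
            except ProcessLookupError:
                pass  # no other processes were running under uid
            exit_code = 0
        except OSError:
            exit_code = 1
        finally:
            os._exit(exit_code)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # On some systems kill(-1, ...) also signals the caller itself.
        return os.WTERMSIG(status) == signal.SIGKILL
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

On Linux, kill(-1, sig) does not signal the calling process, so the child normally survives to report success. Passing the caller's own UID would kill the caller's other processes too, so this only makes sense for a UID dedicated to one sandbox.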
Using Caja with Plash
Using an object-capability language like Caja (a tamed JavaScript) could help make Plash's package system a lot more flexible.
Plash currently implements a lot of the package system's behaviour in the TCB. For Debian packages, it interprets APT source lines, combines repositories, chooses deb dependencies and initialises packages. It has a fixed file namespace. It interprets some fixed file formats: the Debian Packages index file and the .pkg file format.
Describing data structures in Caja would be handy. Linked structures would be easier.
The copy-on-write arrangement is set up by Plash, but it could be set up by untrusted Caja.
Using Caja instead of Python solves a bootstrapping problem. Most languages would have to be run in a separate process, and we would have to fetch all dependencies using the same package system.
When I did the most work on the package system, in December 2006, Caja didn't exist. Meanwhile, E's promises system raises the question of how to integrate it with Plash.
See also: CapPython
Deallocating memory
There are trade-offs involved in returning pages to the OS. If you repeatedly free pages only to allocate new pages, the OS has to zero them each time. There is also the system call overhead of invoking the kernel and unmapping the pages. A possible fix is for the process to declare which pages it no longer needs by setting some fields in its address space. The kernel (or a keeper process) could read these fields when it needs to allocate pages elsewhere.
This could be applied to the stack, depending on conventions for use of the stack pointer. I think that dirtied stack pages never get deallocated under Linux until the process (or thread) exits.
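The flag-setting scheme described above can be illustrated with a toy model. This is only a simulation of the bookkeeping, not a real kernel interface; the class and method names are hypothetical, and it ignores the race between the process reusing a page and the kernel reclaiming it.

```python
class LazyFreeModel:
    """Toy model: the process flags unneeded pages; the kernel reclaims lazily."""

    def __init__(self, num_pages):
        self.unneeded = [False] * num_pages  # flags written by the process
        self.resident = [True] * num_pages   # pages still backed by memory
        self.zero_fills = 0                  # times a fresh zeroed page was needed

    def mark_unneeded(self, page):
        # Process side: a plain memory write, no system call.
        self.unneeded[page] = True

    def reuse(self, page):
        # Process side: take the page back before touching it again.
        self.unneeded[page] = False
        if not self.resident[page]:
            # The kernel reclaimed the page in the meantime, so a fresh
            # zeroed page has to be supplied.
            self.resident[page] = True
            self.zero_fills += 1

    def reclaim(self, wanted):
        # Kernel (or keeper) side: called only when memory is needed elsewhere.
        taken = []
        for page, flagged in enumerate(self.unneeded):
            if len(taken) == wanted:
                break
            if flagged and self.resident[page]:
                self.resident[page] = False
                taken.append(page)
        return taken
```

In the model, a page that is freed and reused before the kernel needs memory costs neither a system call nor a zero fill; the cost is only paid for pages that were actually reclaimed.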
GNU make
Makefiles are really complex, especially glibc's. It is really difficult to understand or get information out of them. Maybe we could get more help from GNU make by adding some Python extensions in useful places.
- Print more concise output. Add a Python hook for running each (expanded) shell command, into which make can pass the target being built.
- List all available targets, or maybe just the .PHONY ones, along with where they are defined. Without this it's very hard to find what targets are available.
- Collect all commands so that they can be run and re-run independently of make (which is slow for glibc). This could record precise dependency information by using Plash with logging.
- Somehow extract a list of glibc's test cases from make check.
Would need to cope with recursive make invocations.
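Short of extending make itself, the list of .PHONY targets can be approximated today by parsing make's database dump (the -p option, combined with -q so that no recipes are run). The sketch below is a rough illustration: the function names are hypothetical, the exact layout of the database output may vary between make versions, and it only looks at the top-level Makefile, so it does not cope with recursive invocations.

```python
import subprocess


def read_make_database(directory="."):
    """Return the text printed by `make -p -q` run in `directory`."""
    result = subprocess.run(
        ["make", "-C", directory, "-p", "-q"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    # In -q mode the exit status only says whether targets are up to
    # date, so it is deliberately not checked here.
    return result.stdout


def phony_targets(database_text):
    """Extract the prerequisites of .PHONY from a database dump."""
    targets = set()
    for line in database_text.splitlines():
        if line.startswith(".PHONY:"):
            targets.update(line[len(".PHONY:"):].split())
    return sorted(targets)


if __name__ == "__main__":
    for name in phony_targets(read_make_database(".")):
        print(name)
```

This does not report where each target is defined, which is the part that would most benefit from support inside make.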
Accessing other processes' FD tables
Suppose there were a way to read and write another process's FD table. Then we could make a sandbox based on ptrace() where the open() syscall is caught and the FD is granted by writing it into the traced process's FD table.
We would be doing emulation at the kernel syscall ABI level. Advantages:
- No need for PlashGlibc. Statically linked executables would work.
- /proc/self/fd/N and /proc/self more generally would be much easier to do. We would know what process is doing the requesting.
- No problem with async-signal-safety.
- No problems of races between threads or of sharing FDs between libc.so and ld.so.
We would still need to find a way to do execve() safely with ptrace(). If there is a way to change another process's memory mappings, a user space implementation of execve() can use that.
How to implement /proc/PID and /proc/self
The kernel already provides a "same sandbox?" check, based on the sandbox's UID, which it performs when sandboxed processes use kill() or ptrace(). We could implement a similar check on PIDs in user space by reading the UID/GID of /proc/PID, and use that check to implement a filtered view of /proc. We could rebind /proc/self whenever a process does fsop_fork. This would require run-as-anonymous to tell us the UID of the sandbox it has created.
Problems:
- Looking up PIDs in /proc is racy because PIDs can be reused after processes exit. Maybe we can partly get around this by holding onto a directory FD for /proc/PID (does that give the right behaviour?).
- Most of /proc/PID for a sandboxed process isn't accessible to the ServerProcess running under the user's UID.
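The UID check and the directory-FD idea can be sketched as follows. The function names and the proc_root parameter are hypothetical (the parameter is there so the code can be tried against a directory other than /proc). Whether holding the FD really protects against PID reuse is the open question noted above; the sketch only shows the mechanics.

```python
import os


def open_sandboxed_proc_dir(pid, sandbox_uid, proc_root="/proc"):
    """Open /proc/PID and return the FD if it is owned by sandbox_uid.

    Returns None if the process does not exist or belongs to another UID.
    The check is done with fstat() on the open FD rather than on the
    path, so the check and any later use refer to the same directory.
    """
    path = os.path.join(proc_root, str(int(pid)))
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except OSError:
        return None
    if os.fstat(fd).st_uid != sandbox_uid:
        os.close(fd)
        return None
    return fd


def visible_pids(sandbox_uid, proc_root="/proc"):
    """PIDs that a filtered view of /proc would show to the sandbox."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        fd = open_sandboxed_proc_dir(name, sandbox_uid, proc_root)
        if fd is not None:
            os.close(fd)
            pids.append(int(name))
    return sorted(pids)
```

A filtered /proc would list only the entries returned by visible_pids(), and /proc/self would be bound to the entry for the requesting process.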
Use User Mode Linux
For programs that don't run under Plash for whatever reason (e.g. use of /proc), we could run them under User Mode Linux, which would in turn run in a Plash sandbox.
Limitations: slow start-up time for UML?
