ptrace()-based jailing for Plash

Status: planning

Aim: replace ChrootSetuidJail (which has various limitations) with a ptrace-based process monitor.

The process monitor should block a fixed set of syscalls and let the others through. We are not trying to interpose on syscalls, apart from a few such as fork(). We will still be using PlashGlibc. Interposing on calls such as open() is subject to race conditions (for example, another thread can change the filename in memory after the monitor has checked it), so it cannot be done securely using ptrace().
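
As a rough illustration of the allow/deny split described above, the monitor's decision could be a plain lookup on the syscall number. This is only a sketch: the names classify_syscall(), verdict_name() and enum verdict are hypothetical, and the particular set of allowed calls is an assumption made for illustration, not a worked-out policy.

```c
#include <assert.h>
#include <sys/syscall.h>

/* Hypothetical verdicts a monitor could attach to a syscall number. */
enum verdict { SC_DENY = 0, SC_ALLOW, SC_SPECIAL };

/* Classify a syscall number. Anything not listed is denied, so calls
 * that take filenames or process IDs (open, execve, kill, wait4, ...)
 * fall through to SC_DENY. The allowed set here is an assumption. */
enum verdict classify_syscall(long nr)
{
    switch (nr) {
    case SYS_read:
    case SYS_write:
    case SYS_close:
    case SYS_exit:
#ifdef SYS_mmap
    case SYS_mmap:
#endif
#ifdef SYS_sendmsg
    case SYS_sendmsg:
#endif
#ifdef SYS_recvmsg
    case SYS_recvmsg:
#endif
        return SC_ALLOW;
#ifdef SYS_fork
    case SYS_fork:          /* needs interposition, not a plain pass */
        return SC_SPECIAL;
#endif
    default:
        return SC_DENY;
    }
}

/* Human-readable name for logging a verdict. */
const char *verdict_name(enum verdict v)
{
    assert(v == SC_DENY || v == SC_ALLOW || v == SC_SPECIAL);
    switch (v) {
    case SC_ALLOW:   return "allow";
    case SC_SPECIAL: return "special";
    default:         return "deny";
    }
}
```

Denying by default means a syscall added by a newer kernel stays blocked until someone reviews it and adds it to the list.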

The PTRACE_SYSCALL_MASK extension would make ptrace() efficient for our purposes, but it is not included in the mainline kernel. See Ptrace for a general discussion of the limitations of ptrace().

Difficulties

execve()

The monitor cannot let the execve() syscall through because it takes a filename, which the monitor cannot check securely. We may need to implement execve() in userspace using mmap(). See UserModeExec.

Implementing execve() in userspace would be easier if memory mappings could be altered by a second process. User Mode Linux has a patch to the host kernel to provide such a facility. What is its status?

Ostia added an fexecve() call to Linux. It is not clear how this worked. execve() needs to load both the executable and the dynamic linker. The latter is specified in the executable.
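
To show where the dynamic linker is named, here is a minimal sketch that looks up the PT_INTERP program header in an ELF image that is already in memory. The function name elf64_find_interp() is hypothetical, and the sketch assumes a 64-bit ELF file in the host's byte order. A userspace execve() would also need to map the segments and set up the stack, which this does not attempt.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the NUL-terminated interpreter path inside a
 * 64-bit, native-endian ELF image, or NULL if there is none or the
 * image is malformed. The pointer refers into `image`. */
const char *elf64_find_interp(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;

    assert(image != NULL);
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);      /* copy to avoid unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        const char *path;

        /* Entry i must lie entirely inside the image. */
        if (eh.e_phoff > len || (len - eh.e_phoff) / sizeof ph <= i)
            return NULL;
        memcpy(&ph, image + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The path must fit in the image and end with a NUL byte. */
        if (ph.p_offset >= len || ph.p_filesz == 0 ||
            ph.p_filesz > len - ph.p_offset)
            return NULL;
        path = (const char *)image + ph.p_offset;
        if (path[ph.p_filesz - 1] != '\0')
            return NULL;
        return path;
    }
    return NULL;                        /* statically linked, or no PT_INTERP */
}
```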

Calls using process IDs

Calls such as wait() and kill() will have to be proxied via a trusted server. We will have to implement a process ID namespace in user space and keep track of which processes are part of the sandbox. wait() is the most important call to implement.
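
The bookkeeping for such a user-space process ID namespace might look roughly like the following. This is a sketch under assumptions, not a design: all names (struct pid_namespace, pidns_register(), pidns_to_real(), pidns_mark_exited(), pidns_wait()) are hypothetical, the table size is arbitrary, and a real server would also need blocking waits and the waitpid() options.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define MAX_PROCS 64    /* arbitrary table size for this sketch */

struct proc_entry {
    int   in_use;
    int   exited;       /* nonzero once the real process has terminated */
    int   status;       /* exit status recorded by the server */
    pid_t real_pid;     /* kernel PID, never shown to sandboxed code */
    pid_t virt_pid;     /* PID as seen inside the sandbox */
    pid_t virt_parent;  /* virtual PID of the parent, 0 for the root */
};

struct pid_namespace {
    struct proc_entry procs[MAX_PROCS];
    pid_t next_virt_pid;
};

void pidns_init(struct pid_namespace *ns)
{
    memset(ns, 0, sizeof *ns);
    ns->next_virt_pid = 1;
}

/* Record a new sandboxed process; returns its virtual PID, or -1 if full. */
pid_t pidns_register(struct pid_namespace *ns, pid_t real_pid, pid_t virt_parent)
{
    assert(real_pid > 0);
    for (int i = 0; i < MAX_PROCS; i++) {
        struct proc_entry *p = &ns->procs[i];
        if (p->in_use)
            continue;
        memset(p, 0, sizeof *p);
        p->in_use = 1;
        p->real_pid = real_pid;
        p->virt_pid = ns->next_virt_pid++;
        p->virt_parent = virt_parent;
        return p->virt_pid;
    }
    return -1;
}

/* For a proxied kill(): only live processes in this sandbox resolve. */
pid_t pidns_to_real(const struct pid_namespace *ns, pid_t virt_pid)
{
    for (int i = 0; i < MAX_PROCS; i++) {
        const struct proc_entry *p = &ns->procs[i];
        if (p->in_use && !p->exited && p->virt_pid == virt_pid)
            return p->real_pid;
    }
    return -1;
}

/* Called when the server learns that a real process has terminated. */
int pidns_mark_exited(struct pid_namespace *ns, pid_t real_pid, int status)
{
    for (int i = 0; i < MAX_PROCS; i++) {
        struct proc_entry *p = &ns->procs[i];
        if (p->in_use && !p->exited && p->real_pid == real_pid) {
            p->exited = 1;
            p->status = status;
            return 0;
        }
    }
    return -1;
}

/* For a proxied wait(): reap one exited child of virt_parent.
 * Returns its virtual PID, 0 if children exist but none has exited,
 * or -1 if there are no children at all (the ECHILD case). */
pid_t pidns_wait(struct pid_namespace *ns, pid_t virt_parent, int *status)
{
    int have_children = 0;
    for (int i = 0; i < MAX_PROCS; i++) {
        struct proc_entry *p = &ns->procs[i];
        if (!p->in_use || p->virt_parent != virt_parent)
            continue;
        have_children = 1;
        if (p->exited) {
            if (status)
                *status = p->status;
            p->in_use = 0;      /* entry is reaped and can be reused */
            return p->virt_pid;
        }
    }
    return have_children ? 0 : -1;
}
```

Because lookups only succeed for entries in the table, a sandboxed process cannot name a process outside its own sandbox, whatever number it passes to kill().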

Delivery of signals such as SIGCHLD would have to be simulated, and this needs to work with select()/pselect().

Would we implement job control and process groups?

Costs

There will be an extra process per sandbox, assuming the monitor process is kept separate from the existing server process.

Limitations

A process cannot be ptraced by multiple processes, so strace and gdb would not work inside the ptrace jail.

System calls to allow

Requiring special handling:

Not sure:

See list of all Linux syscalls

Alternatives

Linux seccomp patch: This leaves too few syscalls for our requirements: only read(), write(), exit() and sigreturn(). We also need sendmsg()/recvmsg() and mmap(), among others.

lcall

At one point there was a (now-obsolete) system call mechanism called lcall (an alternative to the "int 0x80" syscall mechanism), which was not intercepted by ptrace. Apparently this was fixed by the User Mode Linux project.

See:

Tasks

PtraceJail (last edited 2008-07-20 13:52:28 by MarkSeaborn)