Plash: tools for practical least privilege

Issues

Security vulnerabilities

connect() race condition

Problem: connect() on Unix domain sockets follows symlinks, and there is no way to switch this off.

filesys-obj-real.c calls lstat() to determine whether a directory entry is a symlink. If not, real_file_socket_connect() can be used; this will call connect(). Between lstat() and connect(), another process might replace the socket with a symlink.
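The race can be seen in a minimal sketch of the check-then-use pattern. The function name checked_socket_connect() is hypothetical, and this is not the code in filesys-obj-real.c. It only marks where the window between lstat() and connect() lies.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical illustration of check-then-use; not Plash's actual code. */
int checked_socket_connect(const char *path)
{
    struct stat st;
    struct sockaddr_un addr;
    int fd;

    /* Check: refuse to connect if the directory entry is a symlink. */
    if (lstat(path, &st) < 0)
        return -1;
    if (S_ISLNK(st.st_mode)) {
        errno = ELOOP;
        return -1;
    }

    /* RACE WINDOW: another process may replace path with a symlink here. */

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* Use: connect() resolves path again and follows any symlink now there. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The check only rejects a symlink that is already present when lstat() runs; nothing ties the object that was checked to the object that connect() reaches.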

Applicability: An adversary, A, in one sandbox cannot exploit this on its own; it requires a conspirator, B. A and B can conspire so that A gets access to an arbitrary socket S that is in A's server's namespace but not in A's namespace. A and B must have write access to some common directory. B does not need access to socket S; it only needs to know S's pathname in A's server's namespace. The exploit can only occur if B is not in the same sandbox as A.

Exploit: see tests/socket-symlink-race.

Note: bind() does not follow symlinks. It behaves like open() with O_CREAT|O_EXCL. (There may be some other Unix variants where this is not true, perhaps including old versions of Linux.)
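On Linux, the behaviour described in this note can be checked with a small sketch. The helper try_bind_unix() and the /tmp paths are hypothetical. Binding to a pathname that is a dangling symlink fails with EADDRINUSE, and nothing is created at the symlink's target.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to bind a Unix domain socket at path.
   Returns 0 on success, or the errno value on failure. */
int try_bind_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    /* Like open() with O_CREAT|O_EXCL: an existing entry, even a symlink,
       makes this fail instead of being followed. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;
    close(fd);
    return err;
}
```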

Possible solutions:

It turns out that link() doesn't work between directories that are on the same device but have been mounted using "mount --bind".

I thought it might be possible to open() the domain socket and then call connect() on /proc/self/fd/N, which would effectively operate on the inode that the descriptor refers to rather than on the pathname. However, open() does not work on domain sockets.
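The failure of open() on a domain socket is easy to reproduce. In this sketch the helper names and the /tmp path are hypothetical; on Linux, open() on a socket's pathname fails with ENXIO.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening Unix domain socket at path, removing any old entry
   first (a convenience for this demo). Returns the fd, or -1 on error. */
int make_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the errno from trying to open() path, or 0 if open() succeeded. */
int open_socket_errno(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}
```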

We could have a setuid tool for doing connect() that does the following:

Problems:

Another possibility is to have a lock that is shared between Plash servers, to ensure that no server creates a symlink while another is in the process of connecting to a domain socket.
* The lock would have to be per-user rather than system-wide; otherwise one user could deny service to another by holding the lock indefinitely. This means the symlink race could still be exploited, but only by conspiring programs running under different users' Plash environments that have write or connect access to a common directory.
* This doesn't protect against symlinks created by programs not running under Plash.
* The lock would need to be held around rename() calls, because a symlink can be put in place using rename().
* A flock() lock could be used, stored under /tmp/plash-<uid>.
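A minimal sketch of the per-user flock() lock suggested above, assuming the lock is a single file named /tmp/plash-<uid> (the text only says it would be stored under that name). The function names are hypothetical, and O_NOFOLLOW is an extra precaution added here because /tmp is shared.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical: take the per-user lock. Returns its fd, or -1 on error. */
int plash_lock_acquire(void)
{
    char path[64];
    int fd;

    snprintf(path, sizeof(path), "/tmp/plash-%u", (unsigned)getuid());
    fd = open(path, O_RDWR | O_CREAT | O_NOFOLLOW, 0600);
    if (fd < 0)
        return -1;
    /* Blocks until no other open file description holds the lock. */
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock; closing the descriptor would also release it. */
void plash_lock_release(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

Under this scheme a server would hold the lock around each connect() on a domain socket and around each symlink() or rename() it performs. Because the file is created with mode 0600, another user's servers cannot open it to hold the lock, which matches the per-user requirement.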

Hard linking won't work on read-only filesystems, but that's okay, because you can't create domain sockets on those in the first place.

See <http://cert.uni-stuttgart.de/archive/bugtraq/1999/10/msg00011.html>: SSH authentication agent follows symlinks via a UNIX domain socket

chmod() race condition

There is a similar symlink race condition in using chmod(), which follows symlinks.

utimes() is similar to chmod(): glibc exports an lutimes() function, but it isn't implemented (it always returns ENOSYS). We use open() and futimes(). futimes() uses /proc/self/fd/N.
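The open() and futimes() approach can be sketched as follows. The O_NOFOLLOW flag and the name set_times_nofollow() are assumptions for illustration, not necessarily what Plash does. Note that open() needs read permission on the file and, as noted above, fails on domain sockets.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/* Set access and modification times on path without following a symlink
   in the final component. tv[0] is the access time, tv[1] the mtime. */
int set_times_nofollow(const char *path, const struct timeval tv[2])
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;              /* ELOOP if path is a symlink */
    /* Operates on the opened file, not on whatever path names later. */
    int rc = futimes(fd, tv);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```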

Running pola-shell as root

When X11 access is enabled, /tmp/.X11-unix is mapped as a writable slot. It should be a writable object in a read-only slot.

Bugs

stat64() doesn't work properly

The server processes are included in the same job as the client processes: the server has the same process group ID, and the shell waits for it. This is convenient (for printing the exit status), but wrong. If the user presses Ctrl-C and the client handles SIGINT and survives, the server will still be killed, leaving the client mostly useless.

libc's object-based execve() ignores the close-on-exec flag
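An emulated execve() that passes descriptors to the new process could honour the flag with a per-descriptor check like the one below. This is a sketch under the assumption that the emulation inspects each candidate descriptor; fd_survives_exec() is a hypothetical name, not part of Plash.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd is open and should be inherited across exec,
   0 if it is closed or has FD_CLOEXEC set. */
int fd_survives_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return 0;               /* not an open descriptor */
    return (flags & FD_CLOEXEC) == 0;
}
```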

Shell: build-fs.c: If you have the command "cmd foo", and `foo' is a symlink, the symlink will be followed and the shell will also grant access to the destination of the link. If you have the command "cmd => foo", the symlink is not followed. This is inconsistent. Actually, I have realised that following the symlink is not good from a security point of view.

Aspects that need more testing

libc thread safety

Might be problems in future

Re-entrancy: run_server_step() is called while waiting for a reply on a return continuation object. It will handle incoming requests -- these should be queued instead. I don't think this actually causes any bugs, since there are no TOCTTOU problems in the code. (There aren't really any invariants that are broken during a method call.)

No resource accountability (not really a bug)

Make sure that messages are encoded and decoded properly on 64-bit and other-endian machines. Currently I assume sizeof(int) == 4.
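One way to remove the sizeof(int) == 4 assumption is to encode each integer field as exactly four bytes in a fixed byte order. The sketch below picks little-endian purely as an assumption; it does not describe Plash's actual wire format.

```c
#include <stdint.h>

/* Write v as 4 bytes, least significant first, regardless of the
   host's int size or endianness. */
void put_u32le(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xff);
    buf[1] = (unsigned char)((v >> 8) & 0xff);
    buf[2] = (unsigned char)((v >> 16) & 0xff);
    buf[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read 4 little-endian bytes back into a 32-bit value. */
uint32_t get_u32le(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```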

Sending on a socket is never queued. This could lead to DoS of servers. It could potentially lead to deadlocks, if both ends of a connection send at the same time (this doesn't happen at the moment because all connections are client-server and call-return).

There may be cases where libc calls should preserve errno but don't.
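A common pattern for wrappers that must preserve errno is to save it before any internal call that may fail and restore it afterwards. A minimal sketch, with a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical internal cleanup step that must not clobber the
   caller's errno, even though close() may fail (e.g. with EBADF). */
void cleanup_preserving_errno(int fd)
{
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;        /* restore the value the caller saw */
}
```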

Behaviour that might need changing:

build-fs.c attaches copies of symlinks into processes' file namespaces, so the process won't see them change when they change in the real filesystem. This may not be expected. Actually, symlinks are immutable and the inode would change if you replaced one.

Problems running specific programs

GNU Emacs (resolved)

When run under Plash, GNU Emacs 21 prints the following and exits:

emacs: Memory exhausted--use M-x save-some-buffers RET

The fault lies with GNU Emacs; it has been fixed in CVS (not yet released as GNU Emacs 22).

The problem also occurs if you do:

/lib/ld-linux.so.2 /usr/bin/emacs

which is what Plash does internally.

The problem is that the use of address space changes when you invoke ld-linux.so.2 directly: the brk() syscall changes where it allocates memory from. brk() starts allocating from after the BSS (zero-initialised) segment of the executable that was invoked by exec(). For normal executables this is after 0x08000000. But ld-linux.so.2 gets loaded at 0x80000000, so brk() follows from somewhere after that, regardless of what executable ld-linux.so.2 subsequently loads.

Emacs allocates memory using malloc(), which uses brk(), and so it gets an address with one of the top 4 bits set, which it can't handle.
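The arithmetic can be made concrete with a small check. On a 32-bit machine, an address has one of its top 4 bits set exactly when it is at or above 0x10000000, so heap addresses just after 0x08000000 are fine, while anything after 0x80000000 is not. The helper name is hypothetical.

```c
#include <stdint.h>

/* Returns nonzero if any of the top 4 bits of a 32-bit address are set,
   i.e. if addr >= 0x10000000. */
int uses_top4_bits(uint32_t addr)
{
    return (addr & 0xF0000000u) != 0;
}
```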

I would guess that Emacs 22 still uses the top 4 bits in the same way, but allocates memory with mmap() rather than malloc().

This issue is also mentioned in: http://www.cs.berkeley.edu/~nks/fig/fig.ps

Konqueror (resolved)

Versions: Qt 3.3.3, KDE 3.3.2, Konqueror 3.3.2

Konqueror has a problem starting up that seems to be related to fam (the File Alteration Monitor). If it connects to fam's TCP port but then fails to connect to the Unix domain socket that fam creates in response, Konqueror fails (actually, kded fails).

Solution: disable the fam daemon.

XEmacs

Running a subprocess from XEmacs would give:

sendmsg: Bad file descriptor
recvmsg: Bad file descriptor
[2622] cap-protocol: [fd 5] to-server: connection error, errno 9 (Bad file descriptor)
Can't exec program /bin/sh
Process shell exited abnormally with code 1

XEmacs was closing the file descriptor that Plash uses to communicate with its server.

This is fixed: Plash's libc will refuse to close that file descriptor.
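The fix can be sketched as a close() wrapper that checks for the reserved descriptor. The variable reserved_server_fd, the name guarded_close(), and the choice of EBADF as the reported error are assumptions for illustration, not Plash's actual code.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical: descriptor reserved for the connection to the Plash
   server, or -1 if there is none. */
static int reserved_server_fd = -1;

/* close() replacement that refuses to close the reserved descriptor. */
int guarded_close(int fd)
{
    if (fd >= 0 && fd == reserved_server_fd) {
        errno = EBADF;          /* assumption: behave as if it were not open */
        return -1;
    }
    return close(fd);
}
```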