Intercepting system calls: working at the kernel syscall ABI level
Proposal: Change Plash to work at the level of the kernel system call ABI, not the glibc ABI, so that PlashGlibc is no longer necessary.
Most importantly, this means we don't need to track the glibc implementation any more. We deal with an interface instead, and we can describe that interface (structs, argument lists, etc.) however we like; we can refactor the description language.
- It's easy to unit test that we have got the ABI description right: just make libc calls and check that the syscalls we see match up
- We would need to know about different processor architectures: mainly which registers carry the system call number and arguments
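As a rough illustration of what a refactorable ABI description might look like, the sketch below keeps the per-architecture register convention separate from the per-syscall argument lists. The names `SyscallConvention`, `CONVENTIONS`, `SYSCALL_NUMBERS`, `SYSCALL_ARGS` and `decode_syscall` are hypothetical and not part of Plash; only three syscalls are listed, and the register dictionaries stand in for whatever an interception mechanism would actually capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallConvention:
    """Registers the kernel reads on syscall entry for one architecture."""
    number_reg: str
    arg_regs: tuple


# Linux register conventions for two architectures.
CONVENTIONS = {
    "x86_64": SyscallConvention("rax", ("rdi", "rsi", "rdx", "r10", "r8", "r9")),
    "i386": SyscallConvention("eax", ("ebx", "ecx", "edx", "esi", "edi", "ebp")),
}

# Syscall numbers differ per architecture, so they are keyed by it.
SYSCALL_NUMBERS = {
    "x86_64": {0: "read", 1: "write", 2: "open"},
    "i386": {3: "read", 4: "write", 5: "open"},
}

# Architecture-independent part of the description (hypothetical layout).
SYSCALL_ARGS = {
    "read": ("fd", "buf", "count"),
    "write": ("fd", "buf", "count"),
    "open": ("pathname", "flags", "mode"),
}


def decode_syscall(arch, regs):
    """Turn a captured register set into (syscall name, named arguments)."""
    conv = CONVENTIONS[arch]
    name = SYSCALL_NUMBERS[arch][regs[conv.number_reg]]
    args = {arg: regs[reg] for arg, reg in zip(SYSCALL_ARGS[name], conv.arg_regs)}
    return name, args


if __name__ == "__main__":
    # Register values as an interceptor might see them for open(path, 0, 0644).
    print(decode_syscall("i386", {"eax": 5, "ebx": 0x1000, "ecx": 0, "edx": 0o644}))
```

Keeping the argument lists independent of the register mapping means that adding an architecture only touches the first two tables.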
Alternative methods of intercepting system calls:
- Use a loader/library that runs before ld.so, and loads ld.so (a simple ELF loader)
See rtldi for an example (see UserModeExec)
The library stays loaded and handles intercepted syscalls from within the process, as in Ostia
- Hook into vsyscalls
- Change AT_SYSINFO in the auxv before running
- This didn't work with pre-2.4 glibcs, but this is less of an issue now. Other libcs and statically linked executables might not use vsyscalls.
- Use ptrace() to bounce system calls back to the process by setting PC (bad for performance, but OK for testing purposes)
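To make the AT_SYSINFO idea more concrete, here is a minimal sketch of the data transformation only: parsing an auxiliary vector, replacing the AT_SYSINFO value, and packing it again. It assumes the auxv is available as raw bytes of native-endian (key, value) word pairs, as in `/proc/self/auxv`; a real loader would patch the vector on the new process's stack instead. The function names and the handler address are hypothetical. As far as I know, AT_SYSINFO only appears on i386, so the example uses 4-byte words.

```python
import struct

# Auxiliary vector keys from the Linux ELF headers.
AT_NULL = 0
AT_SYSINFO = 32
AT_SYSINFO_EHDR = 33


def _pair_format(word_size):
    # Native byte order, fixed sizes: two 4-byte or two 8-byte words.
    return "=II" if word_size == 4 else "=QQ"


def parse_auxv(data, word_size=4):
    """Return the (key, value) pairs up to and including AT_NULL."""
    entries = []
    for key, value in struct.iter_unpack(_pair_format(word_size), data):
        entries.append((key, value))
        if key == AT_NULL:
            break
    return entries


def redirect_sysinfo(entries, handler_addr):
    """Point the vsyscall entry at our own handler; leave the rest alone."""
    return [(k, handler_addr if k == AT_SYSINFO else v) for k, v in entries]


def pack_auxv(entries, word_size=4):
    """Serialise the entries back into the raw auxv layout."""
    fmt = _pair_format(word_size)
    return b"".join(struct.pack(fmt, k, v) for k, v in entries)


if __name__ == "__main__":
    # Synthetic i386-style vector; the addresses are made up for illustration.
    raw = pack_auxv([(AT_SYSINFO, 0xFFFFE400), (AT_SYSINFO_EHDR, 0xFFFFE000), (AT_NULL, 0)])
    patched = redirect_sysinfo(parse_auxv(raw), 0x08049000)
    print([(k, hex(v)) for k, v in patched])
```

If the vector has no AT_SYSINFO entry, `redirect_sysinfo` returns it unchanged, which matches the caveat that some executables would not be intercepted this way.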
- Coyotos might also be emulating the Linux kernel ABI?
- Other systems, e.g. the BSDs, handle the Linux kernel ABI?
We need to handle the kernel ABI anyway if we do ptracing and implement PtraceJail.
This is only true to a limited extent. PtraceJail would only need to know about system call numbers. It would not need to know about arguments to system calls.
- Kernel ABI is smaller than glibc ABI
- strace already documents kernel ABI
- The kernel of course documents the kernel ABI, and is probably easier to understand than glibc code
- Other systems must understand kernel ABI, e.g. systemtap
Services we'd no longer get:
- Making syscalls.
- But there are plenty of other libraries, e.g. dietlibc. dietlibc may not deal well with being dynamically linked, but that does not matter if our code is not dynamically linked.
- We would have a description of the kernel ABI, so no problem to use that!
- Some potential for nesting syscall wrapping.
- Dynamic linking: not really an issue. Don't need the loader (which contains syscall emulator/RPC code) to be split into dynamically linked parts.
- However, may need to relocate the loader on startup. See glibc's ld.so code for how to do this.
- Standard library functions: memory allocation, string handling
- Not really a problem, can just use something like dietlibc
- Threading: library can't necessarily use pthreads for locks.
- Could make one connection per thread. We would need to be able to determine which thread we're being called from, in order to determine which file descriptor to use. That can't be done transparently without a system call. Normally this is done using thread-local storage, which assumes something about the register usage of the process's ABI. (To be fair, having the system call handler use the stack also assumes something about the ABI used in the process.)
- We could find a way to link the library with libpthread even though it was loaded before ld.so. This wouldn't be transparent.
Presumably Ostia handles this somehow?
Potentially awkward cases:
- readdir(): glibc implements a higher level interface than the kernel's getdents(). We would prefer to be able to reimplement the higher level interface.
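To show the gap between the two interfaces, the sketch below layers a readdir()-style stream on top of buffers in the `getdents64` record format (`d_ino`, `d_off`, `d_reclen`, `d_type`, then a NUL-terminated name). It assumes native byte order and takes the buffer source as a callable, so it can be fed by a real syscall or by an emulated one. `iter_dirents`, `pack_dirent` and `DirStream` are hypothetical names used only for this illustration.

```python
import struct

# Fixed header of struct linux_dirent64: d_ino, d_off, d_reclen, d_type.
DIRENT64_HEADER = struct.Struct("=QqHB")


def iter_dirents(buf):
    """Yield (inode, type, name) for each record in a getdents64 buffer."""
    offset = 0
    while offset < len(buf):
        d_ino, _d_off, d_reclen, d_type = DIRENT64_HEADER.unpack_from(buf, offset)
        if d_reclen <= DIRENT64_HEADER.size:
            raise ValueError("corrupt record length")
        name_start = offset + DIRENT64_HEADER.size
        name_end = buf.index(b"\0", name_start, offset + d_reclen)
        yield d_ino, d_type, buf[name_start:name_end]
        offset += d_reclen


def pack_dirent(d_ino, d_off, d_type, name):
    """Build one record, padded to 8 bytes (useful for emulation and tests)."""
    unpadded = DIRENT64_HEADER.size + len(name) + 1
    d_reclen = (unpadded + 7) & ~7
    header = DIRENT64_HEADER.pack(d_ino, d_off, d_reclen, d_type)
    return header + name + b"\0" * (d_reclen - DIRENT64_HEADER.size - len(name))


class DirStream:
    """readdir()-like interface: one entry per call, refilling as needed."""

    def __init__(self, fill):
        # fill() returns the next getdents64 buffer, or b"" at end of directory.
        self._fill = fill
        self._pending = iter(())

    def readdir(self):
        while True:
            entry = next(self._pending, None)
            if entry is not None:
                return entry
            buf = self._fill()
            if not buf:
                return None
            self._pending = iter_dirents(buf)


if __name__ == "__main__":
    chunks = iter([pack_dirent(1, 1, 4, b".") + pack_dirent(2, 2, 8, b"file.txt")])
    stream = DirStream(lambda: next(chunks, b""))
    while (entry := stream.readdir()) is not None:
        print(entry)
```

Working at the kernel ABI level means emulating the buffer format on the left-hand side of this layering, whereas the preferred option in the text is to replace the stream interface itself.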
Things that could be easier:
- Passing private args across execve(). No longer need to read special variables from environment.
Tasks:
- Create loader for ld.so that redirects auxv's vsyscall entry to point to its own handler
- Make the wrapper able to make its own real syscalls
- Make it work on multiple architectures
To investigate:
virtualsquare.org and View-OS: these appear to work by intercepting syscalls, so may have code we can reuse
