Persistence
Why persistence is important
Use cases
Kinds of objects to persist
A trusted server can provide a limited set of object types:
- Immutable files and directories: In the store, record the SHA1 hash (as in git).
- Mutable files and directories (FsObjReal objects): In the store, record the filename. Maybe record the inode number for verification purposes.
- Other object types: FsOp, FsObjCopyOnWrite, etc.
- Array of capabilities
It is easy for this one server process to save its object graph to a store.
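As a rough illustration of what the store might record for the two kinds of file object listed above, the following Python sketch builds one record per object. The record layout and the function names (`immutable_record`, `mutable_record`, `verify_mutable`) are assumptions made for this example, not an existing interface. The hash is computed the way git hashes a blob: SHA-1 over a `blob <size>\0` header followed by the contents.

```python
import hashlib
import os


def immutable_record(data: bytes) -> dict:
    """Record for an immutable file: identified purely by content hash."""
    # git hashes a blob as SHA-1 over b"blob <size>\0" + contents.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return {"type": "immutable_file", "sha1": digest}


def mutable_record(path: str) -> dict:
    """Record for a mutable file: the filename, plus the inode as a check."""
    st = os.stat(path)
    return {"type": "mutable_file", "filename": path, "inode": st.st_ino}


def verify_mutable(record: dict) -> bool:
    """True if the filename still refers to the inode that was recorded."""
    try:
        return os.stat(record["filename"]).st_ino == record["inode"]
    except FileNotFoundError:
        return False
```

The inode check only detects that a filename now points at a different file; it does not say where the original file went.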
If the trusted server provides enough of these abstractions, we can create a description of an untrusted process (its initial environment: root directory, executable to launch, etc.) and record it in the store. We can launch a process given a reference into the store to its description.
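A minimal sketch of this idea, assuming the store is a simple mapping from a reference (here a string) to a JSON-encoded description. The names `store`, `save_description`, `load_description` and `launch` are hypothetical, and the sketch does not set up a real root directory for the process.

```python
import json
import subprocess
import sys

# Hypothetical in-memory store: reference -> JSON-encoded process description.
store = {}


def save_description(ref, root_dir, executable, args):
    """Record the initial environment of a process under `ref`."""
    store[ref] = json.dumps(
        {"root_dir": root_dir, "executable": executable, "args": list(args)}
    )


def load_description(ref):
    """Look up a description by its reference into the store."""
    return json.loads(store[ref])


def launch(ref):
    """Start the described process and wait for it to finish.

    A real launcher would give the process `root_dir` as its root
    directory; this sketch only uses it as the working directory.
    """
    desc = load_description(ref)
    return subprocess.run(
        [desc["executable"], *desc["args"]],
        cwd=desc["root_dir"],
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `save_description("app-1", ".", sys.executable, ["-c", "print('hello')"])` followed by `launch("app-1")` runs a Python interpreter from the saved description.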
But how do we save and restore objects implemented in other, untrusted processes?
Option 1:
Each process is responsible for saving its internal object graph to a file (using kernel system calls). The trusted server remembers the list of processes and saves the inter-process object graph.
Problem: consistency. When do the subgraphs get saved? What if they are not synchronised and can't be connected up?
Option 2:
Processes go via the trusted server to save capabilities and related data. The trusted server writes out the whole object graph in a consistent state.
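Option 2 could look roughly like the following sketch. It assumes that each process hands its objects to the server as JSON-compatible data plus a list of object IDs it holds references to. The class `TrustedServer` and its methods are hypothetical. The server refuses to write a graph with dangling references and replaces the old snapshot file in a single rename, so a reader never sees a half-written graph.

```python
import json
import os
import tempfile


class TrustedServer:
    """Hypothetical trusted server that owns the whole saved graph."""

    def __init__(self, path):
        self.path = path
        self.objects = {}  # object ID -> {"owner", "data", "refs"}

    def save_object(self, owner, obj_id, data, refs):
        """Called on behalf of a process to hand over an object's data
        and the IDs of the objects it refers to."""
        self.objects[obj_id] = {"owner": owner, "data": data, "refs": list(refs)}

    def write_snapshot(self):
        """Write every object to one file, or nothing at all."""
        dangling = [
            (obj_id, ref)
            for obj_id, obj in self.objects.items()
            for ref in obj["refs"]
            if ref not in self.objects
        ]
        if dangling:
            raise ValueError("dangling references: %r" % dangling)
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.objects, f)
            f.flush()
            os.fsync(f.fileno())
        # Rename over the old snapshot so the switch happens in one step.
        os.replace(tmp, self.path)
```

Because only the server writes the file, the subgraphs of different processes cannot get out of step with each other, which is the consistency problem noted under Option 1.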
Issues
Storage reclamation, garbage collection
References into the store:
- e.g. The GNOME panel runs outside the system. A .desktop file contains an index into the store, pointing to a descriptor that specifies how to launch an application.
References from the store to outside world:
- e.g. To files on the host system. We may want to move files around, so it's useful if these references can be reconnected.
- e.g. To network ports.
Implementation
Strategies for saving store
Two ways to save an object graph:
- Stop-and-snapshot: Start from root references and traverse object graph. Store would be a flat file. This is fine for outputting a complete snapshot, but no good for outputting changes incrementally. This is like a stop-and-copy garbage collector. Unreachable objects will not get written to store, which is good.
- Write-through: When a persistent object is changed, write the changes to the underlying store immediately. Underlying store can be implemented by another layer, e.g. Berkeley DB or Samba tdb.
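The stop-and-snapshot traversal can be sketched as follows. This is an illustration only: the `Node` class and the `get_refs` callback are assumptions made for the example, and the flat list of records stands in for the flat file.

```python
class Node:
    """Hypothetical object with a name and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def snapshot(roots, get_refs):
    """Traverse the graph from `roots` and return a flat list of records.

    `get_refs(obj)` returns the objects that `obj` refers to. Each
    record is (store ID, object, [store IDs of referenced objects]).
    """
    ids = {}     # id(obj) -> store ID
    order = []   # objects in the order they were first reached
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in ids:
            continue  # already visited; this also handles cycles
        ids[id(obj)] = len(order)
        order.append(obj)
        stack.extend(get_refs(obj))
    return [(ids[id(o)], o, [ids[id(r)] for r in get_refs(o)]) for o in order]
```

Objects that are never reached from the roots never get an ID, so they are left out of the snapshot without any separate collection step.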
Not all persistable objects need to be persisted. When an object is initially created, it will be reachable from non-persistent roots but not from persistent roots. With a write-through strategy, linking the object in and making it reachable from persistent roots should cause it to be written into the store and allocated an object ID.
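A minimal sketch of this write-through behaviour, assuming objects carry a value and a list of references. `Cell` and `WriteThroughStore` are hypothetical names, and a plain dict stands in for the underlying store.

```python
import itertools


class Cell:
    """Hypothetical persistable object: a value plus outgoing references."""

    def __init__(self, value):
        self.value = value
        self.refs = []


class WriteThroughStore:
    """Writes an object to `backend` once it is linked in from a
    persistent root, and again whenever it is changed afterwards."""

    def __init__(self, backend):
        self.backend = backend   # any mutable mapping: object ID -> record
        self.ids = {}            # id(obj) -> object ID
        self.live = {}           # object ID -> obj, keeps id() values unique
        self._next_id = itertools.count(1)

    def _write(self, obj):
        self.backend[self.ids[id(obj)]] = {
            "value": obj.value,
            "refs": [self.link(r) for r in obj.refs],
        }

    def link(self, obj):
        """Make `obj` reachable from a persistent root; returns its ID."""
        if id(obj) in self.ids:
            return self.ids[id(obj)]
        obj_id = next(self._next_id)
        # Allocate the ID before following references, so cycles terminate.
        self.ids[id(obj)] = obj_id
        self.live[obj_id] = obj
        self._write(obj)
        return obj_id

    def update(self, obj, value):
        """Change `obj`; write through only if it is already persistent."""
        obj.value = value
        if id(obj) in self.ids:
            self._write(obj)
```

Linking an object also links everything it refers to, since those objects become reachable from persistent roots at the same moment.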
When the object is dropped completely it should be removed from the store. But it may become unreachable from persistent roots while still being live (reachable from non-persistent roots). Should it be removed from the store then? Probably not, because this situation may be temporary. Note that this is similar to inodes on a filesystem that have been unlinked from the directory tree (except that there is no way to relink such inodes). If the system crashes these inodes will get freed by fsck on startup.
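One possible reading of the fsck analogy is to leave such objects in the store while the system is running and to reclaim them at startup, when no live process can still refer to them. The sketch below assumes the store is a mapping from object ID to a record with a `refs` list. The function name `sweep_unreachable` is hypothetical.

```python
def sweep_unreachable(store, persistent_roots):
    """Delete records that are not reachable from `persistent_roots`.

    `store` maps object ID -> {"refs": [object IDs], ...}.
    Returns the set of object IDs that were removed.
    """
    reachable = set()
    stack = list(persistent_roots)
    while stack:
        obj_id = stack.pop()
        if obj_id in reachable or obj_id not in store:
            continue
        reachable.add(obj_id)
        stack.extend(store[obj_id]["refs"])
    dead = set(store) - reachable
    for obj_id in dead:
        del store[obj_id]
    return dead
```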
Identifying persistable objects
Several ways to do this:
- Wrapper: Have a persistent wrapper around a non-persistent object. The wrapper knows how to recreate the non-persistent object.
- Weak-map: The store has a weak mapping that records how to recreate objects. It maps objects to constructors and arguments.
- Persistable-interface: Objects know how to save themselves. Objects may identify themselves as such by deriving from a Python class and implementing a method.
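The three approaches can be compared side by side in a short sketch. All class and function names here (`Greeter`, `PersistentWrapper`, `recipes`, `make_tracked`, `Persistable`, `is_persistable`) are made up for the illustration. The weak map uses Python's `weakref.WeakKeyDictionary`, so an entry goes away when its object is garbage collected.

```python
import weakref


class Greeter:
    """A plain, non-persistent object used by the first two approaches."""

    def __init__(self, name):
        self.name = name


# Wrapper: the wrapper holds the recipe and can recreate the object.
class PersistentWrapper:
    def __init__(self, constructor, *args):
        self.constructor, self.args = constructor, args
        self.obj = constructor(*args)

    def save(self):
        return {"constructor": self.constructor.__name__, "args": list(self.args)}


# Weak-map: the store keeps the recipe on the side, keyed weakly by object.
recipes = weakref.WeakKeyDictionary()


def make_tracked(constructor, *args):
    obj = constructor(*args)
    recipes[obj] = (constructor, args)
    return obj


# Persistable-interface: objects derive from a base class and save themselves.
class Persistable:
    def save(self):
        raise NotImplementedError


class PersistableGreeter(Persistable):
    def __init__(self, name):
        self.name = name

    def save(self):
        return {"constructor": "PersistableGreeter", "args": [self.name]}


def is_persistable(obj):
    """True if any of the three mechanisms knows how to save `obj`."""
    return isinstance(obj, (Persistable, PersistentWrapper)) or obj in recipes
```

The wrapper and the weak map leave the underlying class untouched, whereas the interface approach requires the object's own class to cooperate.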
