Choose and unpack Debian packages
Status: implemented
Implemented in: packaging/
This process consists of the following steps, each performed by a separate command-line program:
- Update-avail: Get Packages.gz lists from Debian repository, based on sources.list file, and combine into a single Packages file.
- Choose: Given initial package (e.g. firefox), build list of all dependencies.
- Fetch: Download .deb files from package list and check MD5 hashes. Store in a cache directory using hashes as filenames.
- Unpack: Unpack the selected packages.
- Optional steps: Run preinst/postinst scripts. These often require dpkg to be working, so another step is to populate the /var/lib/dpkg directory with details of the chosen packages, as debootstrap does.
Arguments involved in process:
- sources.list file
- cache dir for Packages.gz files
- Packages file: list of all available packages
- Packages file: list of chosen packages
- cache directory for storing .deb packages (by package name, by hash)
- cache directory for storing unpacked .debs (control and main data)
- destination directory for unpacking
Duplicating dpkg/APT
We are reimplementing functionality that is provided by dpkg and APT.
Their functionality can be split into smaller, better-defined steps. Intermediate formats will be defined. This allows for more tweaking at each stage. Steps can be replaced independently.
dpkg and APT overlap: both check dependencies.
In some cases what we do can be simpler than what dpkg/APT do. We never upgrade or uninstall packages.
"Update-avail" step
Usage:
plash-pkg-update-avail
Reads the "sources.list" file (from the current directory). Downloads Packages indexes for each repository listed there (these are cached in the "package-lists" directory), and combines these into a single Packages index, which it outputs to "package-lists/Packages.combined".
Equivalent to "apt-get update", although "update-avail" is more descriptive than just "update".
APT will use DiffIndex to download diffs (which are in ed format). We can omit this for a first cut.
APT fetches the Release file, which contains hashes of the Packages files and is signed with the repository's private key. This tool does not do that yet.
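As a rough illustration of the first half of this step, the sketch below turns "sources.list" text into the Packages.gz URLs to download. It assumes the classic one-line "deb <uri> <distribution> <component>..." format and the standard "dists/<distribution>/<component>/binary-<arch>/Packages.gz" repository layout. The function name packages_urls and the default architecture are made up for this example and are not taken from the plash-pkg code.

```python
def packages_urls(sources_list_text, arch="i386"):
    """Return a list of (base_url, packages_gz_url) pairs, one per component.

    Hypothetical helper: only plain "deb" lines are handled. "deb-src"
    lines, lines with [options], and malformed lines are skipped.
    """
    result = []
    for line in sources_list_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] != "deb" or len(fields) < 4:
            continue
        if fields[1].startswith("["):
            continue  # option syntax is not handled in this sketch
        base = fields[1].rstrip("/")
        dist = fields[2]
        for component in fields[3:]:
            url = "%s/dists/%s/%s/binary-%s/Packages.gz" % (
                base, dist, component, arch)
            result.append((base, url))
    return result
```

The base URL is returned alongside each index URL because it is the natural value to record for each package when the indexes are combined.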
"Choose" step
Usage:
plash-pkg-choose <output-dir> <dependency>...
Examples:
plash-pkg-choose firefox/ firefox
plash-pkg-choose gnumeric "gnumeric, plash-gtk-powerbox (> 1.0.0)"
Chooses a set of packages that satisfy the specified dependencies. Each dependency can be given in the same format as the "Depends" field of a Debian control file. The list of available packages is read from "package-lists/Packages.combined". The tool writes the list of chosen packages to "<output-dir>/package-list". This is formatted as a cut-down Packages file.
Any packages in the repository marked as "Essential: yes" are added to the output list, even if there are no explicit dependencies on them. This is a conservative approach: though it will include a number of packages that will very likely be needed (bash, libc6), it also includes system tools that will likely not be needed (e2fsprogs).
Other files are written to <output-dir> for debugging purposes. The "choose-log" file shows a tree of the chosen packages. "degrees" lists the chosen packages by degree of separation from the initial dependencies.
An initial cut does not need to check versioned dependencies. It does not need to handle "Provides" or "Conflicts" or "Replaces".
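The selection described above can be sketched as a breadth-first walk over the dependency graph. This is a minimal sketch of an "initial cut" only: version constraints are parsed and discarded, "Provides", "Conflicts" and "Replaces" are ignored, and for alternatives ("a | b") the first available package is taken. The names parse_depends and choose, the in-memory dict format, and the decision to also follow "Pre-Depends" are assumptions made for this example, not the behaviour of the real tool.

```python
import re
from collections import deque


def parse_depends(field):
    """Split a Depends-style string into groups of alternative package names.

    "a (>= 1), b | c" becomes [["a"], ["b", "c"]]; versions are dropped.
    """
    groups = []
    for clause in field.split(","):
        names = []
        for alt in clause.split("|"):
            alt = alt.strip()
            if alt:
                # The name ends at whitespace, "(", ":" or "[".
                names.append(re.split(r"[\s(:\[]", alt, maxsplit=1)[0])
        if names:
            groups.append(names)
    return groups


def choose(available, initial):
    """available: dict mapping package name to a dict of control fields.
    initial: Depends-style string given on the command line.

    Returns (degrees, missing): degrees maps each chosen package to its
    distance from the initial dependencies; missing lists unsatisfied groups.
    """
    degrees = {}
    missing = []
    queue = deque()

    def want(group, degree):
        if any(name in degrees for name in group):
            return  # already satisfied by a chosen package
        for name in group:
            if name in available:
                degrees[name] = degree
                queue.append(name)
                return
        missing.append(group)

    for group in parse_depends(initial):
        want(group, 0)
    # Essential packages are always included, as described above.
    for name, fields in available.items():
        if fields.get("Essential") == "yes":
            want([name], 0)

    while queue:
        name = queue.popleft()
        for field in ("Pre-Depends", "Depends"):
            for group in parse_depends(available[name].get(field, "")):
                want(group, degrees[name] + 1)
    return degrees, missing
```

The degrees mapping corresponds loosely to the information in the "degrees" debugging file; the actual file format is not specified here.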
"Fetch" step
Usage:
plash-pkg-fetch <package-list-file>
Example:
plash-pkg-fetch firefox/package-list
Takes a list of chosen packages (typically the result of the "choose" step), and downloads the .deb files of those packages. It stores the downloaded files in the "cache" directory. It will try looking in "/var/cache/apt/archives" for the required files before resorting to downloading them.
The tool will print the total size of the files that need to be fetched. If this is not zero, it will prompt before starting the download.
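The verification part of this step might look like the sketch below: compute the MD5 of a candidate .deb (whether downloaded or found in "/var/cache/apt/archives"), compare it with the hash from the package list, and store the file in the cache under its hash. The names md5_of_file and add_to_cache and the "<hash>.deb" naming scheme are assumptions for illustration. Because the expected hash comes from an untrusted Packages file and ends up in a filename, the sketch checks that it is exactly 32 hex digits first.

```python
import hashlib
import os
import re
import shutil


def md5_of_file(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(src_path, expected_md5, cache_dir):
    """Copy src_path into cache_dir as "<md5>.deb" if its hash matches.

    Returns the cached path, or None when the hash does not match.
    Raises ValueError if expected_md5 is not a well-formed MD5 hex string.
    """
    expected = expected_md5.lower()
    if not re.fullmatch(r"[0-9a-f]{32}", expected):
        raise ValueError("malformed MD5 hash: %r" % expected_md5)
    if md5_of_file(src_path) != expected:
        return None
    os.makedirs(cache_dir, exist_ok=True)
    dest = os.path.join(cache_dir, expected + ".deb")
    if not os.path.exists(dest):
        # Copy to a temporary name first so a partial copy is never
        # visible under the final, hash-derived name.
        tmp = dest + ".tmp"
        shutil.copyfile(src_path, tmp)
        os.replace(tmp, dest)
    return dest
```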
"Unpack" step
Usage:
plash-pkg-unpack <package-list-file> <destination-dir>
Example:
plash-pkg-unpack firefox/package-list firefox/unpacked
Unpacks the packages in the list into the destination directory. The tool shares packages' files between destination directories: it first unpacks each package into the "unpack-cache" directory (unless the package has already been unpacked there), then hard links the package's files into the destination directory. Hence file inodes are shared between destinations, although directory inodes are not.
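The hard-linking half of this step can be sketched as follows, assuming a package has already been unpacked into a directory tree under "unpack-cache". Directories are recreated in the destination, regular files are hard linked, and symlinks are recreated rather than linked. The function name link_tree is hypothetical. The sketch does not handle two packages shipping the same path (os.link raises FileExistsError in that case), and it requires the cache and the destination to be on the same filesystem.

```python
import os


def link_tree(src_root, dest_root):
    """Mirror src_root into dest_root, hard linking files.

    Directories get fresh inodes in each destination; regular files share
    the inode of the copy in the unpack cache.
    """
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dest_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        # os.walk lists symlinks to directories in dirnames but does not
        # descend into them, so they are recreated here as symlinks.
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            dest = os.path.join(dest_dir, name)
            if os.path.islink(src):
                os.symlink(os.readlink(src), dest)
            elif os.path.isfile(src):
                os.link(src, dest)
```

Because the files are shared, anything that modifies a file in place in one destination also changes it in every other destination and in the cache.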
Package list files
We extend the Packages file format with a "Base-URL" field, to which the existing "Filename" field can be appended to give a URL where the .deb can be downloaded. This is just a location hint -- the content is verified using the MD5 hash.
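To make the extended format concrete, here is a small sketch that parses Packages-style text into stanzas and builds the download URL from "Base-URL" and "Filename". The helper names parse_stanzas and deb_url are invented for this example, and the sample values used with them below are illustrative. The parser is deliberately simple: stanzas are separated by blank lines, and continuation lines (those starting with whitespace) are appended to the previous field.

```python
def parse_stanzas(text):
    """Parse Packages-style text into a list of dicts, one per stanza."""
    stanzas = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t":
            # Continuation of the previous field (e.g. a long Description).
            if last_key is not None:
                current[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def deb_url(stanza):
    """Join Base-URL and Filename into the URL of the .deb file."""
    return stanza["Base-URL"].rstrip("/") + "/" + stanza["Filename"].lstrip("/")
```

For a stanza with "Base-URL: http://ftp.debian.org/debian" and "Filename: pool/main/f/foo/foo_1.0_i386.deb", deb_url joins the two with a single slash. The result is only a hint about where to look; the downloaded content still has to match the MD5 hash.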
Tasks
- Rewrite plash-pkg Perl code in Python
- Have plash-pkg scripts and modules installed into Debian package
- In the "cache" and "unpack-cache" directories, include hashes in filenames
- Add *.list files to /var/lib/dpkg/info to allow Python postinsts to work
- Move config/cache dirs from current directory to ~/.cache/plash-pkg
Not being implemented
- Downloading the Release file from the Debian repository and checking signatures on Packages files. Checking signatures requires configuring public keys to check against. With APT, the keys are stored in a .gpg file next to sources.list.
The code puts untrusted strings into filenames. In future it should access the filesystem via the FsObj interface so that filenames cannot escape a directory using "..".
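Until that is done, a stopgap check along the following lines could reject untrusted names that would land outside a given directory. This is not the FsObj interface and does not offer the same guarantee: it is a plain path check using the standard library, it resolves symlinks at check time only, and so it is open to races if the directory can change underneath it. The function name safe_join is hypothetical.

```python
import os


def safe_join(base_dir, untrusted):
    """Join untrusted onto base_dir, refusing results outside base_dir.

    Raises ValueError for names using "..", absolute paths, or symlinks
    that would resolve to a location outside base_dir.
    """
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, untrusted))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes %r: %r" % (base_dir, untrusted))
    return candidate
```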
