Native Client: Dynamic linking plan
This page has been moved to http://code.google.com/p/nativeclient/wiki/AcquiringDynamicLibraries and http://code.google.com/p/nativeclient/wiki/DynamicLinkingPlan
Background
Why use dynamic libraries?
- Aids code sharing -- allows executables to be smaller
- Allows library versions to be changed without having to relink executables
  - This also makes it easier to comply with the GNU LGPL
- Some programs don't know in advance what they will be linking against
  - e.g. CPython's C extension modules. Imports are determined by Python code.
  - Plugins
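The CPython case can be illustrated with a minimal sketch. The module name is only known at run time, so no static link step could have anticipated it. The helper name `load_by_name` is hypothetical, and `math` is used only because many CPython builds ship it as a C extension module; on builds where it is compiled into the interpreter, the import mechanism is the same but no shared object is loaded.

```python
import importlib


def load_by_name(module_name):
    """Import a module whose name is only known at run time."""
    return importlib.import_module(module_name)


# In a real program the name could come from a config file or user input.
requested = "math"
module = load_by_name(requested)
print(module.sqrt(16.0))
```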
Native Client is being extended to support dynamic loading. We would like to support dynamic linking and dynamic libraries on top of this. While the basic dynamic loading support (validating and loading a chunk of code) must be implemented in NaCl's trusted runtime, the dynamic linker that sits on top of this, implementing symbol resolution, library lookups, dlopen(), etc., will be entirely untrusted code.
Plan
The basic plan is to use GNU libc (glibc for short) to provide dynamic linking support. glibc includes a dynamic linker (ld.so).
Changes involved
- Port glibc. I have already done a basic port of glibc.
  - Add scripts to build glibc as part of the NaCl tree (as an alternative to nacl-newlib), on Buildbot, to prevent regressions.
- Toolchain changes:
  - Basic -fPIC code generation. -- Committed: fixed bug in the assembler.
  - Make the linker generate correct PLT entries that pass the validator. -- Committed.
  - Linker TLS instruction sequence rewriting. -- Binutils patch written but not committed.
  - Linker scripts.
- Trusted runtime changes:
  - Extend sel_ldr to be able to load the dynamic linker, ld.so.
How libraries will be loaded
Each .so file can be fetched from a URL.
The NaCl browser interface already provides a Javascript interface to fetch a URL and return a NaCl file descriptor for the file. We can use this interface for fetching the executable and the .so files it requires. Using a file descriptor for the file is important if we want to provide an mmap operation for loading code or data. However, if code is always dynamically loaded by copying rather than mapping, there is no advantage in NaCl providing an mmap operation. In that case, the web app can fetch code by any means, such as XMLHttpRequest or any newer mechanism.
Same Origin Policy
XMLHttpRequest is constrained by a Same Origin Policy (SOP). NaCl's interface for fetching files will also be constrained by an SOP; note that the NaCl NPAPI plugin has to implement the SOP itself.
The main reason for the SOP is that XHR requests convey cookies -- a type of ambient authority. The Same Origin Policy is not intended to prevent web apps from sending messages across origins (this is possible via redirects and <img> elements); it is only intended to prevent the web app from seeing the server's response to the request.
Loading libraries in NaCl is analogous to loading Javascript files via the <script src=...> tag. However, interestingly, <script> is not constrained by the SOP. The server is effectively opting in to revealing the response across origins by setting the content-type to "text/javascript". Supposedly the response is not revealed directly to the web app: the DOM, which is trusted, evaluates the Javascript and so the web app only gets access to the values the script assigns to variables. In NaCl's case, however, interpreting .so files is done by untrusted code. We have to reveal the fetched data to the web app, so NaCl cannot be as unconstrained as the <script> tag.
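The origin comparison that such a policy relies on can be sketched as follows. This is the commonly used scheme/host/port comparison, not the NaCl plugin's actual code; the function names, the default-port table, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(page_url, resource_url):
    """True if a resource may be read by a page under a same-origin rule."""
    return origin(page_url) == origin(resource_url)


print(same_origin("http://example.com/app.html",
                  "http://example.com:80/lib/libfoo.so"))
print(same_origin("http://example.com/app.html",
                  "https://example.com/lib/libfoo.so"))
```

The first call compares equal because the explicit port 80 matches the assumed default for http; the second differs in scheme, so a fetch of that library would be refused under this rule.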
Sharing libraries across sites
It will be desirable to share library files across sites, so that the browser does not have to download identical files multiple times. This problem already exists for Javascript libraries. NaCl executables and libraries are expected to be larger than Javascript libraries so the problem becomes more important.
For Javascript libraries, the main (only?) mechanism for doing this is the <script> tag. This allows sharing in a centralised model in which multiple sites pick a central site to download the library file from. This works because <script> does not follow a Same Origin Policy. Sites using the central site's services are vulnerable to the central site, which can change the file contents it serves. The script text is not available across origins, so the web app cannot check the text against a hash before running it.
For NaCl, web apps could fetch libraries using CORS or Uniform Messaging (formerly known as GuestXHR), which are not NaCl-specific.
We might also wish to allow decentralised sharing of files. For example, sites A and B both host libfoo.so. If the browser has already downloaded libfoo.so from site A, it won't need to download it again from site B, and vice-versa. Schemes for doing this by embedding secure hashes into URLs have been proposed (see Douglas Crockford's post).
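A minimal sketch of the decentralised idea, assuming the expected hash is known to the caller: the cache is keyed by a secure hash of the file contents rather than by URL, so a copy fetched from site A satisfies a later request aimed at site B. The class name, the `download` callback, and the URLs are hypothetical, and how the hash is carried (in the URL or elsewhere) is left open here.

```python
import hashlib


class ContentAddressedCache:
    """Cache of downloaded files keyed by the SHA-256 of their contents."""

    def __init__(self):
        self._by_hash = {}
        self.downloads = 0

    def fetch(self, url, expected_sha256, download):
        cached = self._by_hash.get(expected_sha256)
        if cached is not None:
            return cached  # same bytes already fetched, possibly from another site
        data = download(url)
        self.downloads += 1
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("hash mismatch for " + url)
        self._by_hash[expected_sha256] = data
        return data


# Stand-in for a real network fetch; both sites serve identical bytes.
LIBFOO = b"pretend these are the bytes of libfoo.so"
digest = hashlib.sha256(LIBFOO).hexdigest()


def fake_download(url):
    return LIBFOO


cache = ContentAddressedCache()
cache.fetch("http://a.example/libfoo.so", digest, fake_download)
cache.fetch("http://b.example/libfoo.so", digest, fake_download)
print(cache.downloads)  # 1
```

Checking the hash before caching also means that a site cannot poison the shared cache with different contents under the same name.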
This problem is not unique to NaCl, so we should not adopt a solution which is NaCl-specific.
Prefetching
The naive approach is to fetch each library file as ld.so requests it. We could reduce latency by listing up-front all the libraries we expect to load. The Javascript code can then request the files on startup, to fetch them into the browser's cache. This would just mean that the requests are pipelined.
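One way to produce such an up-front list is to walk the dependency graph ahead of time. The sketch below assumes a hypothetical table mapping each file to the libraries it needs (in practice this could be derived from the DT_NEEDED entries of each ELF file at build time); the file names and the function name are illustrative only.

```python
from collections import deque


def prefetch_order(executable, needed):
    """Return every file reachable from `executable`, breadth-first, each once."""
    order = []
    seen = {executable}
    queue = deque([executable])
    while queue:
        name = queue.popleft()
        order.append(name)
        for dep in needed.get(name, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


# Hypothetical dependency table for a small web app.
NEEDED = {
    "app.nexe": ["libfoo.so", "libc.so"],
    "libfoo.so": ["libm.so", "libc.so"],
    "libm.so": ["libc.so"],
}

print(prefetch_order("app.nexe", NEEDED))
# ['app.nexe', 'libfoo.so', 'libc.so', 'libm.so']
```

The resulting list can be requested all at once on startup, rather than waiting for each library to be discovered only after the previous one has loaded.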
Versioning
As with static linking, each web app gets to choose its own version of libc and other libraries. Furthermore, different NaCl processes in the same web app can use different libc versions. Libc is not supplied by the browser.
We don't expect there to be a huge number of libc versions, but older and newer versions of the same libc are likely to be around at the same time, as are different libc implementations (such as newlib and glibc).
Web apps get to pick a set of libraries that are known to work well together. This is analogous to selecting a set of Javascript libraries, or selecting a set of packages for a software distribution such as Debian or Fedora. This way we can avoid "DLL hell"; libraries are not the responsibility of the end user.
This provides extra flexibility that is not available to typical applications on Linux when packaged with commonly-used packaging systems like dpkg or RPM. Packaging systems such as Zero-Install and Nix allow multiple library versions to coexist in the same way that I am proposing for NaCl.
Despite this extra flexibility, we still have all the versioning mechanisms normally available for ELF shared libraries: libraries can opt to provide stable ABIs and declare interfaces via sonames and ELF symbol versioning, and we get the benefit of separate compilation.
Upgrading libraries is the responsibility of the web app.
Licensing issues
- glibc is licensed under the GNU LGPL.
- We have to provide the source. That should not be a problem.
- Others who supply glibc's .so files may also have to provide the source. LGPLv3 may have relaxed this requirement.
- The intention of glibc's requirement is that programs linked to the LGPL'd library can be relinked to different versions of the library. This is satisfied by using dynamic linking.
- It may be useful to provide a framework for linking to the source code of an open source library whenever we link to a compiled copy of the library.
Disadvantages of dynamic linking
- Position-independent code can be a little slower than fixed-position code on some architectures. On i386, a register (%ebx) often has to be given over to locating the current library (specifically, its GOT) because the instruction set lacks a PC-relative addressing mode.
- There is a cost to doing linking and symbol lookup at runtime. This tends to be worse for C++, which typically has more symbols with longer names. This cost can be avoided by prelinking, but we don't plan on supporting prelinking. Prelinking is being used less these days because it defeats address space randomisation.
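The runtime lookup cost can be made concrete with a deliberately simplified model: each symbol reference is resolved by probing the loaded objects in order until one defines it. A real dynamic linker uses per-object hash tables and can bind lazily, so this is only a sketch of the search order, with hypothetical file and symbol names.

```python
def resolve(symbol, load_order, exports):
    """Return (library, probes) for the first library that defines `symbol`."""
    probes = 0
    for library in load_order:
        probes += 1
        if symbol in exports[library]:
            return library, probes
    raise LookupError("undefined symbol: " + symbol)


# Hypothetical export tables for an executable and two libraries.
EXPORTS = {
    "app.nexe": {"main"},
    "libfoo.so": {"foo_init", "foo_run"},
    "libc.so": {"malloc", "printf"},
}
ORDER = ["app.nexe", "libfoo.so", "libc.so"]

print(resolve("printf", ORDER, EXPORTS))  # ('libc.so', 3)
```

Every probe involves hashing or comparing the symbol name, which is why programs with many long symbol names pay more at startup.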
Terminology
These terms all mean much the same thing: dynamic library, shared library, shared object, dynamic shared object (DSO), and .so file.
