Introduction to glibc's dynamic linker
The dynamic linker's name
On most architectures, including i386, the dynamic linker has the filename /lib/ld-linux.so.2.
On x86-64, the dynamic linker lives under /lib/ld-linux-x86-64.so.2. Having a different filename from i386 is part of multilib support: it allows 32-bit and 64-bit libraries to coexist in the same filesystem. The pathname of the dynamic linker is embedded into executables (more on this below) and it is one of the few hard-coded pathnames in the core of Unix that cannot be overridden by setting an environment variable.
The dynamic linker is often referred to as "ld.so", because that was its name on earlier systems.
How the linker + executable get loaded
There are two ways of invoking the dynamic linker (ld.so). If you exec an executable, the kernel will load both the executable and ld.so. It is also possible to exec ld.so directly, in which case ld.so itself will load the executable.
Kernel loads the executable
$ /bin/echo hello
hello
When you run a dynamically linked executable on Linux, such as /bin/echo, the following happens:
The kernel opens /bin/echo. It recognises the file as an ELF executable from its header, and proceeds to look at a part of the ELF file called the ELF Program Header. You can display this header with objdump -p /bin/echo or readelf -l /bin/echo. (These overlapping tools dump similar information but in different formats.)
$ readelf -l /bin/echo
...
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x059b0 0x059b0 R E 0x1000
  LOAD           0x0059b0 0x0804e9b0 0x0804e9b0 0x001a4 0x00334 RW  0x1000
  DYNAMIC        0x0059c4 0x0804e9c4 0x0804e9c4 0x000d0 0x000d0 RW  0x4
  NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
...
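The Program Header (also known as "PHDR") is an array of entries. To decide what to map, the kernel looks at only two types of entry, LOAD and INTERP (in elf.h, these are named PT_LOAD and PT_INTERP). It also consults a few others, such as GNU_STACK for the stack's permissions, but these do not add any mappings.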
The LOAD entries specify segments of the file to map into memory. In normal ELF objects, there are two segments: the text segment (containing code and read-only data) and the data segment (for writable data). The kernel's ELF loader will map these into the process's address space. Some ELF objects specify fixed load addresses (0x08048000 for executables on i386 Linux), while others allow the ELF loader to choose a load address.
The INTERP entry is specific to executables. It specifies the filename of another ELF object, which for normal executables is the dynamic linker. The kernel opens this file and maps its LOAD entries into memory in the same way as for the executable. Then it hands control to the process by jumping to the dynamic linker's entry point address. (You can find the entry point with objdump -f or readelf -h.) Statically-linked executables don't have an INTERP entry; for these, the kernel will jump to the executable's entry point address.
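The walk over the program header array described above can be sketched in a few lines of Python. This is only an illustration of the file layout, not the kernel's code: the names Phdr, read_program_headers, load_segments and interpreter are made up for this example, and the field offsets are the ones given in elf.h for Elf32_Ehdr/Elf64_Ehdr and Elf32_Phdr/Elf64_Phdr.

```python
import struct
from typing import NamedTuple, Optional

PT_LOAD, PT_INTERP = 1, 3  # p_type values from elf.h


class Phdr(NamedTuple):  # hypothetical record type for this sketch
    p_type: int
    offset: int
    vaddr: int
    filesz: int
    memsz: int
    flags: int


def read_program_headers(image: bytes) -> list:
    """Decode every program header entry of an ELF image held in memory."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = image[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", image, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", image, 54)
    else:
        (phoff,) = struct.unpack_from(end + "I", image, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", image, 42)

    headers = []
    for i in range(phnum):
        base = phoff + i * phentsize
        if is64:  # Elf64_Phdr stores p_flags right after p_type
            t, fl, off, va, _pa, fsz, msz, _al = struct.unpack_from(
                end + "IIQQQQQQ", image, base)
        else:     # Elf32_Phdr stores p_flags just before p_align
            t, off, va, _pa, fsz, msz, fl, _al = struct.unpack_from(
                end + "IIIIIIII", image, base)
        headers.append(Phdr(t, off, va, fsz, msz, fl))
    return headers


def load_segments(image: bytes) -> list:
    """Return only the PT_LOAD entries, i.e. the segments to be mapped."""
    return [ph for ph in read_program_headers(image) if ph.p_type == PT_LOAD]


def interpreter(image: bytes) -> Optional[str]:
    """Return the PT_INTERP pathname, or None for a statically linked image."""
    for ph in read_program_headers(image):
        if ph.p_type == PT_INTERP:
            raw = image[ph.offset:ph.offset + ph.filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Calling interpreter(open("/bin/echo", "rb").read()) on a dynamically linked executable should give the same pathname that readelf -l reports in its "Requesting program interpreter" line.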
The kernel's ELF loading behaviour is fairly simple. It is all specified in the ELF ABI standard.
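To make the LOAD mapping concrete, here is a small worked calculation of the page-aligned ranges a loader would set up for one LOAD entry. It is a sketch under assumptions: the function name mapping_for_load and the dictionary keys are invented for this example, the page size is taken to be 0x1000 (the Align value in the readelf output above), and a real loader does more work than this (permissions, zeroing the tail of the last file-backed page, and so on).

```python
PAGE = 0x1000  # assumed page size; equals the Align column of the LOAD entries


def mapping_for_load(offset, vaddr, filesz, memsz, page=PAGE):
    """Compute the page-aligned ranges for a single LOAD entry."""
    if offset % page != vaddr % page:
        # ELF requires file offset and address to agree modulo the alignment,
        # otherwise the file could not be mapped directly.
        raise ValueError("p_offset and p_vaddr disagree modulo the page size")
    mask = ~(page - 1)
    mem_end = vaddr + memsz  # end of the segment, including the zero-filled part
    return {
        "map_start": vaddr & mask,               # address rounded down to a page
        "file_start": offset & mask,             # file offset rounded the same way
        "file_end": vaddr + filesz,              # end of the bytes that come from the file
        "mem_end": mem_end,
        "map_end": (mem_end + page - 1) & mask,  # rounded up to a page
        "zero_fill": memsz - filesz,             # bytes not backed by the file (e.g. .bss)
    }
```

For the second LOAD entry of /bin/echo shown earlier (offset 0x0059b0, address 0x0804e9b0, FileSiz 0x001a4, MemSiz 0x00334), this gives a mapping from 0x0804e000 to 0x0804f000 that starts at file offset 0x5000, with file-backed data ending at 0x0804eb54 and 0x190 bytes of zero-filled memory after it.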
Once ld.so gets control, it recursively loads all the libraries that the executable depends on, and then passes control to the executable.
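The recursive loading of dependencies can be pictured as a graph walk in which each library is loaded at most once. The sketch below assumes a breadth-first order; the text above only says "recursively", so treat the ordering as an assumption rather than a statement about glibc's code. The function load_order and the library names in the usage note are hypothetical.

```python
from collections import deque


def load_order(root, needed):
    """Walk DT_NEEDED-style dependencies breadth-first, visiting each object once.

    `needed` maps an object name to the list of names it depends on.
    """
    order = []
    seen = {root}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for dep in needed.get(obj, []):
            if dep not in seen:  # an object that is already loaded is not loaded again
                seen.add(dep)
                queue.append(dep)
    return order
```

With needed = {"app": ["libfoo.so", "libc.so.6"], "libfoo.so": ["libbar.so", "libc.so.6"], "libbar.so": ["libc.so.6"]}, load_order("app", needed) visits app, libfoo.so, libc.so.6 and libbar.so in that order, and libc.so.6 appears only once even though three objects depend on it.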
Running ld.so directly
$ /lib/ld-linux.so.2 /bin/echo hello
hello
It is possible to invoke the dynamic linker directly. If you run it without any arguments, it will print a help message which is pretty self-explanatory:
$ /lib/ld-linux.so.2
Usage: ld.so [OPTION]... EXECUTABLE-FILE [ARGS-FOR-PROGRAM...]
You have invoked `ld.so', the helper program for shared library executables.
This program usually lives in the file `/lib/ld.so', and special directives
in executable files using ELF shared libraries tell the system's program
loader to load the helper program from this file. This helper program loads
the shared libraries needed by the program executable, prepares the program
to run, and runs it. You may invoke this helper program directly from the
command line to load and run an ELF executable file; this is like executing
that file itself, but always uses this helper program from the file you
specified, instead of the helper program file specified in the executable
file you run. This is mostly of use for maintainers to test new versions
of this helper program; chances are you did not intend to run this program.
  --list                list all dependencies and how they are resolved
  --verify              verify that given object really is a dynamically linked
                        object we can handle
  --library-path PATH   use given PATH instead of content of the environment
                        variable LD_LIBRARY_PATH
  --inhibit-rpath LIST  ignore RUNPATH and RPATH information in object names
                        in LIST
The ability to run ld.so directly is useful when testing new versions of libc.
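As an illustration of the libc-testing use case, the helper below builds the command line for running a program under a specific ld.so, using the --library-path option from the help message above. The helper name ldso_argv and the /opt/glibc-test paths in the usage note are hypothetical; only the option itself comes from ld.so's own usage text.

```python
def ldso_argv(ldso, program, args=(), library_path=None):
    """Build an argv list that runs `program` under the given dynamic linker."""
    argv = [ldso]
    if library_path:
        # --library-path takes a colon-separated list, like LD_LIBRARY_PATH
        argv += ["--library-path", ":".join(library_path)]
    argv.append(program)
    argv.extend(args)
    return argv
```

The resulting list can be passed to subprocess.run(), for example subprocess.run(ldso_argv("/opt/glibc-test/lib/ld-linux.so.2", "/bin/echo", ["hello"], ["/opt/glibc-test/lib"])), so that the program picks up both the new ld.so and the matching libc.so from the same test build.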
It is also useful in environments where the normal filesystem namespace is not available. Under Plash's ChrootSetuidJail, it is not possible to use the execve syscall to run /bin/echo directly, because /bin/echo is not inside the chroot jail directory. The only executable inside the chroot jail directory is ld.so, so we exec that. ld.so then acquires file descriptors for the executable and its libraries via a socket, using PlashGlibc's implementation of open().
To implement dynamic linking in NativeClient, I propose to run sel_ldr on ld.so in much the same way as ld.so is invoked directly under Plash. This means we do not have to extend sel_ldr to understand the PT_INTERP field and to know how to load two ELF objects. We only need to teach sel_ldr how to load relocatable executables (those of type ET_DYN, like ld.so, rather than ET_EXEC), and make it ignore program header entries that it does not need to look at, such as PT_DYNAMIC.
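The ET_EXEC versus ET_DYN distinction is recorded in the e_type field of the ELF header. The following sketch shows how a loader might read it and decide whether the object can be shifted to another base address. It is not sel_ldr's code: elf_type, choose_load_bias and the preferred_base parameter are hypothetical names used only for illustration.

```python
import struct

ET_EXEC, ET_DYN = 2, 3  # e_type values from elf.h


def elf_type(image: bytes) -> str:
    """Classify an ELF image by its e_type field."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    end = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big
    # e_type sits directly after the 16-byte e_ident, in both ELF32 and ELF64
    (e_type,) = struct.unpack_from(end + "H", image, 16)
    return {ET_EXEC: "ET_EXEC", ET_DYN: "ET_DYN"}.get(e_type, "other")


def choose_load_bias(image: bytes, preferred_base: int) -> int:
    """Return the offset to add to every p_vaddr when mapping the object.

    An ET_EXEC object names fixed addresses, so its bias is 0. An ET_DYN
    object (such as ld.so) may be placed wherever the loader chooses.
    """
    return preferred_base if elf_type(image) == "ET_DYN" else 0
```

A loader written this way only has to add the chosen bias to the addresses in the PT_LOAD entries; entries it has no use for, such as PT_DYNAMIC, can simply be skipped.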
Relationship between ld.so and libc.so
Although ld.so and libc.so are separate files, they are both part of glibc. They are built from a single source package. The two are interdependent to the extent that it is not possible to upgrade one without upgrading the other. The same applies to other libraries that are part of glibc, such as libpthread.so and libdl.so.
While it might be nice to have a dynamic linker that is separate from libc, there are some obstacles to doing so:
- Memory allocation: ld.so needs to allocate memory and uses malloc() and free(), which come from libc.so. (During startup, ld.so provides its own malloc/free implementations which cannot free memory. These get replaced after libc.so is loaded because they are defined as weak symbols.)
- Thread Local Storage: Libraries can contain thread-local variables (those declared with __thread) and ld.so has the responsibility of allocating storage for them. ld.so has to set up the %gs register on i386 on startup, and it implements ELF's __tls_get_addr function. An example of a TLS variable is errno, which is provided by libc.so. (Before generalised TLS was added to glibc, errno was accessed via a function, __errno_location.) libc.so has special knowledge of how thread-local storage is organised so that its TLS accesses can be reduced to the quickest instruction sequences. Many of the syscall wrappers that libc.so provides access TLS in order to check for thread cancellation points.
Other libcs
- newlib (mostly BSD licensed) does not have dynamic linking support.
- dietlibc (GPLv2) has dynamic linking support; it appears to be fairly experimental.
- uClibc (LGPL) has dynamic linking support.
Further reading
- How To Write Shared Libraries, Ulrich Drepper. This contains a lot of introductory material on how dynamic libraries work in ELF.
- ELF ABI (System V ABI Edition 4.1), and the supplement for i386 (I need to find stable links to the PDFs for these)
