What is an object / linker / toolchain / ...? (Glossary of compilation terms)

26 April 2025

This is an overview of some essential terms related to using compilers and programming in C or C++. (The examples focus on GCC-like tools on Linux but I tried to make the definitions as general as possible.)

compiler: Usually, a program which transforms source code into machine code for execution.

(The term is somewhat vague; it sometimes describes other language transformation programs, such as source-to-source “transpilers” or code generators. Also see the definition of toolchain.)

object file: An intermediate product of compilation. An object file contains fragments of compiled code and data, made from the global functions and variables defined in the input source files.

Object files exist to allow compilation to be split up into independent parts, saving time via parallelisation and partial re-compilation. Conventionally, each .c source file in a project is compiled, in parallel, to its own object file, typically with extension .o.

# compile to objects
gcc -c thing.c -o thing.o
gcc -c stuff.c -o stuff.o

GCC-like compilers (a group which includes Clang) do not emit object files in their default mode; to get one, you pass the command-line option -c.

Object files are not usable on their own. They need to be combined or “linked” together to create a usable product.

Each fragment in an object file has a string name called a symbol. There is also a list of unresolved symbols, for functions and variables in other files that the code made reference to but did not define.

With GCC-like compilers, you can see the symbols in an object with the tool nm:

$ nm src/brogue/Monsters.o
...
                 U ___chkstk_ms
                 U advancementMessageColor
00000000000034a9 T alertMonster
                 U allocGrid
00000000000055a1 t allyFlees
                 U allySafetyMap
                 U applyInstantTileEffectsToCreature
                 U armorTable
                 U attack
0000000000000421 t attackWouldBeFutile
00000000000026bb T avoidedFlagsForMonster
0000000000003626 t awarenessDistance
000000000000372e t awareOfTarget
...
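To connect this listing back to source code, here is a minimal sketch of a hypothetical file, symbols_demo.c (all names are invented for illustration), annotated with the nm letter each definition would typically get when compiled with gcc -c and no optimisation. Uppercase letters mark symbols visible to other object files, lowercase letters mark symbols local to the object, and U marks an unresolved symbol.

```c
/* symbols_demo.c: hypothetical example, not taken from the project above.
 * Build and inspect with:
 *   gcc -c symbols_demo.c -o symbols_demo.o
 *   nm symbols_demo.o
 */
#include <stdlib.h>   /* declares atoi, which is defined in the C library */

/* Initialised global variable: typically listed as 'D' (data section). */
int scaleFactor = 3;

/* static limits the function to this file: typically 't' (local text). */
static int scale(int value)
{
    return value * scaleFactor;
}

/* Ordinary global function: typically 'T' (global text). The call to atoi
 * leaves an unresolved 'U atoi' entry for the linker to fill in. */
int parseAndScale(const char *text)
{
    return scale(atoi(text));
}
```

With optimisation enabled, the compiler may inline scale and drop its symbol entirely, so the exact listing depends on the build options.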

Note that not every programming language has a single globally-visible namespace like C. For example, in C++ you can define your own namespaces and classes. Functions and variables inside them need their namespace information encoded in their symbols; this process is called name mangling.
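C itself does not mangle names, which is why C++ code that wants to call C functions has to switch mangling off for those declarations. A common idiom, sketched here with a hypothetical function gizmo_add, wraps the declarations in extern "C" whenever the header is read by a C++ compiler.

```c
/* ---- gizmo_api.h: hypothetical header usable from both C and C++ ---- */
#ifndef GIZMO_API_H
#define GIZMO_API_H

#ifdef __cplusplus
extern "C" {          /* only seen by C++ compilers: use plain C symbol names */
#endif

int gizmo_add(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* GIZMO_API_H */

/* ---- gizmo_api.c: the C definition ---- */
/* On Linux its symbol is simply "gizmo_add". */
int gizmo_add(int a, int b)
{
    return a + b;
}
```

Without the extern "C" block, GCC or Clang compiling a C++ caller on Linux would look for a mangled symbol such as _Z9gizmo_addii, which the C object file does not define.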

linker: The program which links together multiple object files into one final product (either an executable or a shared library). The linker matches up each unresolved symbol with a definition in another object, finishing the compilation.

With gcc, linking happens when you pass no mode option:

# Link two object files to create an executable 'gizmo'
# (gcc's default mode is to compile and link)
gcc thing.o stuff.o -o gizmo

The underlying tool is called ld but is only rarely used manually.
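As a rough illustration of what the linker resolves, the sketch below folds two hypothetical source files into one listing. The file names echo the commands above, but the contents are invented, and a real program would also need a main function in one of the files. Compiled separately, thing.o would carry unresolved symbols for stuffCount and doubleStuff, and stuff.o would supply the definitions.

```c
/* ---- stuff.c (hypothetical contents) ---- */
int stuffCount = 3;                 /* definition of a global variable */

int doubleStuff(int n)              /* definition of a global function */
{
    return 2 * n;
}

/* ---- thing.c (hypothetical contents) ---- */
extern int stuffCount;              /* declared here, defined in stuff.c */
int doubleStuff(int n);             /* likewise declared only */

int thingTotal(void)
{
    /* In thing.o both names stay unresolved until link time. */
    return doubleStuff(stuffCount);
}
```

If stuff.o were left off the link command, the linker would stop with an undefined reference error for each of the two names.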

header file: A C source file which contains cross-file information and declarations.

Object files contain symbols, but making use of something defined in another file requires more information than just its name; for example, you must know the type of a variable or function in order to use it. Such details (i.e. declarations) are conventionally kept in a header file with extension .h, which is then included in the .c files that need them.

// extern declares a variable without defining it; the definition can live in another file
extern int myVariable;
// extern is implicit for function declarations
int myFunction(int);

Header files also typically contain definitions like macros, types, etc.
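A minimal sketch of what such a header might look like, using hypothetical names (inventory.h, Item, MAX_ITEMS), followed by a source file that implements it. The include guard at the top is the usual way to keep the header from being processed twice in one compilation.

```c
/* ---- inventory.h (hypothetical) ---- */
#ifndef INVENTORY_H
#define INVENTORY_H

#define MAX_ITEMS 16                /* macro */

typedef struct {                    /* type definition */
    int id;
    int quantity;
} Item;

extern int itemCount;               /* variable declaration */
int addItem(int id, int quantity);  /* function declaration */

#endif /* INVENTORY_H */

/* ---- inventory.c (hypothetical) ---- */
/* In a real project this file would begin with: #include "inventory.h" */
int itemCount = 0;
static Item items[MAX_ITEMS];

/* Returns the new item count, or -1 if the inventory is full. */
int addItem(int id, int quantity)
{
    if (itemCount >= MAX_ITEMS)
        return -1;
    items[itemCount].id = id;
    items[itemCount].quantity = quantity;
    return ++itemCount;
}
```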

Think of a header file as a specification of an API which can be implemented across any number of source files. You can even write different implementations of the same header, for different platforms or purposes.

Header files are not special in any way. They’re just normal C code and they can technically contain anything.

Many projects have a convention where each .c file should have its own corresponding .h file, but this isn’t a requirement. There’s nothing stopping you from, for example, putting all declarations in the project in one central header file and including that everywhere. (But this does mean that every time you change the header, your build system will recompile all source files, which for large projects is way too time-consuming.)

toolchain: A vague term for a suite of tools which work together to produce software. In C and C++ development, it mainly refers to all the tools used to produce executables or libraries from source code, such as a compiler, a linker, an assembler, and utilities for parsing object files and executables.

The concept is highly influenced by GCC, which, following the UNIX Philosophy, has split the job of compilation up into many separate tools:

[Figure: diagram of GCC's components]

Usually you don’t run the separate tools themselves—GCC comes with a driver program called gcc which manages the whole compilation process. You just use gcc with the appropriate command-line options. We’ve already seen -c, which tells gcc to not link. There are other options too; see Options Controlling the Kind of Output (GCC Manual).
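The driver can also be asked to stop after each intermediate step, which is a handy way to see what the separate tools do. The sketch below is a hypothetical file, area.c, with the relevant gcc options listed in its leading comment; -E, -S and -c stop after preprocessing, compilation proper, and assembly respectively.

```c
/* area.c: hypothetical example.
 *   gcc -E area.c -o area.i   # preprocess only: SQUARE(x) is expanded
 *   gcc -S area.c -o area.s   # also compile: stops at assembly text
 *   gcc -c area.c -o area.o   # also assemble: stops at an object file
 *   gcc area.o -o area        # link (would additionally need a main function)
 */
#define SQUARE(x) ((x) * (x))

int squareArea(int side)
{
    /* After preprocessing, this line reads: return ((side) * (side)); */
    return SQUARE(side);
}
```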

You should also avoid mixing tools from different toolchains because they may not interoperate well. (I once had an exe register as a virus because I used a tool from the native Windows SDK on an executable compiled by mingw-w64.) However, some tools are ok to mix because they operate on standard data/file formats; for example, on Linux there is a standard format ELF for executables and object files, so you can use any linker which works with ELF, of which there are several alternatives like mold and lld.

Often an entire compilation toolchain is just referred to as “the compiler” because the distinction between the various parts of a toolchain is usually not important; they are mostly used via the driver program. In a more zoomed-in context, “compiler” could refer to just cc1, the C-to-assembly component.

See also avr-libc: Toolchain Overview, which is documentation from a port of GCC which explains the GNU toolchain and its architecture in more detail.

static library: A collection of object files for use in another project, along with header files.

With GCC-like compilers, the object files are often bundled into a special single-file archive with extension .a, which is created with the tool ar.

When a library is said to be “statically-linked” into a program, it means that the library was compiled as a static library and then linked with the project’s own object files.
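A sketch of how this might look in practice, assuming a hypothetical one-function library called libshapes. The commands in the comment use ar with the common rcs options to build the archive, then link against it with -L and -l; main.o stands in for the user's own object file.

```c
/* shapes.c: hypothetical library source.
 *   gcc -c shapes.c -o shapes.o
 *   ar rcs libshapes.a shapes.o          # bundle objects into an archive
 *   gcc main.o -L. -lshapes -o app       # link the archive into a program
 */

/* Would normally be declared in a header (e.g. shapes.h) shipped with the .a */
int rectangleArea(int width, int height)
{
    return width * height;
}
```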

dynamically-linked library / shared library: Very similar to an object file or static library, but designed for linking at run-time instead of at compile-time. Shared libraries are produced by the linker, and conventionally have the file extension .so on Linux, .dll on Windows, and .dylib on macOS.

Executables contain the names of any shared libraries they depend on. (With GCC-like linkers, you add such dependencies with the -l flag.) When an executable runs, a system component called the dynamic linker or dynamic loader searches for installed shared libraries with those names and provides the included code and data to the executing program.
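As a rough sketch, a shared library can be built from the same kind of source file as a static one; the difference is in the commands. The names here (libcounter.so, nextCount) are hypothetical, and the commands assume GCC on Linux.

```c
/* counter.c: hypothetical library source.
 *   gcc -fPIC -shared counter.c -o libcounter.so   # build the shared library
 *   gcc main.c -L. -lcounter -o app                # record the dependency in app
 *   ldd ./app                                      # list the libraries app depends on
 */
static int count = 0;

/* Returns 1 on the first call, 2 on the second, and so on. */
int nextCount(void)
{
    return ++count;
}
```

Running app straight from the build directory may fail to find libcounter.so, because the dynamic loader only searches its configured locations; setting LD_LIBRARY_PATH=. is one common workaround while testing.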

Most operating systems are designed around dynamic linking. Static linking is still possible, and well-suited for small and simple libraries; but it’s also common for app developers to just bundle dynamic libraries with their apps.

You might assume, since a shared library is linked at run-time, that the library itself doesn’t need to be available during compilation, just the headers. This is not true. The linker still needs to be told about the shared libraries the project depends on, so that it can check that every unresolved symbol is defined somewhere and record the names of those libraries in the executable. Because of this, a library may occasionally provide a “stub” version, which provides the same interface but no implementation, i.e., all the functions do nothing. Stubs are designed to be used in place of the real library during the link stage. This is mainly useful when a library is proprietary or only available for a different platform, so the full implementation is not available when compiling.
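A stub can be as small as the sketch below, which assumes a hypothetical proprietary library exposing two functions, vendorInit and vendorSend. The signatures match what the real header would declare, but the bodies do nothing.

```c
/* vendor_stub.c: hypothetical stub for a library whose real implementation
 * is unavailable at build time.
 *   gcc -fPIC -shared vendor_stub.c -o libvendor.so
 */
#include <stddef.h>

/* Real version would initialise the device; the stub just reports success. */
int vendorInit(void)
{
    return 0;
}

/* Real version would transmit the buffer; the stub ignores its arguments. */
int vendorSend(const void *data, size_t length)
{
    (void)data;
    (void)length;
    return 0;
}
```

A program linked against this stub still expects the real libvendor.so to be installed on the machine where it eventually runs.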