Learning C was quite difficult for me. The basics of the language itself weren't so bad, but "programming in C" requires a lot of other kinds of knowledge which aren't as easy to pick up on:
- C has no environment which smooths out platform or OS differences; you need to know about your platform too
- there are many C compiler options and build tools, making even running a simple program involve lots of decisions
- there are important concepts related to CPUs, OSes, compiled code in general
- it's used in such varied ways that there's far less of a centralised "community" or style than in other languages
This page is a living collection of summaries, signposts, and advice for these broader points that made my journey with C and other compiled languages easier. I hope it's useful to you! (And if it is, make sure to subscribe for any updates.)
- General resources
- Good projects to learn from
- Compilation, linking, headers, and symbols
- Undefined behaviour (UB)
- Do not use these functions
- Arrays aren't values
- Essential compiler flags
- Three types of memory, and when to use them
- Naming conventions
- static
- The struct method pattern
- const
- Platforms and standard APIs
- Integers
- Macros vs const variables
- Macros vs inline functions
General resources
- TutorialsPoint C: very basic intro
- awesome-c: big list of libraries and tools
- cppreference: technical reference for the C language and standard library
Good projects to learn from
Sometimes it's helpful to just read some small, self-contained C code to get to grips with how it looks.
- Bloopsaphone, a Ruby library for synthesising sounds which has a small C module at its core. Has a small number of concepts and a good structure.
- esshader, a GLSL shader viewer like ShaderToy.com. A small program which just glues a few libraries together.
- Brogue CE, a roguelike video game, >30k LOC. I maintain this, and many of our contributors have sharpened their C by working on it.
- Simple Dynamic Strings (sds). Has one .c and .h file each, and is a good example of how you might do more complex resource management.
- stb single-file libraries. These are small to medium-sized modules designed to be highly portable, including targeting embedded devices and games consoles.
Compilation, linking, headers, and symbols
Some basics on how C compilation works, because it will help other things make sense.
C code is written in .c source files. Each source file is compiled to a .o object file, which is like a container for the compiled function code in the .c file. They are not executable. Object files have inside them a table of symbols, which are the names of the global functions and variables defined in that file.
# compile to objects
cc -c thing.c -o thing.o
cc -c stuff.c -o stuff.o
Source files are completely independent of each other, and can be compiled to objects in parallel.
To use functions and variables across files, we use header files (.h). These are just ordinary C source files used in a specific way. Recall above that object files only contain the names of global functions and variables: no types, macros, or even function parameters. To use symbols across files, we need to specify all this extra information needed to make use of them. We put these "declarations"[1] in their own .h file, so other .c files can #include them.
To avoid duplication, a .c file will typically not define its own types/macros etc. and will just include the header file for itself or the module/component it's part of.
Think of a header file as a specification of an API, that can be implemented across any number of source files. You can even write different implementations of the same header, for different platforms or purposes.
When compiling a reference to a symbol that has only been declared (e.g. by an included header) and not defined, the object file will mark that this symbol is missing and needs to be filled in.
The final work of joining one or more objects together, matching up all symbol references, is done by the "linker" component of the compiler. The linker outputs complete executables or shared libraries.
# link objects to executable
cc thing.o stuff.o -o gizmo
In summary, we don't "include" other source files in C, as we do in other languages. We include declarations, and then the code gets matched up by the linker.
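To make the split between declarations and definitions concrete, here is a minimal sketch squeezed into a single listing. The file names (thing.h, thing.c, stuff.c) and the functions thing_area and stuff_square_area are made up for illustration; in a real project each marked region would be its own file, compiled and linked with the cc commands shown above.

```c
/* --- thing.h (hypothetical): declarations only, shared via #include --- */
int thing_area(int width, int height);   /* function declaration (prototype) */
extern int thing_calls;                  /* variable declaration: no storage here */

/* --- stuff.c (hypothetical): only needs the declarations to compile --- */
int stuff_square_area(int side) {
    /* The compiler records a reference to the symbol thing_area;
       the linker later matches it to the definition below. */
    return thing_area(side, side);
}

/* --- thing.c (hypothetical): the definitions the linker resolves to --- */
int thing_calls = 0;

int thing_area(int width, int height) {
    thing_calls++;
    return width * height;
}
```

If thing.c were left out of the link step, stuff.o would still compile, and the linker would report thing_area as an undefined symbol.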
Undefined behaviour (UB)
Quite a lot of behaviour in C is specified by the standard as undefined. Any undefined behaviour makes the program, in theory, badly formed, and may lead to inconsistent behaviour or crashes. Unfortunately, the rules are hard to remember and UB is surprisingly easy to encounter. In many cases, compilers will patch over UB with sensible (but compiler-specific) code, making it hard to notice it at all.
Here is a strange bug we had in Brogue, only on some platforms, due to UB: Missing item names · Issue #30 · tmewett/BrogueCE
For more details see Nayuki's Undefined behavior in C and C++ programs.
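As one concrete illustration (my own example, not the Brogue bug above): signed integer overflow is undefined behaviour, so an overflow check has to happen before the addition rather than after it. A minimal sketch, using a hypothetical helper named checked_add:

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b only if the result fits in an int.
   Returns true and stores the sum in *out on success; returns false otherwise.
   Testing `a + b < a` after the fact would itself rely on UB. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: do not perform the addition */
    }
    *out = a + b;
    return true;
}
```

The comparisons are arranged so that neither INT_MAX - b nor INT_MIN - b can overflow themselves.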
Do not use these functions
C is old and tries to be highly backwards-compatible. As such it has features that ought to be avoided.
- atoi(), atol(), and friends: they return 0 on error, but this is also a valid return value. Prefer strtol(), etc.
- gets() is unsafe as no bounds on the destination buffer can be given. Prefer fgets().
See also My review of the C standard library in practice, where Chris Wellons highlights many issues across the entire standard library.
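For the number-parsing case, a minimal sketch of what error checking with strtol() can look like. The wrapper name parse_long and its exact policy (reject empty input, trailing characters, and out-of-range values) are my own choices for illustration:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a base-10 long from s. Returns true only if the whole string
   was a valid number that fits in a long. */
bool parse_long(const char *s, long *out) {
    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (end == s) return false;            /* no digits at all */
    if (*end != '\0') return false;        /* trailing junk, e.g. "12abc" */
    if (errno == ERANGE) return false;     /* out of range for long */
    *out = value;
    return true;
}
```

Unlike atoi(), this lets the caller tell a genuine "0" apart from input that was not a number.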
Arrays aren't values
It's important to realise that C, as a language, deals only with known-size pieces of data. You could probably summarise C as "the language of copying known-size values."
I can pass an integer or a struct around a program, return them from functions, etc. and treat them as proper objects because C knows their size and hence can compile code to copy their full data around.
I can't do this with an array. The sizes of arrays are not known in any useful way to C. When I declare a variable of type int[5] in a function, effectively I don't get a value of type int[5]; I get an int* value which has 5 ints allocated at it. Since this is just a pointer, the programmer, not the language, has to manage copying the data behind it and keeping it valid.
However, arrays inside structs are treated as values and are fully copied with the struct.
(Technically, sized array types are real types, not just pointers; e.g. sizeof will tell you the size of the whole array. But you can't treat them as self-contained values.)
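A small sketch of both halves of this: an array handed to a function is seen only as a pointer, while an array wrapped in a struct is copied along with it. The names struct five, size_seen_by_callee, and with_first_replaced are hypothetical:

```c
#include <stddef.h>

struct five { int v[5]; };

/* An array argument arrives as a pointer: sizeof here measures the
   pointer, not the caller's five ints. */
size_t size_seen_by_callee(const int *arr) {
    return sizeof arr;
}

/* A struct is passed and returned by value, array member included,
   so the caller's copy is untouched. */
struct five with_first_replaced(struct five f, int first) {
    f.v[0] = first;
    return f;
}
```

In the caller, sizeof on the array variable itself still reports the size of all five ints, which is the "real type" caveat mentioned above.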
Essential compiler flags
Compilers have so many options and the defaults aren't very good. Here are the absolute essential flags you may need. (They are given in GCC/Clang style; syntax may vary on other compilers.)
- -O2: optimise code for release builds
- -g -Og: for debug builds; enable extra information for debuggers, and optimise for debugging
- -Wall: enable many warnings (kind of like a linter). You can disable specific warnings with -Wno-...
- -Werror: turn warnings into errors. I recommend always turning on at least -Werror=implicit, which ensures calling undeclared functions results in an error(!)
- -DNAME and -DNAME=value: define macros (useful to pass config options from the build system to the compiler)
- -fsanitize=address,undefined: for debug builds; enables two common "sanitizers," which inject extra checks throughout the compiled code to find errors. See also all GCC instrumentation options.
- -std=...: choose a standard. In most cases you can omit this to use your compiler's default (usually the latest standard).
See also:
- the full docs for the huge number of options GCC supports
- Chris Wellons' favorite C compiler flags during development
Three types of memory, and when to use them
- Automatic storage is where local variables are stored. A new region of automatic storage is created for a function when it is called, and deleted when it returns. Only the return value is kept; it is copied into the automatic storage of the function which called it. This means that it is unsafe to return a pointer to a local variable, because the underlying data will be silently deleted. Automatic storage is often called the stack.
- Allocated storage is the result of using malloc(). It survives until it is free()'d, so can be passed wherever, including upwards to calling functions. It is often called the heap.
- Static storage is valid for the lifetime of the program. It is allocated when the process starts. Global variables are stored here.
If you want to "return" memory from a function, you don't have to use malloc/allocated storage; you can pass in a pointer to local data owned by the caller:
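#include <stdio.h>

// The caller owns the storage; getData only fills it in.
void getData(int *data) {
    data[0] = 1;
    data[1] = 4;
    data[2] = 9;
}

int main(void) {
    int data[3];  // automatic storage belonging to main
    getData(data);
    printf("%d\n", data[1]);  // prints 4
    return 0;
}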
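For comparison, here is a rough sketch that touches all three kinds of storage in one place. The names squares_made, make_squares, and sum are invented for this example:

```c
#include <stdlib.h>

/* Static storage: exists for the whole run of the program. */
static int squares_made = 0;

/* Allocated storage: the array outlives this call; the caller must free() it.
   Returns NULL if allocation fails. */
int *make_squares(size_t n) {
    int *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int)(i * i);
    }
    squares_made++;
    return out;
}

/* Automatic storage: `total` lives only during the call. Returning its
   value is fine (it is copied out); returning &total would not be. */
int sum(const int *data, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) total += data[i];
    return total;
}
```

The trade-off is ownership: with make_squares the caller has to remember to call free(), whereas the caller-provided buffer in getData needs no cleanup at all.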
Naming conventions
C has no support for namespaces. If you're making a public library, or want a "module" to have a name, you need to choose a prefix to add to all public API names:
- functions
- types
- enum values
- macros
Additionally, you should always include some different prefix for each enum, so you know which enum type the value belongs to:
enum color {
    COLOR_RED,
    COLOR_BLUE,
    ...
};
There's no real convention about names, e.g. snake_case vs camelCase. Pick something and be consistent! The closest thing to a convention I know of is that some people name types like my_type_t since many standard C types are like that (ptrdiff_t, int32_t, etc.).
static
On a function or file-level variable, static makes it file-local. It won't be exported as a symbol for use by other source files.
static can also be used on a local variable, which makes the variable persist between calls to that function. You can think of this like a global variable that is scoped to only one function. This can be useful to compute and store data for reuse by subsequent calls; but remember, this comes with the usual caveats of global/shared state, such as clashing with multiple threads or with recursion.
(It can seem like it has multiple meanings, since in a global scope it seems to reduce the scope of the variable, but in a function scope it increases it. Really what it's doing in both cases is making them file-linked.)
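Both uses side by side, as a minimal sketch with made-up names (clamp_to_byte, next_id, brighten):

```c
/* File-local helper: not exported as a symbol, so another .c file can
   define its own clamp_to_byte without clashing. */
static int clamp_to_byte(int x) {
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

/* `next` persists between calls, but is only nameable inside next_id(). */
int next_id(void) {
    static int next = 0;
    next++;
    return next;
}

/* Public function that uses the file-local helper. */
int brighten(int level, int amount) {
    return clamp_to_byte(level + amount);
}
```

Successive calls to next_id() return 1, 2, 3 and so on, because the static local is initialised once rather than on every call.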
The struct method pattern
If you learned a more featureful language before C, you might find it hard to visualise how to translate that knowledge. Here's a common idiom which resembles object-oriented programming: the "struct method." You write functions which accept pointers to structs to alter them or get properties:
typedef struct {
    int x;
    int y;
} vec2;

void vec_add(vec2 *u, const vec2 *v) {
    u->x += v->x;
    u->y += v->y;
}

int vec_dot(const vec2 *u, const vec2 *v) {
    return u->x * v->x + u->y * v->y;
}
You can't extend structs or do anything really OO-like, but it's a useful pattern to think with.
const
Declaring a variable or parameter of type T as const T means, roughly, that the variable cannot be modified. This means that it can't be assigned to, and also that it can't be changed if T is a pointer or array type.
A pointer to T converts implicitly to a pointer to const T, but not vice versa.
It's a good habit to declare pointer parameters to functions as const by default, and only omit it when you need to modify the data they point to.
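A short sketch of that habit in practice. The function names largest and negate_all are hypothetical; the point is only how the presence or absence of const documents intent in the signature:

```c
#include <stddef.h>

/* `const int *`: the ints are read-only through this pointer. */
int largest(const int *values, size_t n) {
    int best = values[0];
    for (size_t i = 1; i < n; i++) {
        if (values[i] > best) best = values[i];
    }
    /* values[0] = 0;  <- would be rejected by the compiler */
    return best;
}

/* No const: the signature tells callers the data will be modified. */
void negate_all(int *values, size_t n) {
    for (size_t i = 0; i < n; i++) values[i] = -values[i];
}
```

An ordinary int array can be passed to both functions, since int * converts to const int * without a cast.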
Platforms and standard APIs
When you pull in #include <some_header.h> it's hard to conceptualise what you're depending on. It will be from one of the following:
- The standard C library (abbr. "stdlib"). Examples: stdio.h, stdlib.h, errno.h
  - This is part of the language specification, and should be implemented by all compliant platforms and compilers. Very safe to depend on.
  - https://en.cppreference.com/w/c/header
- POSIX, a standard for operating system APIs. Examples: unistd.h, sys/time.h
  - Generally implemented by Linux, macOS, BSDs.
  - Not available by default on Windows. Some misc. POSIX APIs are available if you use MinGW. For more complete support, there is the Cygwin library.
  - You can view all details of POSIX headers (incl. C stdlib) at the official OpenGroup standard page (click "Headers" in the sidebar), or in section 3 man pages.
- A non-standard operating system interface:
  - Linux-specific APIs - documented in section 3 man pages
  - Windows Win32 (FYI, a more modern C++ interface called C++/WinRT is also available.)
  - (Mac's OS APIs are historically used via Objective C (now Swift), not C.)
- A third-party library, installed in a standard location.
It can be a good idea to interface with your more platform-specific code through a platform-neutral header file so it can be implemented in different ways. Lots of popular C libraries are basically just unified, well-designed abstractions over platform-specific functionality.
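A deliberately tiny sketch of such a platform-neutral interface. The header name platform.h and both functions are hypothetical, and treating everything that is not Windows as "POSIX-like" is a simplifying assumption; _WIN32 is the macro Windows compilers predefine.

```c
/* --- platform.h (hypothetical): the platform-neutral interface --- */
char platform_path_separator(void);
const char *platform_name(void);

/* --- platform.c (hypothetical): one implementation per platform,
       selected here with predefined compiler macros --- */
#if defined(_WIN32)
char platform_path_separator(void) { return '\\'; }
const char *platform_name(void) { return "windows"; }
#else
/* Assumption: everything else is treated as POSIX-like. */
char platform_path_separator(void) { return '/'; }
const char *platform_name(void) { return "posix-like"; }
#endif
```

The rest of the program includes only the neutral declarations, so the #if lives in one place instead of being scattered through the code. Larger projects often go further and use a separate .c file per platform, chosen by the build system.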
Integers
Integers are very cursed in C. Writing correct code takes some care:
Sizes
All integer types have a defined minimum size. On common platforms, some are larger than their minimum size, such as int, which is 32-bit on Windows, macOS, and Linux, despite being minimum 16-bit. When writing portable code, you must assume integers can never go above their minimum size.
If you want exact control over integer sizes, you can use the standard types in stdint.h, like int32_t, uint64_t, etc. There are also the int_leastN_t and int_fastN_t families of types.
Should you use these well-specified types everywhere you can? I must admit I'm torn on this question, but the more I think about it, the more I think you should: there are no downsides.[2] The only reason you really shouldn't is when making an API which has to interface with very old C89 compilers which lack stdint.h. There's also an argument for considering what the type communicates to the reader and whether the size is actually important; however by using standard types like int you are still implicitly relying on a certain size. It's probably no worse, yet clearer, to use int_fast16_t or something over int. (However, typically no one does this, including me!)
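A small sketch of where exact-width types earn their keep: bit packing, where the code only makes sense for specific sizes. The function names are invented for illustration.

```c
#include <stdint.h>

/* Packs two 16-bit halves into one 32-bit word. The exact-width types make
   the intended sizes part of the signature. */
uint32_t pack_u16(uint16_t high, uint16_t low) {
    /* Cast before shifting so the shift happens in uint32_t,
       not in a promoted (signed) int. */
    return ((uint32_t)high << 16) | (uint32_t)low;
}

uint16_t high_half(uint32_t word) { return (uint16_t)(word >> 16); }
uint16_t low_half(uint32_t word)  { return (uint16_t)(word & 0xFFFFu); }
```

Written with plain unsigned int and unsigned short, the same code would silently depend on the platform's sizes.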
Arithmetic and promotion
Arithmetic in C is subject to many bizarre rules which can give unexpected or unportable results. Integer promotions are especially important to be aware of.
See Nayukiâs summary of C integer rules.
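One of the simpler promotion effects, as a sketch with made-up function names: operands narrower than int are promoted to int before arithmetic, so the result only wraps when it is converted back to the narrow type.

```c
#include <stdint.h>

/* Both operands are promoted to int before the addition, so the
   intermediate result does not wrap at 8 bits. */
int sum_as_int(uint8_t a, uint8_t b) {
    return a + b;
}

/* Converting the result back to uint8_t reduces it modulo 256. */
uint8_t sum_as_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}
```

With inputs 200 and 100, the first function gives 300 and the second gives 44.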
char signedness
All other integer types default to signed, but bare char can be signed or unsigned, depending on the platform. As such, it's only portable when used for strings; specify the sign too if you want a small/minimum-8-bit[3] number.
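A brief sketch of how this shows up in practice, using hypothetical helper names. CHAR_MIN from limits.h reveals which choice the platform made, and converting through unsigned char gives a consistent byte value either way:

```c
#include <limits.h>

/* Nonzero if plain char is signed on this platform. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Reads a byte from a string as a non-negative value, regardless of
   whether plain char is signed. */
int byte_value(const char *s, int index) {
    return (unsigned char)s[index];
}
```

Without the conversion, a byte such as 0xE9 would read as a negative number on platforms where char is signed, which matters when the value is used as an array index or compared against a constant.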
Macros vs const variables
To define simple constant values, you have two choices:
static const int my_constant = 5;
// or
#define MY_CONSTANT 5
The difference is that the former is a real variable and the latter is a copy-pasted inline expression.
- Unlike variables, you can use macros in contexts where you need a "constant expression," like array lengths or switch statement cases.
- Unlike macros, you can get a pointer to a variable.
Having constants actually be "constant expressions" is very useful and hence they should usually be defined as macros. Variables are better for larger or more complex values like struct instances.
If your constant is an integer, you have a third, better option, the "bare enum":
enum {
    MY_CONSTANT = 5
};
This defines a constant expression in C, not in the pre-processor, so it can be more easily seen by debuggers etc.
In C23, you can optionally give an explicit "underlying type" to an enum:
enum : size_t {
    BUFFER_LENGTH = 1024
};
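To see the difference in use, here is a minimal sketch with invented names (GRID_SIZE, grid_size_var, and the three functions). It needs nothing newer than C99:

```c
enum { GRID_SIZE = 3 };                 /* constant expression */
static const int grid_size_var = 3;     /* real variable, not a constant expression */

/* An array with an initialiser needs a constant-expression length.
   Using grid_size_var here would make it a variable-length array,
   which cannot be initialised this way. */
int grid_total(void) {
    int grid[GRID_SIZE] = {1, 2, 3};
    int total = 0;
    for (int i = 0; i < GRID_SIZE; i++) total += grid[i];
    return total;
}

/* Case labels must also be constant expressions. */
const char *describe_count(int n) {
    switch (n) {
    case 0:         return "empty";
    case GRID_SIZE: return "full";
    default:        return "partial";
    }
}

/* A pointer can be taken to the variable, but not to the enum constant. */
const int *grid_size_address(void) {
    return &grid_size_var;
}
```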
Macros vs inline functions
Macros can have parameters, which can then expand to C code.
Advantages over functions:
- The code is pasted right in the surrounding code, instead of compiling function call instructions. This can make code faster, as function calls have some overhead.
- They can be type-generic. For example, x + y is valid syntax for any numeric type. If we made that a function, we'd have to declare them as arguments and choose their type, i.e. size and signedness, in advance, which would make it only usable in some contexts.
Disadvantages:
- Repeated evaluation of arguments. Suppose we have a macro MY_MACRO(x). If x is used multiple times in the definition, then the expression x will be evaluated multiple times, because it is simply copied and pasted.[4] Compare that with a function, where expressions as arguments are evaluated once to values and then passed into the function.
- They can be error-prone because they work at the source level. It is generally a good idea to use brackets gratuitously, always around the whole macro definition itself and any arguments, so expressions don't merge unintentionally.
// Instead of:
#define MY_MACRO(x) x+x
// Do:
#define MY_MACRO(x) ((x)+(x))
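Both pitfalls can be observed directly. The sketch below uses made-up names (DOUBLE_BAD, DOUBLE_GOOD, double_fn, next_value) and counts how often an argument with a side effect actually runs:

```c
/* Missing brackets: the expansion merges with surrounding operators. */
#define DOUBLE_BAD(x)  x+x
#define DOUBLE_GOOD(x) ((x)+(x))

/* A function version evaluates its argument exactly once. */
static int double_fn(int x) { return x + x; }

static int calls = 0;
static int next_value(void) { return ++calls; }

int demo_precedence_bad(void)  { return DOUBLE_BAD(3) * 2; }   /* expands to 3+3*2 */
int demo_precedence_good(void) { return DOUBLE_GOOD(3) * 2; }  /* expands to ((3)+(3))*2 */

/* Each returns how many times next_value() ran for a single use. */
int calls_made_by_macro(void) {
    calls = 0;
    (void)DOUBLE_GOOD(next_value());
    return calls;
}

int calls_made_by_function(void) {
    calls = 0;
    (void)double_fn(next_value());
    return calls;
}
```

The unbracketed macro yields 9 where 12 was intended, and even the bracketed macro calls next_value() twice, whereas the function calls it once.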
Unless you need to be type-generic, you can get the best of both worlds by defining a function as static inline. inline provides a hint to compilers that the code in the function should be compiled directly into where it is used, instead of being called. You can put static inline functions in header files, just like macros, with no issues.
Additionally, since C11 you can provide overloads of functions for different types using a macro built on the special _Generic keyword:
#define sin(X) _Generic((X), \
long double: sinl, \
default: sin, \
float: sinf \
)(X)
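The sin example above redefines a standard library name, so here is a separate, self-contained sketch of the same technique with hypothetical names (ABS_OF and the three abs_ functions). It assumes a C11 or later compiler.

```c
/* One ordinary function per supported type. */
int    abs_int(int x)       { return x < 0 ? -x : x; }
long   abs_long(long x)     { return x < 0 ? -x : x; }
double abs_double(double x) { return x < 0 ? -x : x; }

/* _Generic selects one of the functions based on the static type of X;
   the selected function is then called with X. There is no default
   branch, so an unsupported type (e.g. float) is a compile error. */
#define ABS_OF(X) _Generic((X), \
    int:    abs_int,            \
    long:   abs_long,           \
    double: abs_double          \
)(X)
```

Because the selection happens at compile time and the argument appears only once in the call, this avoids the repeated-evaluation problem of ordinary function-like macros.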
- [1] https://stackoverflow.com/questions/1410563/what-is-the-difference-between-a-definition-and-a-declaration
- [3] But not always 8 bit. char is special because it's the smallest addressable type on the current platform, which is not required to be (but basically always is) 8 bits. The size of char in bits is available in the macro CHAR_BIT from limits.h. All other sizes in C, such as from sizeof, are in units of char.
- [4] If the expression has no side effects and the compiler can figure this out, it might be optimised by common subexpression elimination.