Fundamentals of WebAssembly, the start of the Wasm in the wild series

Join me on a guided journey through the wild world of Wasm. Learn the secret sauce that enables Wasm runtimes to be fast and secure. At the end of this article, you will understand why Wasm was designed as a stack machine and what basic memory regions exist in Wasm.

Introduction

hike starting point

Welcome, welcome! Before we start our hike into the WebAssembly mountains, let us get acquainted. My name is Jakob and I will be your guide.

I still remember my first steps in the valley, as a young student in 2019. It had been around 3 years since I fell in love with Rust. Together, we decided to venture into WebAssembly.

On our first visit, we took the cable cart. I wrote a game in Rust, compiled it to Wasm, and ran it in the browser. Without understanding what exactly is going on. This way, I didn’t need to confront the steep slopes of the WebAssembly mountains to enjoy the view from the top.

Soon after, I wanted to explore more. I wrote an application to Mozilla for an internship, in a team that was working on WebAssembly itself. No answer for weeks. Then, in early 2020, a rejection letter in the same week when layoffs at Mozilla were announced.

That’s how my professional career dived away from Rust and Wasm. I accepted a C++ job. My heart stayed with Rust and Wasm. I could only spend weekends and evenings after work with Rust. Often, we explored the WebAssembly valley together.

In December 2021, finally I started my first Rust job, at a Crypto company. The job description paraphrased: Improve the project’s Wasm compiler and runtime which are written in Rust. Help develop the SDK for smart contracts. Of course, the smart contracts and the SDK are also written in Rust.

Today, I am a free soul and explore the WebAssembly mountains on my own terms. However, the approximately two years at my previous job gave me plenty of opportunities to learn my way around one specific mountain in the region. That mountain is still a couple of day-walks away. So let’s get moving and I will tell you about WebAssembly fundamentals on our way.

fist slope from below

What is Wasm?

WebAssembly, or Wasm, is a portable byte code specification. It describes how an executable program can be structured for execution in a compatible Wasm runtime.

In some sense, ARM and x86 likewise are byte code specifications. They run, typically, on real CPUs. But emulators and virtual machines exist as well. For Wasm, there is, to the best of my knowledge, no physical machine that can execute it. It must always be translated to the CPU’s native language for execution.

But that comparison falls short in several ways. I believe for an introduction, Wasm should better be compared to the abstract machine you will find in the C programming language specification. In this context, the abstract machine defines how C code needs to be executed. It’s the job of compiler engineers to ensure this specification is upheld by the machine code (e.g. x86) the compiler spits out.

Wasm defines a stack-based abstract machine. Sometimes people call it the Wasm virtual machine but I find that misleading. In my mind, a VM is something in the cloud where I can log in. It runs an operating system and makes everything look like I have a full machine to myself. Wasm isn’t like that at all. It’s only a conceptual abstract definition for compiler engineers to follow.

With that out of the way, let us take a brief rest and have a sip of water. After that, I will explain how Wasm is executed on an x86 CPU.

first stop with a well

Wasm Compilation Modes

As explained, Wasm cannot be executed on your CPU directly. Let’s assume you want to run it on a x86 CPU. Then there must be a step that takes Wasm as input and translates it to x86 for the CPU to do the actual work.

The timing of the translations will be our first categorization of Wasm runtimes. Translating the entire Wasm program to x86 all in one go is called ahead-of-time (AOT) compilation. The resulting x86 executable can be stored away and executed directly on the CPU as if you wrote the program in C and compiled it with Gcc.

If you want to be more sophisticated, you may choose to compile Wasm in smaller, incremental steps. This way, already translated code can run while other parts are still being compiled. This is called just-in-time (JIT) compilation.

As a third option, consider a Wasm interpreter. The interpreter takes one operation at a time and executes it. This is different from compilation because the interpreter doesn’t emit x86 instructions like a compiler would. The person programming it doesn’t need to understand x86 at all. It’s all expressed in a higher-level language, such as Rust.

But consider this. All the code written in Rust, including that for executing the Wasm operation, is still translated to x86 ahead of time. It’s just offloaded to the Rust compiler instead of doing it manually. The interpreter, once compiled and ready to run, looks at Wasm operations in the input and jumps to the corresponding x86 code. From that point of view, if you squeeze your eyes, an interpreter is a JIT compiler compiling each operation individually.

Going from an interpreter to an AOT compiler is a small step, conceptually. When interpreting a Wasm operation, instead of executing a bit of x86 that depends on the Wasm operation, just print out the x86 instead. Do it for the entire file and congratulations, you have an executable x86 program!

Now, that is a gross simplification, of course. Not only would such code be super inefficient, but you would also run into difficulties trying to plainly follow my instructions. You will understand what I mean soon when I explain how the runtime enforces the Wasm specification.

hiking path

Enforcing Correct Types

Did you know that Wasm is strongly typed? Not just dynamically typed, like Python or JavaScript. No, actual static-typing, right in the executable assembly code.

Add two things together in Wasm, and the type of both values is guaranteed to be correct. Take x86 for contrast, you can give it an f32 and an i32 and let it add the two using integer arithmetic. If you do it in Rust, the compiler will let you know that there needs to be a conversion step first (num as i32) for this to make sense. But x86 doesn’t care, it will happily interpret the bit-pattern of the f32 and look at it like an i32.

Not so in Wasm. The abstract machine defines what types each operation takes as input. The Wasm runtime must enforce it.

Of course, Wasm is usually not written by hand and compilers are good at generating Wasm that follows the typing rules. But a major requirement for Wasm is that we can store and distribute it. When receiving Wasm over the wire, for example in the browser to do fancy animations for a website, the runtime of your browser cannot simply trust that this code is well-typed. It has to be checked.

If we go back to the idea of a compiler that simply replaces each Wasm operation with a bit of x86, we need to consider how these checks can be performed, too.

An interpreter can have checks on every operation. For example, when function foo is called, check if this function exists. But the interpreter needs to maintain an internal state for this, like a list of available functions to call.

An AOT compiler does these checks through static analysis. If that sounds complicated, static analysis is really just a fancy term for reading code to ensure it follows some rules. It’s called “static” because it is done without executing the code inside. Basically, read a loop once instead of iterating it.

An example of static analysis would be that for every call to fn foo(param1:i32, param2: i32), we ensure there is corresponding code that defines param1 and param2.

Because Wasm is a stack machine, the parameters to foo need to be prepared on the abstract stack, which makes static analysis rather easy. But let me explain this in more detail after a short breather.

a bench for a breather

The WebAssembly Stack Machine

If in Rust you write foo(a,b);, the compiler looks at the names a and b, figures out where they were defined, and what the addresses of these values in memory are. A type-checking runtime needs to perform the same checks

This is non-trivial work. Work we want to avoid in a lightweight runtime. Therefore, Wasm enforces that whenever foo is called, the values for a and b must already be in a well-defined place. And this happens to be the top two values of the Wasm stack.

This makes static analysis so much easier. How? Well, for every line of code in a Wasm program, you could draw a picture of the Wasm stack and the types it contains before and after the line. You may not know the values, this depends on the dynamic execution, but you can always know the exact types.

	before	after
i32.const 0	[]	[i32]
i32.const 4	[i32]	[i32, i32]
i32.add	[i32, i32]	[i32]

Entering a function, we start with an empty stack. At this point, only operations that require no values on the stack are allowed to follow. For example, i32.const NUM is an operation that puts an i32 constant on the top of the stack. The value NUM is a constant value stored in the operation itself, also known as an immediate value. It’s not in memory or on the stack, it’s in the code.

After pushing two values to the stack, you could execute i32.add, which pops two i32 values from the stack and pushes the sum of the two back on the stack. The awesome WebAssembly Opcodes table by Pengo Wray shows all operations with their corresponding stack transition. In this example, for adding, it is [i32, i32] -> [i32].

I hope you start to see how this makes static analysis easy. If you know the stack before an operation, you can check if the operation is valid regarding types. And then you can compute the stack state after the operation, which is the stack state before the next operation.

What about loops? Remember, we should only read each line in the loop once, without executing the loop, or else it wouldn’t be a static analysis.

Here is how we do it. We already know each line must have a fixed corresponding stack height with fixed types. When entering the loop, we know how the stack looks. Now, we just need to ensure that by the end of the loop, if we loop back, the stack has the exact same shape. If it doesn’t, the Wasm code is invalid and will be rejected.

         
(loop $loop_label       ;; [] -> []
   ;; add 1 to $i
   local.get $i         ;; [] -> [i32]
   i32.const 1          ;; [i32] -> [i32, i32]
   i32.add              ;; [i32, i32] -> [i32]
   ;; set $i but also keep value on the stack		
   local.tee $i         ;; [i32] -> [i32]
         
   ;; if $i is less than 5, exit loop 		
   i32.const 5          ;; [i32] -> [i32, i32]
   i32.lt_s             ;; [i32, i32] -> [i32]
   br_if $loop_label    ;; [i32] -> []
)                       ;; [] -> []
;; Print $i
local.get $i            ;; [] -> [i32]
call $log               ;; [i32] -> []

The compiler that outputs the Wasm has to ensure these rules are followed. To make it work, it might have to rearrange the code a bit, pop leftover values from the stack, or add relevant values at the end of the loop body. All that work can be done in a heavy compiler, such as Rustc, leaving only a quick validation step for the compiler of the Wasm runtime to perform.

Pretty cool, isn’t it? The stack-based abstract machine allows for this cheap validation step and a cheap compilation, which makes type-checking and executing Wasm super lightweight, compared to, for example, JavaScript.

Take a moment to enjoy the view and rest. We are getting closer to the end of today’s leg. During the final ascent of today, I will explain where this stack and all the other memory in a Wasm runtime are stored.

a view from Nieschberg

Memory Regions

A typical program in 2024 has the entire virtual memory space to itself, which the OS allocates for it through appropriate page table entries.

When loading an executable, this memory space is subdivided into several areas. One is for the code, another for constant values, and so on. There are two regions programmers are most familiar with, the stack and the heap. Both these areas are for dynamically allocating memory at runtime. The heap for dynamic storage durations, meaning lifetimes are managed explicitly. And the stack for automatic storage durations, meaning the lifetime is implicitly bound to the stack height.

The described memory layout allows referencing everything with addresses, also known as pointers. A function pointer is still just a pointer in the same memory region. A variable on the stack has an address and can be made into a pointer. And if you return this pointer and use it after the stack frame it belonged to was cleaned up, you get undefined behavior. Very easy mistake to make in C and very not fun to debug.

Outside this addressable space, there are only CPU registers. They live in a completely different namespace and have no address, hence cannot be accessed with a pointer. As a programmer, you need to use inline assembly if you want to interact with registers directly. But most of us are happily oblivious about registers in our day-to-day coding.

The Wasm abstract machine is different from what I just described. The linear addressable memory available to it is used only for what I previously called the heap. Everything else has no address, as far as the Wasm specification is concerned.

The addressable Wasm memory is usually called linear memory. It works as you would expect it to work. A pointer points to a location and data is loaded from there. The corresponding Wasm operations are load and store.

Local variables in Wasm aren’t stored on a stack, nor are they addressable with pointers. Rather, they have a separate namespace and are accessed by an index of available local variables in a given context.

For the abstract stack, the Wasm spec gives no way to access it other than using operations that implicitly use values on the stack. A bit like returning from a function in x86 will implicitly pop the frame pointer and the return address from the stack. But in Wasm, all operations manipulate the stack.

Function code, likewise, is stored somewhere inaccessible. All the Wasm program gets to see is the table of available functions to call. Calling a function is done by giving the index inside that table. And the runtime can be sure every function in the table has been properly type-checked already.

All of this to say, the Wasm abstract machine is very restrictive in how resources are accessed. That’s on the abstract machine level. On the machine code level, e.g. x86, all these resources still need to be mapped into the previously mentioned virtual address space and CPU registers.

But look, I already see the resting place for tonight’s camp. I wanted to talk a bit more about local variables in Wasm and how the Wasm stack can be mapped to x86 registers. But I shall stop myself from getting lost in details. Enough information for one day.

almost at the top

Review

Alright, we reached the first resting place on the journey. Let’s look back at the ascent we conquered today. Three landmarks stand out to me in particular.

We learned that the runtime needs to ensure things happen according to the Wasm spec.
The stack machine representation allows to quickly perform these checks and compile it to x86 or ARM, without too much overhead.
Memory management in Wasm divides user data, code, and the Wasm stack into different memory regions with different rules to access it safely.

And with that, we shall rest. We covered a good chunk of fundamentals today. We haven’t yet reached the secluded locations in the WebAssembly mountains I want to show you on this insider trip. But we are getting closer.

After the next leg, which covers more details on sandboxing, we will be surrounded by the recognizable peaks of WebAssembly. The closest is the “Wasm in the browser” mountain, which has plenty of unexpected turns on the way up.

view from the top