Ask most people what a compiler does, and you’ll get an answer about translating high-level code into machine instructions. Ask them why a compiler exists, and you’ll likely hear something about performance—getting code to run fast. This answer, while not entirely wrong, misses the forest for the trees. The real purpose of a compiler is far more interesting, and understanding it changes how you think about compiler design and language evolution.

Previous post: Part 1: What is a Compiler?

Core Thesis: Compilers as Velocity Tools

A compiler is fundamentally a software engineering velocity tool—its purpose is to help deliver features better and faster, creating more value for users.

This framing might seem reductive at first, but consider what actually matters in the software industry. Engineers ship products. Products require features. Features require engineering time. Anything that lets engineers ship more features, with fewer bugs, in less time, directly translates to value for society. A compiler sits at the foundation of this entire edifice. It’s not just translating code; it’s enabling a particular mode of working—a programming model—that determines how quickly and reliably engineers can build things. Every hour a compiler saves across thousands of developers compounds into something enormous.

The goal of a compiler is not performance directly; past a certain point, the productivity benefits of the programming model and system layering usually matter more than how close to optimal the compiler’s output is.

This is perhaps the most counterintuitive claim here. Yes, compilers optimize code. Yes, performance matters. But once you’re generating code that’s “reasonably good”—say, within 2x of hand-tuned assembly for typical workloads—the incremental gains from squeezing out another 10% often pale in comparison to the productivity gains of a better programming model. Consider the transition from assembly to C: early C compilers generated notably worse code than expert assembly programmers. It didn’t matter. The productivity gains were so overwhelming that C took over systems programming anyway. The same story has played out repeatedly across computing history.

Developer Productivity Benefits

Increased developer velocity enables both more features (driving value creation) and more time for high-level performance optimizations.

There’s a lovely irony here. By accepting “good enough” low-level performance from your compiler, you often end up with better overall performance. Why? Because developer time is finite. When engineers aren’t fighting with manual memory layout or register allocation, they can spend that time on algorithmic improvements, better data structures, and architectural optimizations—the kinds of changes that yield 10x improvements rather than 10% improvements. A team using a productive language with a decent compiler will usually outperform a team hand-coding assembly, not despite the abstraction overhead, but because of the cognitive bandwidth it frees up.

Compiler-provided abstractions make it practical to use complex constructs; without them, developers might avoid or poorly implement such constructs due to difficulty.

Think about hash tables. In C++, you #include <unordered_map>, and you’re done. You get a reasonably efficient, well-tested implementation that handles all the edge cases. In assembly? You’re looking at days of work to implement something comparable, and you’ll probably get it wrong in subtle ways. The result is predictable: assembly programmers (if they actually tried to write whole applications) would avoid hash tables, even when they’d be the right tool for the job, or would implement simplistic versions that perform poorly. The compiler’s abstractions don’t just make code faster to write—they change what code gets written at all.
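To make the contrast concrete, here is a minimal C++ sketch of that abstraction at work; the word-count program is purely illustrative, but it gets a complete, collision-handling, resizable hash table essentially for free.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // A tested, resizable, collision-handling hash table in a few lines of
    // application code (C++17 for the structured binding below).
    std::unordered_map<std::string, int> word_counts;
    for (std::string word; std::cin >> word;)
        ++word_counts[word];

    for (const auto& [word, count] : word_counts)
        std::cout << word << ": " << count << "\n";
    return 0;
}
```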

Significant engineering resources go into developer-time-saving tools like sanitizers.

The modern compiler is far more than an optimizer. Tools like AddressSanitizer, ThreadSanitizer, and UndefinedBehaviorSanitizer represent enormous engineering investments—tens of thousands of hours of work—all aimed at one goal: saving developer time. A memory corruption bug that might take days to track down manually can be caught instantly with the right sanitizer. This is the compiler ecosystem working as a velocity multiplier. It’s telling that some of the most impactful compiler work in recent years hasn’t been about generating faster code, but about helping developers find bugs faster.
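As a concrete illustration (the file and bug below are invented, but -fsanitize=address is the standard flag for enabling AddressSanitizer in Clang and GCC), this is the kind of defect that can consume days of manual debugging yet is reported on the first run under ASan, complete with stack traces for the bad read, the free, and the original allocation.

```cpp
// heap_bug.cpp
// Build and run with AddressSanitizer (Clang or GCC):
//   clang++ -g -fsanitize=address heap_bug.cpp && ./a.out
#include <iostream>
#include <vector>

int main() {
    auto* v = new std::vector<int>{1, 2, 3};
    int* first = v->data();       // points into the vector's heap buffer
    delete v;                     // buffer is freed here
    std::cout << *first << "\n";  // ASan reports heap-use-after-free
    return 0;
}
```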

Abstractions like inlining reduce errors, debug time, and implementation difficulty compared to manually writing equivalent low-level code.

Consider a simple example: checking if two bits are set in a flags variable. You could write if (AreBitsSet(flags, BIT_FOO | BIT_BAR)), or you could manually inline the bitmasking logic everywhere. The compiler will generate identical code for both—but the abstracted version is easier to read, easier to write correctly, and easier to modify later. Multiply this by thousands of similar micro-decisions throughout a codebase, and the compound effect is massive. The abstraction isn’t just syntactic sugar; it’s a defense against the constant low-level errors that plague manually optimized code.
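A sketch of that example in C++ (the flag names and helper mirror the prose and are purely illustrative):

```cpp
#include <cstdint>

// Illustrative flag values; the names mirror the prose above.
constexpr uint32_t BIT_FOO = 1u << 0;
constexpr uint32_t BIT_BAR = 1u << 3;

// The readable abstraction. Any modern optimizing compiler will inline this,
// so the call itself costs nothing.
inline bool AreBitsSet(uint32_t flags, uint32_t mask) {
    return (flags & mask) == mask;
}

bool NeedsSpecialHandling(uint32_t flags) {
    // Generates the same masked compare as writing
    // (flags & (BIT_FOO | BIT_BAR)) == (BIT_FOO | BIT_BAR) by hand,
    // but is harder to get subtly wrong (e.g. testing != 0, which would
    // accept either bit instead of requiring both).
    return AreBitsSet(flags, BIT_FOO | BIT_BAR);
}
```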

The “Good Enough” Performance Philosophy

“Good enough performance” means that in 95–99% of cases, developers don’t need to drop to a lower-level, less productive programming model.

This is the real contract a compiler offers: write your code at a high level of abstraction, and the compiler will generate code that’s fast enough that you don’t need to care about the low-level details. “Fast enough” isn’t a precise threshold—it depends on your application, your latency requirements, your scale. But the key insight is that it’s a threshold, not a maximum. Once you’ve crossed it, additional compiler optimizations have diminishing returns compared to developer productivity. A compiler that lets you write natural, readable code and still hit your performance targets is far more valuable than one that generates marginally faster code but requires you to contort your source into optimizer-friendly shapes.

A compiler will never beat experts on a small piece of code, but will always beat experts on sufficiently large codebases due to human cognitive overhead when writing lower-level code.

This is a fundamental asymmetry that compiler engineers should internalize. Give an expert assembly programmer a 50-line hot loop and a week of time, and they’ll beat any compiler. Give them a 500,000-line codebase and a year, and they’ll produce something slower, buggier, and harder to maintain than the compiler’s output. The human brain simply can’t track all the interactions, all the micro-optimizations, all the subtle invariants across a large codebase. The compiler can. It doesn’t get tired, it doesn’t forget, and it applies every optimization it knows everywhere it’s applicable. At scale, mechanical consistency beats sporadic brilliance.

The compiler is the only place that can make many small improvements everywhere across a codebase.

When a compiler learns a new optimization—say, a better way to lower a particular pattern to machine code—that improvement instantly applies to every instance of that pattern in every program compiled. No human could achieve this. No human could even find all the instances. This is the leverage that compiler work provides: improvements that scale across millions of lines of code, across thousands of projects, automatically and invisibly. It’s one of the few places in software engineering where you can genuinely achieve multiplicative impact.
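One hedged illustration: the classic “clear the lowest set bit” counting loop below. Clang, for example, can recognize this whole loop as a population count and replace it with a popcount intrinsic, which lowers to a single instruction on targets that have one. The day that recognition landed in the compiler, every program containing the pattern got the improvement on its next build, with no one auditing call sites.

```cpp
#include <cstdint>

// Kernighan's bit-counting loop, written the naive way. Compilers that
// recognize this idiom (e.g. Clang's loop-idiom pass) can rewrite the entire
// loop as a popcount, turning O(set bits) of work into a single instruction
// on hardware that supports it.
int CountSetBits(uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clears the lowest set bit
        ++n;
    }
    return n;
}
```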

Hot Spots and Escape Hatches

The hot spots principle means it’s acceptable to need to rewrite a small percentage of code for performance; inline assembly and layered compilers solve this.

Real-world programs follow a power law: a small fraction of the code—typically around 5%—accounts for the vast majority of execution time. This is liberating. It means you don’t need a compiler that generates perfect code everywhere. You need a compiler that generates good-enough code everywhere and provides mechanisms to hand-optimize the hot spots. This is why inline assembly has existed since the earliest days of high-level languages, and why it’s still used today. The layered approach—high-level language for most code, escape hatches for critical sections—has proven remarkably durable because it matches the reality of how programs actually execute.
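In practice the escape hatch can be a single function. A minimal sketch, assuming GCC or Clang on x86-64 (the function name is made up): the time-stamp counter has no portable C++ spelling, so a profiled hot path drops to extended inline assembly for one instruction while everything around it stays high-level.

```cpp
#include <cstdint>

#if defined(__GNUC__) && defined(__x86_64__)
// Escape hatch: read the x86 time-stamp counter directly via GCC/Clang
// extended inline assembly. The rest of the program never needs to know
// this function bottoms out in hand-written instructions.
inline uint64_t ReadCycleCounter() {
    uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));  // EDX:EAX <- TSC
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
#endif
```

Compiler intrinsics such as __rdtsc offer the same escape one rung higher up; the point is that the layering gives you somewhere to go when the default output is not enough.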

A compiler achieving 70% of peak performance with good escape hatches is better than one achieving 75% without them—this comes from the “fractal nature of performance in software.”

This point deserves emphasis because it’s so often misunderstood. Imagine two AI compilers: one generates kernels that run at 75% of theoretical peak, while the other generates kernels at 70% of peak but provides clean interfaces for dropping down to hand-written kernels. The second compiler is more valuable, even though its default output is slower. Why? Because in practice, you’ll need to push past the compiler’s limits for your most critical operations. The 75%-compiler leaves you stuck; the 70%-compiler lets you reach 90%+ where it matters. Performance in real systems has a fractal quality—there’s always another level of optimization available if you need it—and compiler design must accommodate this reality.

Modularity, IR Design, and Abstraction Value

Modularity and proper IR layering enable escape hatches and allow new interfaces to be built on top.

The internal structure of a compiler matters far beyond its immediate output quality. A well-layered compiler—with clean intermediate representations at each level—becomes a platform for innovation. Consider NVIDIA’s stack: SASS (machine code) → PTX (virtual ISA) → CUDA (and now cuTile). Each layer provides a stable interface for the layers above and below. When NVIDIA releases new hardware, they update the SASS layer; when they want new programming abstractions, they build on PTX. This modularity isn’t just elegant engineering—it’s what allows the escape hatches to exist. You can drop down to PTX when CUDA isn’t enough, or down to inline SASS when PTX isn’t enough.

Proper IR layering allows retargeting by swapping out components below a particular abstraction layer; choosing the right layer is critical.

When designing a compiler’s intermediate representations, you’re implicitly deciding how the compiler can evolve. Build at too high a level of abstraction, and you’ll need to reinvent complex lowering logic for each new target. Build at too low a level, and your IR will encode assumptions about particular hardware that make it impossible to efficiently target new architectures. The art of compiler design lies in finding the right level—abstract enough to be retargetable, concrete enough to be efficiently implementable. Get this wrong, and you’ve painted yourself into a corner. Get it right, and your compiler becomes a living system that can grow with the industry.
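A minimal sketch of that idea in C++, not modeled on any real compiler’s IR: everything above the backend interface is written once against the abstract IR, and retargeting means implementing one new class below the chosen boundary.

```cpp
#include <string>
#include <vector>

// Toy IR operation: abstract enough to say nothing about any particular ISA.
struct IrOp {
    std::string name;           // e.g. "add", "load", "mul"
    std::vector<int> operands;  // virtual register ids
    int result;
};

// The chosen abstraction boundary. Everything above it is target-independent.
class TargetBackend {
public:
    virtual ~TargetBackend() = default;
    virtual std::string Lower(const IrOp& op) = 0;  // IR -> target assembly text
};

class X86Backend : public TargetBackend {
public:
    std::string Lower(const IrOp& op) override {
        return "x86:  " + op.name;  // stand-in for real instruction selection
    }
};

class RiscVBackend : public TargetBackend {
public:
    std::string Lower(const IrOp& op) override {
        return "rv64: " + op.name;
    }
};

// The upper layers never change when a new target appears; they only see
// the IrOp/TargetBackend boundary.
std::vector<std::string> Compile(const std::vector<IrOp>& ir, TargetBackend& backend) {
    std::vector<std::string> out;
    for (const auto& op : ir) out.push_back(backend.Lower(op));
    return out;
}
```

Pick the boundary too high and each backend reimplements complex lowering logic; pick it too low and the IR bakes in one target’s assumptions. The sketch only works because IrOp says nothing about registers, calling conventions, or instruction encodings.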

Building the right abstractions is extremely valuable: a large fraction of NVIDIA’s market value is often attributed to CUDA, and much the same can be said of C and C++.

Abstractions aren’t just implementation details; they can be trillion-dollar assets. CUDA’s programming model—not the hardware, not the drivers, but the abstract interface for GPU computing—is arguably responsible for a substantial fraction of NVIDIA’s market dominance. It’s what makes the GPU ecosystem sticky, what all the libraries are written against, what all the developers know. Similarly, C and C++ shaped decades of systems programming. The compiler isn’t just generating code; it’s defining the vocabulary in which entire industries think about their problems. Getting these abstractions right is one of the highest-leverage activities in all of computing.


Stay tuned for future parts of this series. Next time, we’ll talk about why the Dragon Book is outdated – modern compiler engineering is fundamentally different from what is presented there, especially in this age of compilers for AI programs.

