About the Go Compiler
Jan 2018 - Alex Alejandre

Edit: a recent discussion sheds much more light on this: https://www.reddit.com/r/golang/comments/1994ols/the_go_compiler_needs_to_be_smarter_2020/

N.b. the following makes a mountain out of a molehill: Go is close enough to optimal performance (roughly 2x slower than C) that you must put significant effort into optimizing before the compiler becomes what holds you back. As it stands, most optimizations are a question of diminishing returns. It’s primarily a note to myself.

To avoid complexity and enable faster iteration, the Go team wrote their own optimizer and code generator (and a Plan 9 based assembler). Since little time has gone into them, they now lag behind both LLVM and mature JITs. gccgo can sometimes optimize better than the standard gc compiler, while gollvm, the LLVM-based alternative, is now dead.

While Go doesn’t focus on CPU-bound performance, its performance is acceptable for IO-bound tasks. However, the stdlib JSON decoder and regex engine are CPU-bound, and their poor implementations make them slow (5x slower than JS).

Go does optimize for quick compile times, where further passes would only hurt the developer experience. (However, the lack of expressive types, lifetimes, generic methods, type embedding, aliases switches, more than makes up for small possible changes like aggressive inlining and constant detection.)

Benefits of Standard Approach

Go does not have different optimization levels (apart from -gcflags='-N -l', which disables optimizations and inlining for debugging), lest bugs appear in only one of them. Otherwise, a whole pathology of flags and versions could sprout, with code patterns optimized for a single version. In practice, the optimal version would receive most developer attention.

Above all, the current compiler is simple, allowing quicker rewrites or modifications when they offer significant improvements that outweigh the added complexity. Vectorization has not cleared this hurdle (libraries would benefit the most, but already use hand-written assembly).

The Spec

The Go spec promises certain behaviors which fundamentally prevent various optimizations. (Heavy use of interfaces also prevents inlining, which would otherwise enable further optimizations, whereas JITs can devirtualize such calls at runtime.)
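
To make the interface point concrete, here is a minimal sketch; Shape and Square are hypothetical types, not from any library. Both functions compute the same sum, but in totalDynamic each Area call goes through the interface's method table, so the compiler generally can't inline it without knowing the concrete type, while in totalStatic the call target is known and the small method can be inlined. Building with go build -gcflags=-m prints the compiler's inlining decisions if you want to check this on your own toolchain.

```go
package main

import "fmt"

// Shape is a hypothetical interface used only for illustration.
type Shape interface {
	Area() float64
}

// Square is a hypothetical concrete type implementing Shape.
type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// totalDynamic calls Area through the interface: each call is an indirect
// call via the method table, which the compiler generally cannot inline
// because the concrete type is unknown at compile time.
func totalDynamic(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

// totalStatic calls Square.Area directly; the call target is known, so the
// compiler is free to inline the method body into the loop.
func totalStatic(squares []Square) float64 {
	sum := 0.0
	for _, s := range squares {
		sum += s.Area()
	}
	return sum
}

func main() {
	sq := []Square{{1}, {2}, {3}}
	shapes := []Shape{sq[0], sq[1], sq[2]}
	fmt.Println(totalDynamic(shapes), totalStatic(sq))
}
```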

Slice aliasing: In C (but not C++), restrict tells the compiler that a memory range won’t overlap with others. Because that memory is then only accessed through the one pointer, the compiler can run wild with optimizations. In Rust, this is proven for non-cell slices (“mutable or shared but not both”) with no keyword needed. In Go, we can’t prove such things: any two slices may share a backing array.
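
A small sketch of why this matters; addInto is a hypothetical helper, and calling it with overlapping slices is perfectly legal Go. Because the compiler can't rule out overlap, each src[i] must be read after the previous write to dst. With the overlapping call below, that turns the loop into a running prefix sum and leaves buf as [1 3 6 10 15], whereas code that loaded all of src up front (as a compiler could if it were allowed to assume no overlap, the way restrict lets a C compiler) would compute [1 3 5 7 9].

```go
package main

import "fmt"

// addInto adds src[i] into dst[i] for each i (assumes len(src) >= len(dst)).
// Go has no restrict, so the compiler must assume dst and src may share a
// backing array: every load of src[i] has to happen after the preceding
// store to dst, which rules out preloading or freely vectorizing src.
func addInto(dst, src []int) {
	for i := range dst {
		dst[i] += src[i]
	}
}

func main() {
	buf := []int{1, 2, 3, 4, 5}
	// dst and src overlap: dst starts one element after src.
	addInto(buf[1:], buf[:4])
	fmt.Println(buf)
}
```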

Null pointers: In C (and C++), dereferencing a null pointer is undefined behavior. In safe Rust, references can never be null, so code that compiles can’t dereference null. As a result, these compilers can optimize without considering this possibility. In Go, the spec (under ‘Address operators’) requires that dereferencing a nil pointer cause a run-time panic, so the compiler inserts nil checks wherever pointers are dereferenced (removing only those it can prove redundant) and cannot freely remove pointer indirections or reorder them.
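
The nil-pointer rule is observable from ordinary Go code. In this sketch (load is a hypothetical helper), dereferencing a nil *int panics with a runtime.Error that recover can catch, as the spec demands; that guaranteed behavior is what obliges the compiler to keep a nil check in front of the load.

```go
package main

import (
	"fmt"
	"runtime"
)

// load dereferences p. The spec requires *p to panic when p is nil, so the
// compiler must keep a nil check before the load unless it can prove p is
// non-nil. The deferred recover turns that panic into a boolean result.
func load(p *int) (v int, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			// A nil dereference surfaces as a runtime.Error.
			_, panicked = r.(runtime.Error)
		}
	}()
	return *p, false
}

func main() {
	x := 7
	fmt.Println(load(&x))
	fmt.Println(load(nil))
}
```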

Currently Implemented Optimizations

  • Engineering a Compiler - Torczon & Cooper
  • Kent Dybvig’s Chez Scheme
  • Catalogue of Optimizing Transformations - Allen, Cocke

The following came out very early in Go’s life. Anyone who’s looked into optimizing compilers would be (and was) horrified.

  • the GC doesn’t scan the underlying buffers of slices, channels and maps unless their elements contain pointers
  • loops that zero every element of a slice are compiled to a memclr call
  • short, simple funcs are inlined if:
    • fewer than 80 AST nodes
    • no complex constructs like closures, defer, recover, select, etc.
    • not marked //go:noinline or //go:uintptrescapes
  • escape analysis (not quite an optimization…) gives up on, e.g.:
    • indirection like *p =
    • func calls
    • package boundaries
    • slice literals
  • don’t allocate when
    • converting []byte into string for comparisons, incl. map lookups like m[string(b)]
    • converting string into []byte when ranging over it
    • putting a zero-width type in an interface
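
Several items in the list above can be observed from ordinary code. This sketch, assuming a recent gc toolchain with default optimizations, shows the zeroing-loop shape that gets lowered to memclr (since Go 1.21 the clear builtin expresses the same thing) and uses testing.AllocsPerRun to check that string(b) comparisons and map lookups don't allocate. The helper and sink names are illustrative only; the memclr rewrite itself isn't visible at runtime, so inspect the generated assembly to confirm it.

```go
package main

import (
	"fmt"
	"testing"
)

// zero sets every element of s to 0. gc recognizes this exact loop shape
// and compiles it to a single memclr call instead of a per-element loop.
func zero(s []int) {
	for i := range s {
		s[i] = 0
	}
}

// Package-level sinks keep the compiler from discarding the measured work.
var (
	counts = map[string]int{"hello": 1}
	sinkB  bool
	sinkI  int
)

// compareAllocs reports average heap allocations per run of comparing b
// against a string via string(b); no temporary string is allocated.
func compareAllocs(b []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		sinkB = string(b) == "hello"
	})
}

// lookupAllocs does the same for a map lookup keyed by string(b).
func lookupAllocs(b []byte) float64 {
	return testing.AllocsPerRun(100, func() {
		sinkI = counts[string(b)]
	})
}

func main() {
	s := []int{1, 2, 3}
	zero(s)
	b := []byte("hello")
	fmt.Println(s, compareAllocs(b), lookupAllocs(b))
}
```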

Of course, many things came shortly thereafter.

The nature of packages (no cyclic dependencies) fixes most of the Link Time Optimization problem. The linker generates DWARF info and removes dead code. Here is a great article detailing the linker pre-Covid.

Now we functionally have an -O flag: -pgo, to which we pass a pprof CPU profile that informs the compiler about real-world usage, enabling JIT-like optimizations based on real usage patterns. This does, however, require some extra engineering.

My journey optimizing the Go compiler

What to Work On

In practice, the runtime dominates your typical program’s profile.

Profile-guided optimization, added in 1.20, outclasses most other optimizations but requires special tooling. Once that tooling appears, we should see significant gains.

Other Compilers

I believe this is a proprietary compiler.

  • TinyGo
  • llgo
  • GCCGo

See also:

  • Learn Go Assembly by Example

Go uses its own assembly language, a quirk of its heritage, owing to its implementors’ familiarity with the Plan 9 toolchain. This assembly isn’t directly platform independent: a package carries a separate .s file for each architecture, with a .go file providing the portable version.

From Rob Pike’s talk on the Go assembler: its design lets them feed in the architecture manuals’ PDFs to support a given platform, while still abstracting platforms away!

Interactive Go compiler which lets you play with the generated assembly.