Will LLVM provide the missing glue between the languages?

The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple’s development systems.

LLVM has been adopted by many companies to build their tools; even Microsoft chose to support Clang as a compiler for Visual Studio, and it has also investigated the possibility of providing a .NET compiler based on LLVM. For our tool CppDepend, we found nothing better than Clang to use as a front-end parser.

What makes LLVM so special?

The interesting idea behind LLVM is the use of the LLVM Intermediate Representation (IR), which plays a role similar to that of bytecode for Java.
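To make this concrete, here is a small, hand-written sketch of textual LLVM IR (illustrative only, not actual compiler output) for a function that adds two 32-bit integers:

```llvm
; A function taking two i32 arguments and returning their sum.
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b   ; SSA form: %sum is assigned exactly once
  ret i32 %sum
}
```

Every front end, whatever the source language, lowers its programs into this same typed, SSA-based representation, which is what the rest of the toolchain operates on.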

LLVM IR is designed to host the mid-level analyses and transformations that you find in the optimizer section of a compiler. It was designed with many specific goals in mind, including supporting lightweight runtime optimizations, cross-function/interprocedural optimizations, whole-program analysis, and aggressive restructuring transformations. The most important aspect of it, though, is that it is itself defined as a first-class language with well-defined semantics.

With this design, a large part of the compiler can be reused to create new compilers: for example, you can swap in a different front end to handle another language while keeping the optimizer and back ends unchanged.

Another interesting benefit of the IR is the possibility of having a common optimizer for all languages. To explain this phase, I can't put it better than Chris Lattner, the father of LLVM, in this post:

“To give some intuition for how optimizations work, it is useful to walk through some examples. There are lots of different kinds of compiler optimizations, so it is hard to provide a recipe for how to solve an arbitrary problem. That said, most optimizations follow a simple three-part structure:

  • Look for a pattern to be transformed.
  • Verify that the transformation is safe/correct for the matched instance.
  • Do the transformation, updating the code.

The optimizer reads LLVM IR in, chews on it a bit, then emits LLVM IR, which hopefully will execute faster. In LLVM (as in many other compilers) the optimizer is organized as a pipeline of distinct optimization passes each of which is run on the input and has a chance to do something. Common examples of passes are the inliner (which substitutes the body of a function into call sites), expression reassociation, loop invariant code motion, etc. Depending on the optimization level, different passes are run: for example at -O0 (no optimization) the Clang compiler runs no passes, at -O3 it runs a series of 67 passes in its optimizer (as of LLVM 2.8).”
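The three-part structure Lattner describes can be sketched on a toy, SSA-like instruction list (a hypothetical miniature IR, not the real LLVM pass API), using the classic fold of `x - x` into `0`:

```python
# Toy illustration of the look/verify/transform structure of an optimization
# pass. Instructions are (dest, op, lhs, rhs) tuples; this is NOT LLVM's API.

def simplify_sub_self(instructions):
    """Rewrite 'dest = sub a, a' into 'dest = const 0'."""
    out = []
    for dest, op, lhs, rhs in instructions:
        # 1. Look for a pattern to be transformed: an integer subtraction.
        # 2. Verify the transformation is safe for the matched instance:
        #    both operands must be the same value (x - x is always 0 for
        #    integers; a real pass would have to be careful with floats).
        if op == "sub" and lhs == rhs:
            # 3. Do the transformation, updating the code.
            out.append((dest, "const", 0, None))
        else:
            out.append((dest, op, lhs, rhs))
    return out

code = [("t0", "sub", "a", "a"), ("t1", "add", "t0", "b")]
print(simplify_sub_self(code))
# [('t0', 'const', 0, None), ('t1', 'add', 't0', 'b')]
```

A real optimizer chains many such passes in a pipeline, each one consuming and producing IR, which is exactly why a single well-defined IR lets every language share the same optimizations.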

There are other benefits to sharing the common LLVM IR. One of the biggest potential ones is glue between languages.

Currently many compilers are based on LLVM, and many others plan to adopt the LLVM infrastructure. So why not provide a common, easy way for code written in different languages to communicate? Today there is no standard way for them to collaborate in-process, and in many cases the task is very complicated. The good news is that LLVM is the most advanced technology available to achieve this goal.

Maybe in the near future the LLVM team will provide a specification for building this glue between languages on top of LLVM.

 
