2. Terminology

This document attempts to explain various terms and abbreviations often used in the MCI source code and documentation.

2.1. AA

Abbreviation for alias analysis. This is the technique of proving whether or not two pointers definitely, definitely not, or possibly point to the same memory location.

2.2. ALU

Abbreviation for arithmetic logic unit. This refers to the unit in a processor which performs basic arithmetic and bit-wise operations. It usually includes operations such as addition, subtraction, multiplication, and division, but also several bit-wise operations such as the bit-wise AND, OR, XOR, etc.

2.3. AOT

An abbreviation for ahead of time. It generally refers to either the technique of compiling code before program execution, or such a compiler itself.

2.4. AST

An abstract tree-based representation of source code. Most parsers emit an AST from every parsed document, as this is usually the easiest kind of data structure to work with.

2.5. BB

Abbreviation for basic block.

2.6. Basic block

A basic block (or just block) is a set of instructions which, in SSA form, contains a number of simple instructions terminated by a single terminator instruction. If one were to compare with the C programming language, a basic block can be considered a label which a goto statement can transfer control to.

2.7. CSE

Abbreviation for common sub-expression elimination. This is an optimization which eliminates duplicate computations in expressions. For instance, in x * y + x * y, the computation of x + y can be factored out to a variable z such that the expression can be rewritten as z + z, thereby avoiding doing the computation of x + y twice.

2.8. DCE

Abbreviation for dead code elimination. This is an optimization that attempts to remove code that is definitely unreachable or otherwise considered useless (i.e. has no impact on the program’s semantics). For instance:

x = 0;
// ...
if (x != 0)
{
    foo();
}
else
{
    bar();
}

It is trivial to discover that the true branch will never be taken. So, we optimize to:

x = 0;
// ...
bar();

Further optimization would remove x entirely.

2.9. EP

Abbreviation for entry point. An entry point of a main module is called upon startup and returns the exit code of the program.

2.10. FFI

Abbreviation for foreign function interface. It can either refer to the concept of calling a native function dynamically at runtime, or the actual action of doing so.

2.11. GC

An abbreviation for garbage collection (or garbage collector), which refers to the technique of using reachability analysis to determine whether memory should be freed, instead of placing this burden upon the programmer.

2.12. GC root

A GC root is a pointer which does not lie within the heap, and is used by the GC to start its reachability analysis from. This usually includes (but is not necessarily limited to) global fields, local registers, the program stack etc.

2.13. Heap

Refers to the data structure the operating system uses to manage its memory. In general, there are two heaps: The native heap and the managed heap. The former is what is usually accessed through LibC‘s malloc() and free() functions; the latter is the heap controlled by the GC.

2.14. IAL

Abbreviation for Intermediate Assembly Language. This is the IR used in the core of the MCI and is a four-address, linear representation.

It is usually in a static single assignment (SSA) form while in the analysis and optimization pipeline, but can also be in non-SSA form (for example, when doing native code generation or when executing in the interpreter).

2.15. Insn

Abbreviation for instruction.

2.16. Instr

Abbreviation for instruction.

2.17. IPA

Inter-procedural analysis. This is the practice of doing things like alias analysis and function inline cost analysis across function boundaries.

2.18. IPO

Inter-procedural optimization. This refers to optimizing across function boundaries, such as when inlining functions or doing global DCE.

2.19. IR

Abbreviation for intermediate representation. Computer programs are usually lowered to IRs to allow easier analysis and optimization for some specific tasks, but most importantly, in order to make native code generation easier.

Most IRs are in some kind of linear form, as it is hard to generate native code directly from a tree-based IR; linear code maps better to modern processors.

2.20. ISA

An abbreviation for instruction set architecture. This generally refers to the set of machine code instructions available in a processor architecture (and sometimes other features). It may also be used to describe the instruction set of IRs.

2.21. JIT

An abbreviation for just in time. It generally refers to either the technique of compiling code on demand, or such a compiler itself.

2.22. LTO

Link-time optimization. This is the practice of doing IPO across modules. As far as the MCI is concerned, this optimization comes for free, as all code must be available in IR form.

2.23. LibC

This is the standard library for the C programming language. It is typically exploited by many other languages, however, as it provides the easiest access to memory, I/O, and other such facilities which are very close to the operating system.

2.24. MCI

Abbreviation for Managed Compiler Infrastructure.

2.25. MEP

Abbreviation for module entry point. A module entry point is called once before any of the module’s code is executed.

2.26. MXP

Abbreviation for module exit point. A module’s exit point is called once when the program exits.

2.27. Main module

The main module of a program is the module that was passed to the virtual machine for execution.

2.28. PRE

Abbreviation for partial redundancy elimination. This is a form of CSE that tries to eliminate computations that are said to be partially redundant. For instance, consider this high-level code:

if (foo)
{
    x = y - 8;
}
else
{
    // ...
}
w = y - 8;

If foo is true, y - 8 is evaluated twice. This is clearly wasteful, so this code can be optimized to:

z = y - 8;
if (foo)
{
    x = z;
}
else
{
    // ...
}
w = z;

2.29. RTO

An abbreviation for RuntimeObject. Refers to the runtime format and layout of values in the MCI, which generally consists of a type pointer, GC bits, and the user data field.

2.30. RTV

An abbreviation for RuntimeValue. Refers to a rooted object that holds a reference to a managed object.

2.31. SCCP

Abbreviation for sparse conditional constant propagation. An optimization performed in SSA form. It is strictly more powerful than applying DCE and constant propagation in any order or number of repetitions.

2.32. SSA

Abbreviation for static single assignment. This is a form of IR where variables are only assigned once, and so-called phi functions are used to determine which variable should be used depending on where control flow came from.

SSA is mostly useful in analysis and optimization.

2.33. TEP

Abbreviation for thread entry point. A thread entry point of a module is called before a properly registered thread executes any code within it.

2.34. TXP

Abbreviation for thread exit point. A thread exit point of a module is called whenever a properly registered thread is exited.

2.35. TLS

Abbreviation for thread-local storage. This is a mechanism by which each thread in a program gets its own isolated version of a variable.

2.36. Target

Refers to a processor architecture that the MCI can compile code for (therefore, a target for code generation). Examples include x86, ARM, etc.

2.37. Terminator

A terminator is an instruction which, while code is in SSA form, indicates the end of a basic block. Only one terminator is allowed in a basic block, and it must appear as the last instruction.