2. Terminology¶

This document attempts to explain various terms and abbreviations often used in the MCI source code and documentation.

2.1. AA¶

Abbreviation for alias analysis. This is the technique of proving whether or not two pointers definitely, definitely not, or possibly point to the same memory location.

2.2. ALU¶

Abbreviation for arithmetic logic unit. This refers to the unit in a processor which performs basic arithmetic and bit-wise operations. It usually includes operations such as addition, subtraction, multiplication, and division, but also several bit-wise operations such as the bit-wise AND, OR, XOR, etc.

2.3. AOT¶

An abbreviation for ahead of time. It generally refers to either the technique of compiling code before program execution, or such a compiler itself.

2.4. AST¶

An abstract tree-based representation of source code. Most parsers emit an AST from every parsed document, as this is usually the easiest kind of data structure to work with.

2.5. BB¶

Abbreviation for basic block.

2.6. Basic block¶

A basic block (or just block) is a set of instructions which, in SSA form, contains a number of simple instructions terminated by a single terminator instruction. If one were to compare with the C programming language, a basic block can be considered a label which a goto statement can transfer control to.

2.7. CSE¶

Abbreviation for common sub-expression elimination. This is an optimization which eliminates duplicate computations in expressions. For instance, in x * y + x * y, the computation of x + y can be factored out to a variable z such that the expression can be rewritten as z + z, thereby avoiding doing the computation of x + y twice.

2.8. DCE¶

Abbreviation for dead code elimination. This is an optimization that attempts to remove code that is definitely unreachable or otherwise considered useless (i.e. has no impact on the program’s semantics). For instance:

x = 0;
// ...
if (x != 0)
{
    foo();
}
else
{
    bar();
}

It is trivial to discover that the true branch will never be taken. So, we optimize to:

x = 0;
// ...
bar();

Further optimization would remove x entirely.

2.9. EP¶

Abbreviation for entry point. An entry point of a main module is called upon startup and returns the exit code of the program.

2.10. FFI¶

Abbreviation for foreign function interface. It can either refer to the concept of calling a native function dynamically at runtime, or the actual action of doing so.

2.11. GC¶

An abbreviation for garbage collection (or garbage collector), which refers to the technique of using reachability analysis to determine whether memory should be freed, instead of placing this burden upon the programmer.

2.12. GC root¶

A GC root is a pointer which does not lie within the heap, and is used by the GC to start its reachability analysis from. This usually includes (but is not necessarily limited to) global fields, local registers, the program stack etc.

2.13. Heap¶

Refers to the data structure the operating system uses to manage its memory. In general, there are two heaps: The native heap and the managed heap. The former is what is usually accessed through LibC‘s malloc() and free() functions; the latter is the heap controlled by the GC.

2.14. IAL¶

Abbreviation for Intermediate Assembly Language. This is the IR used in the core of the MCI and is a four-address, linear representation.

It is usually in a static single assignment (SSA) form while in the analysis and optimization pipeline, but can also be in non-SSA form (for example, when doing native code generation or when executing in the interpreter).

2.15. Insn¶

Abbreviation for instruction.

2.16. Instr¶

Abbreviation for instruction.

2.17. IPA¶

Inter-procedural analysis. This is the practice of doing things like alias analysis and function inline cost analysis across function boundaries.

2.18. IPO¶

Inter-procedural optimization. This refers to optimizing across function boundaries, such as when inlining functions or doing global DCE.

2.19. IR¶

Abbreviation for intermediate representation. Computer programs are usually lowered to IRs to allow easier analysis and optimization for some specific tasks, but most importantly, in order to make native code generation easier.

Most IRs are in some kind of linear form, as it is hard to generate native code directly from a tree-based IR; linear code maps better to modern processors.

2.20. ISA¶

An abbreviation for instruction set architecture. This generally refers to the set of machine code instructions available in a processor architecture (and sometimes other features). It may also be used to describe the instruction set of IRs.

2.21. JIT¶

An abbreviation for just in time. It generally refers to either the technique of compiling code on demand, or such a compiler itself.

2.22. LTO¶

Link-time optimization. This is the practice of doing IPO across modules. As far as the MCI is concerned, this optimization comes for free, as all code must be available in IR form.

2.23. LibC¶

This is the standard library for the C programming language. It is typically exploited by many other languages, however, as it provides the easiest access to memory, I/O, and other such facilities which are very close to the operating system.

2.24. MCI¶

Abbreviation for Managed Compiler Infrastructure.

2.25. MEP¶

Abbreviation for module entry point. A module entry point is called once before any of the module’s code is executed.

2.26. MXP¶

Abbreviation for module exit point. A module’s exit point is called once when the program exits.

2.27. Main module¶

The main module of a program is the module that was passed to the virtual machine for execution.

2.28. PRE¶

Abbreviation for partial redundancy elimination. This is a form of CSE that tries to eliminate computations that are said to be partially redundant. For instance, consider this high-level code:

if (foo)
{
    x = y - 8;
}
else
{
    // ...
}
w = y - 8;

If foo is true, y - 8 is evaluated twice. This is clearly wasteful, so this code can be optimized to:

z = y - 8;
if (foo)
{
    x = z;
}
else
{
    // ...
}
w = z;

2.29. RTO¶

An abbreviation for RuntimeObject. Refers to the runtime format and layout of values in the MCI, which generally consists of a type pointer, GC bits, and the user data field.

2.30. RTV¶

An abbreviation for RuntimeValue. Refers to a rooted object that holds a reference to a managed object.

2.31. SCCP¶

Abbreviation for sparse conditional constant propagation. An optimization performed in SSA form. It is strictly more powerful than applying DCE and constant propagation in any order or number of repetitions.

2.32. SSA¶

Abbreviation for static single assignment. This is a form of IR where variables are only assigned once, and so-called phi functions are used to determine which variable should be used depending on where control flow came from.

SSA is mostly useful in analysis and optimization.

2.33. TEP¶

Abbreviation for thread entry point. A thread entry point of a module is called before a properly registered thread executes any code within it.

2.34. TXP¶

Abbreviation for thread exit point. A thread exit point of a module is called whenever a properly registered thread is exited.

2.35. TLS¶

Abbreviation for thread-local storage. This is a mechanism by which each thread in a program gets its own isolated version of a variable.

2.36. Target¶

Refers to a processor architecture that the MCI can compile code for (therefore, a target for code generation). Examples include x86, ARM, etc.

2.37. Terminator¶

A terminator is an instruction which, while code is in SSA form, indicates the end of a basic block. Only one terminator is allowed in a basic block, and it must appear as the last instruction.

Navigation