编译器基础

graph TB;
  SC([Source Code]) -- Scanner --> Tokens -- Parser --> AST -- Type Checker --> DA[Decorated AST] -- Translator --> IR -- Code Generator --> MC([Machine Code])
阶段分析类型工具
ScannerLexical AnalysisRegular Expression
ParserSyntax AnalysisContext-Free Grammar
Type CheckerSemantic AnalysisAttribute Grammar
(After Translator)Static Analysis

AST

  • high-level and closed to grammar structure
  • usually language dependent
  • suitable for fast type checking
  • lack of control flow information

IR

  • low-level and closed to machine code
  • usually language independent
  • compact and uniform
  • contains control flow information
  • usually considered as the basis for static analysis

IR

3-Address Code(3AC)

每个语句只出现3个地址(等号右边只出现一个操作符)

地址:变量、常量、编译器生成的临时变量

Soot(Jimple)

typed 3AC

“每个指令最多三个地址”的规则主要针对的是简单的算术和逻辑运算指令,而方法调用在 Jimple 中是一种特殊的、更复杂的指令,它本身就可以包含任意数量的参数。

Static Single Assignment (SSA)

All assignments in SSA are to variables with distinct names

  • Give each definition a fresh name
  • Propagate fresh name to subsequent uses
  • Every variable has exactly one definition
  • A special merge operator, (called phi-function), is introduced to select the values at merge nodes
  • has the value x0 if the control flow passes through the true part of the conditional and the value x1 otherwise

Why SSA?

  • Flow information is indirectly incorporated into the unique variable names
    • May help deliver some simpler analyses, e.g., flow-insensitive analysis gains partial precision of flow-sensitive analysis via SSA
  • Define-and-Use pairs are explicit Enable more effective data facts storage and propagation in some on-demand tasks
    • Some optimization tasks perform better on SSA (e.g., conditional constant propagation, global value numbering)

Why not SSA?

  • SSA may introduce too many variables and phi-functions
  • May introduce inefficiency problem when translating to machine code (due to copy operations)

How to build Basic Blocks?

INPUT: A sequence of three-address instructions of P

OUTPUT: A list of basic blocks of P

METHOD:

  1. 寻找P中的leaders
  • 第一条指令是leader
  • 所有条件或非条件跳转的目标是leader
  • 条件或非条件跳转之后的第一条指令
  1. 构建基本块
  • 每个leader到下一个leader之间的部分就是基本块