编译器基础
graph TB; SC([Source Code]) -- Scanner --> Tokens -- Parser --> AST -- Type Checker --> DA[Decorated AST] -- Translator --> IR -- Code Generator --> MC([Machine Code])
阶段 | 分析类型 | 工具 |
---|---|---|
Scanner | Lexical Analysis | Regular Expression |
Parser | Syntax Analysis | Context-Free Grammar |
Type Checker | Semantic Analysis | Attribute Grammar |
(After Translator) | Static Analysis |
AST
- high-level and closed to grammar structure
- usually language dependent
- suitable for fast type checking
- lack of control flow information
IR
- low-level and closed to machine code
- usually language independent
- compact and uniform
- contains control flow information
- usually considered as the basis for static analysis
IR
3-Address Code(3AC)
每个语句只出现3个地址(等号右边只出现一个操作符)
地址:变量、常量、编译器生成的临时变量
Soot(Jimple)
typed 3AC
“每个指令最多三个地址”的规则主要针对的是简单的算术和逻辑运算指令,而方法调用在 Jimple 中是一种特殊的、更复杂的指令,它本身就可以包含任意数量的参数。
Static Single Assignment (SSA)
All assignments in SSA are to variables with distinct names
- Give each definition a fresh name
- Propagate fresh name to subsequent uses
- Every variable has exactly one definition
- A special merge operator, (called phi-function), is introduced to select the values at merge nodes
- has the value x0 if the control flow passes through the true part of the conditional and the value x1 otherwise
Why SSA?
- Flow information is indirectly incorporated into the
unique variable names
- May help deliver some simpler analyses, e.g., flow-insensitive analysis gains partial precision of flow-sensitive analysis via SSA
- Define-and-Use pairs are explicit
Enable more effective data facts storage and propagation in
some on-demand tasks
- Some optimization tasks perform better on SSA (e.g., conditional constant propagation, global value numbering)
Why not SSA?
- SSA may introduce too many variables and phi-functions
- May introduce inefficiency problem when translating to machine code (due to copy operations)
How to build Basic Blocks?
INPUT: A sequence of three-address instructions of P
OUTPUT: A list of basic blocks of P
METHOD:
- 寻找P中的leaders
- 第一条指令是leader
- 所有条件或非条件跳转的目标是leader
- 条件或非条件跳转之后的第一条指令
- 构建基本块
- 每个leader到下一个leader之间的部分就是基本块