窺孔優化

窺孔優化是一種優化編譯器技術，在編譯的後面階段，尋找編譯器生成的特定的指令序列，將該序列替換成性能更好的等效指令序列。該算法的可視範圍極小，往往只能同時窺看幾條指令，故名曰窺孔優化（指其猶如通過窺孔觀察程序）。被優化掉的這少量指令序列被稱為窺視孔或窗口。^[1]

例如：

對於指令序列：將寄存器A壓入堆疊，然後立即將值彈出存回寄存器A的指令序列，窺孔優化刪除這兩條指令。
對於指令A加到A上，窺孔優化替換為算術左移。
對於浮點寄存器乘以8，窺孔優化會將浮點寄存器的指數部分加上3。
對於數組尋址操作將索引乘以4，將結果與基地址相加以獲得指針值，然後解引用該指針，窺孔優化可能會使用一種硬件尋址模式，通過一條指令實現相同的結果。

1965年William Marshall McKeeman提出窺孔優化一詞。^[2]

替換規則[編輯]

窺孔優化中常用的技術：^[3]

空序列——刪除無用的操作
組合操作——用一項等價操作替換多項操作
代數定律 – 使用代數定律來簡化或重新排序指令
特殊情況指令 – 使用為特殊操作數情況設計的指令
地址模式操作 – 使用地址模式來簡化代碼

還可以有其他類型的窺視孔優化。

例子[編輯]

用較快的指令替換較慢的指令[編輯]

以下為Java字節碼

...
aload 1
aload 1
mul
...

可以替換為

...
aload 1
dup
mul
...

與大多數窺孔優化一樣，這種優化對指令的效率做出了某些假設。例如，在這種情況下，假設dup操作（複製並推送到棧頂）比操作aload X（加載標識為X的局部變量並將其推送到棧上）更有效。

刪除冗餘代碼[編輯]

另一個例子是消除冗餘load stores.

 a = b + c;
 d = a + e;

直接實現為

MOV b, R0  ; Copy b to the register
ADD c, R0  ; Add  c to the register, the register is now b+c
MOV R0, a  ; Copy the register to a
MOV a, R0  ; Copy a to the register
ADD e, R0  ; Add  e to the register, the register is now a+e [(b+c)+e]
MOV R0, d  ; Copy the register to d

但可以優化為

MOV b, R0  ; Copy b to the register
ADD c, R0  ; Add c to the register, which is now b+c (a)
MOV R0, a  ; Copy the register to a
ADD e, R0  ; Add e to the register, which is now b+c+e [(a)+e]
MOV R0, d  ; Copy the register to d

刪除冗餘棧指令[編輯]

如果編譯器在調用子例程之前將寄存器保存在棧上並在返回時恢復它們，則對子例程的連續調用可能會產生冗餘棧指令。

假設編譯器為每個過程調用生成以下Z80指令：

 PUSH AF
 PUSH BC
 PUSH DE
 PUSH HL
 CALL _ADDR
 POP HL
 POP DE
 POP BC
 POP AF

如果有兩個連續的子例程調用，它們將如下所示：

 PUSH AF
 PUSH BC
 PUSH DE
 PUSH HL
 CALL _ADDR1
 POP HL
 POP DE
 POP BC
 POP AF
 PUSH AF
 PUSH BC
 PUSH DE
 PUSH HL
 CALL _ADDR2
 POP HL
 POP DE
 POP BC
 POP AF

對於相同的寄存器，POP regs 後跟 PUSH 的序列通常是多餘的。在冗餘的情況下，窺孔優化將刪除這些指令。在示例中，這將導致另一個冗餘的 POP/PUSH 對出現在窺視孔中，並且這些將依次被刪除。假設子程序 _ADDR2 不依賴於先前的寄存器值，則刪除上例中的所有冗餘代碼最終將留下以下代碼：

 PUSH AF
 PUSH BC
 PUSH DE
 PUSH HL
 CALL _ADDR1
 CALL _ADDR2
 POP HL
 POP DE
 POP BC
 POP AF

實現[編輯]

現代編譯器經常使用模式匹配算法來實現窺孔優化。^[4]

參見[編輯]

參考文獻[編輯]

^ Muchnick, Steven Stanley. Advanced Compiler Design and Implementation. Academic Press / Morgan Kaufmann. 1997-08-15 [2023-08-22]. ISBN 978-1-55860-320-2. （原始內容存檔於2023-08-22）.
^ McKeeman, William Marshall. Peephole optimization. Communications of the ACM. July 1965, 8 (7): 443–444. S2CID 9529633. doi:10.1145/364995.365000 .
^ Fischer, Charles N.; Cytron, Ron K.; LeBlanc, Jr., Richard J. Crafting a Compiler (PDF). Addison-Wesley. 2010 [2018-07-02]. ISBN 978-0-13-606705-4. （原始內容 (PDF)存檔於3 July 2018）.
^ Aho, Alfred Vaino; Lam, Monica Sin-Ling; Sethi, Ravi; Ullman, Jeffrey David. Chapter 8.9.2 Code Generation by Tiling an Input Tree. Compilers – Principles, Techniques, & Tools (PDF) 2. Pearson Education. 2007: 540 [2018-07-02]. （原始內容存檔 (PDF)於2018-06-10）.

外部連結[編輯]

[Muchnick_1997-1] Muchnick, Steven Stanley. Advanced Compiler Design and Implementation. Academic Press / Morgan Kaufmann. 1997-08-15 [2023-08-22]. ISBN 978-1-55860-320-2. （原始內容存檔於2023-08-22）.

[McKeeman_1965-2] McKeeman, William Marshall. Peephole optimization. Communications of the ACM. July 1965, 8 (7): 443–444. S2CID 9529633. doi:10.1145/364995.365000 .

[Fischer_2010-3] Fischer, Charles N.; Cytron, Ron K.; LeBlanc, Jr., Richard J. Crafting a Compiler (PDF). Addison-Wesley. 2010 [2018-07-02]. ISBN 978-0-13-606705-4. （原始內容 (PDF)存檔於3 July 2018）.

[Aho_2007-4] Aho, Alfred Vaino; Lam, Monica Sin-Ling; Sethi, Ravi; Ullman, Jeffrey David. Chapter 8.9.2 Code Generation by Tiling an Input Tree. Compilers – Principles, Techniques, & Tools (PDF) 2. Pearson Education. 2007: 540 [2018-07-02]. （原始內容存檔 (PDF)於2018-06-10）.

[1]

[2]

[3]

[4]