Sometimes, when searching for vulnerabilities, you come across protected PHP code. Often, it’s protected by commercial encoders. These encoders perform a straightforward task: they compile the source code into Zend Engine bytecode and then encode it. The obfuscation result looks something like this:
Unfortunately, there are no free or open source tools that can decode PHP scripts protected by commercial encoders, just like there is no decompiler for the Zend VM opcodes.
Can we do something about that? Absolutely!
Zend Engine 101
Zend Engine plays a key role in executing PHP scripts, acting as both a compiler and runtime environment. The execution of a PHP script starts with the source code being parsed and then transformed into an Abstract Syntax Tree (AST). Next, the AST is converted into opcodes for the Zend virtual machine, which are then executed.
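To make the parse, compile, execute pipeline more concrete, here is a toy sketch in PHP. It is not how the Zend Engine is implemented: the AST layout, the opcode names `PUSH_CONST` and `PUSH_VAR`, and the functions `compile_expr` and `run_ops` are all invented for illustration, and the real VM works with typed operands rather than a stack. The sketch only shows the idea of flattening a tree into a linear opcode list that a small machine can then run.

```php
<?php
// Toy model only: real Zend opcodes carry typed operands, result slots and handlers.

/** Compile a tiny expression AST into a flat opcode list (post-order walk). */
function compile_expr(array $node, array &$ops): void
{
    switch ($node['kind']) {
        case 'const':
            $ops[] = ['PUSH_CONST', $node['value']];
            break;
        case 'var':
            $ops[] = ['PUSH_VAR', $node['name']];
            break;
        case 'concat':
            // Operands first, then the operator that consumes them.
            compile_expr($node['left'], $ops);
            compile_expr($node['right'], $ops);
            $ops[] = ['CONCAT', null];
            break;
        default:
            throw new InvalidArgumentException("Unknown node kind: {$node['kind']}");
    }
}

/** Execute the opcode list on a small stack machine. */
function run_ops(array $ops, array $vars): string
{
    $stack = [];
    foreach ($ops as [$opcode, $operand]) {
        switch ($opcode) {
            case 'PUSH_CONST':
                $stack[] = $operand;
                break;
            case 'PUSH_VAR':
                $stack[] = $vars[$operand];
                break;
            case 'CONCAT':
                $right = array_pop($stack);
                $left = array_pop($stack);
                $stack[] = $left . $right;
                break;
        }
    }
    return array_pop($stack);
}

// Hand-built AST for the expression: "Hello, " . $name
$ast = [
    'kind'  => 'concat',
    'left'  => ['kind' => 'const', 'value' => 'Hello, '],
    'right' => ['kind' => 'var', 'name' => 'name'],
];

$ops = [];
compile_expr($ast, $ops);
echo run_ops($ops, ['name' => 'PT SWARM']), PHP_EOL;
```

The opcode list in `$ops` is plain data. That is the property encoders rely on: once code has been compiled, it can be stored and shipped without the original source.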
As mentioned earlier, commercial encoders handle every step except the last one: the execution of opcodes. After compilation, the encoders don’t execute opcodes. Instead, they serialize them along with their metadata, encode the resulting blob, and encapsulate it in a bootstrap script. When executing the encoded script, the following reverse operations occur: blob decoding and deserialization, loading opcodes and their metadata into Zend VM, and finally, the opcode execution.
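The serialize, encode, wrap sequence and its reverse can be sketched as a round trip. Everything specific here is an assumption made for illustration: the op-array layout, the XOR plus Base64 "encoding", the key, and the `toy_loader` bootstrap call do not correspond to any real commercial encoder, whose formats are proprietary and considerably more involved.

```php
<?php
// Toy model: the blob format, key and bootstrap wrapper are invented for illustration.

/** XOR every byte of $data with a repeating key. */
function xor_bytes(string $data, string $key): string
{
    if ($key === '') {
        throw new InvalidArgumentException('Key must not be empty');
    }
    $out = '';
    $keyLen = strlen($key);
    for ($i = 0, $len = strlen($data); $i < $len; $i++) {
        $out .= $data[$i] ^ $key[$i % $keyLen];
    }
    return $out;
}

/** "Encoder" side: serialize opcodes + metadata, encode the blob, wrap it in a bootstrap script. */
function encode_script(array $opArray, string $key): string
{
    $blob = base64_encode(xor_bytes(serialize($opArray), $key));
    return "<?php toy_loader('" . $blob . "');";
}

/** "Loader" side: pull the blob out of the bootstrap script, decode it, deserialize it. */
function decode_script(string $bootstrap, string $key): array
{
    if (!preg_match("~toy_loader\\('([A-Za-z0-9+/=]+)'\\)~", $bootstrap, $m)) {
        throw new RuntimeException('No encoded blob found');
    }
    $serialized = xor_bytes(base64_decode($m[1], true), $key);
    return unserialize($serialized, ['allowed_classes' => false]);
}

// Hypothetical op-array for get_hello_text(); operand notation is made up.
$opArray = [
    'function' => 'get_hello_text',
    'vars'     => ['name'],
    'literals' => ['Hello, '],
    'opcodes'  => [
        ['RECV', 'arg1', '$name'],
        ['CONCAT', 'literal0', '$name', 'tmp1'],
        ['RETURN', 'tmp1'],
    ],
];

$protected = encode_script($opArray, 'k3y');
$recovered = decode_script($protected, 'k3y');
var_dump($recovered === $opArray);
```

The point of the sketch is that the loader must be able to undo every step, so whatever the loader can reconstruct at run time, an analyst who understands the loader can reconstruct as well.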
Decoding and disassembling
Having understood the working principle of commercial encoders, our next step is to figure out how to extract opcodes and their metadata from an encoded blob. I’ll leave this challenge to you, dear reader, as the goal isn’t to take work away from developers of commercial encoders.
In this study, I analyzed the loader of one of the most widely used commercial encoders of PHP code. As a result, I created a tool that can extract (and disassemble) the Zend Engine opcodes from encoded PHP scripts. Below is a glimpse of how this tool works.
Source code:
<?php
function get_hello_text($name) {
    return "Hello, " . $name;
}
echo get_hello_text("PT SWARM");
The result of disassembling the encoded script:
The disassembly output is already sufficient for analysis and vulnerability hunting. However, I wanted to make my job easier and end up with something that looks more like regular PHP source code, so next came…
Decompilation
Developing a decompiler for the Zend virtual machine is a tricky task. There are over 200 opcodes, making the development quite time-consuming. While I could have used Ghidra’s flexible SLEIGH specification language, as my colleagues did for V8, I was looking for a simpler solution… and I found it!
AI
Just for fun, I asked Microsoft’s Copilot chatbot to decompile the Zend Engine opcodes, and it nailed it!
However, that felt too easy, so I decided to try decompiling something more complex.
I took a random WordPress function, encoded it with a protector, then decoded and disassembled it. I sent the result to the chat with a request to convert the Zend Engine opcodes in the comment block to PHP code.
I also took the RC4 function code and did the same thing by sending the following request: convert PHP opcodes in the comment block to PHP code.
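Both requests follow the same pattern: an instruction followed by the opcode listing placed inside a comment block. A minimal sketch of assembling such a request is shown here. The function name `build_decompile_prompt` and the sample listing notation are hypothetical; the listing is not the output format of the tool described in this article.

```php
<?php
/**
 * Wrap an opcode listing in a comment block and prepend the request text.
 * Hypothetical helper; the request wording follows the one quoted in the article.
 */
function build_decompile_prompt(string $opcodeListing): string
{
    // Neutralise any "*/" in the listing so it cannot close the comment early.
    $safe = str_replace('*/', '* /', $opcodeListing);

    return "Convert the Zend Engine opcodes in the comment block to PHP code.\n\n"
         . "/*\n" . rtrim($safe) . "\n*/\n";
}

// Made-up listing notation, for illustration only.
$listing = "RECV 1 -> \$name\n"
         . "CONCAT 'Hello, ', \$name -> T1\n"
         . "RETURN T1\n";

echo build_decompile_prompt($listing);
```

Keeping the listing inside a comment makes it clear to the chatbot which part is data and which part is the instruction.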
Next, I ran the decompiled code to verify it.
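How the check was performed is not detailed here. For a cipher such as RC4, one simple option is to run the recovered function against commonly cited known-answer vectors. The sketch below assumes that approach: `rc4_decompiled` is a stand-in for whatever code the chatbot returned (here a textbook RC4), and `failed_vectors` is a hypothetical helper.

```php
<?php
// Stand-in for the function pasted back from the chatbot (textbook RC4 here).
function rc4_decompiled(string $key, string $data): string
{
    // Key-scheduling algorithm.
    $s = range(0, 255);
    $j = 0;
    $keyLen = strlen($key);
    for ($i = 0; $i < 256; $i++) {
        $j = ($j + $s[$i] + ord($key[$i % $keyLen])) % 256;
        [$s[$i], $s[$j]] = [$s[$j], $s[$i]];
    }

    // Keystream generation, XORed with the input.
    $i = $j = 0;
    $out = '';
    for ($n = 0, $len = strlen($data); $n < $len; $n++) {
        $i = ($i + 1) % 256;
        $j = ($j + $s[$i]) % 256;
        [$s[$i], $s[$j]] = [$s[$j], $s[$i]];
        $out .= chr(ord($data[$n]) ^ $s[($s[$i] + $s[$j]) % 256]);
    }
    return $out;
}

/** Return the [key, plaintext] pairs for which $cipher does not produce the expected hex. */
function failed_vectors(callable $cipher, array $vectors): array
{
    $failed = [];
    foreach ($vectors as [$key, $plain, $expectedHex]) {
        if (bin2hex($cipher($key, $plain)) !== $expectedHex) {
            $failed[] = [$key, $plain];
        }
    }
    return $failed;
}

// Commonly cited RC4 known-answer vectors: key, plaintext, ciphertext (hex).
$vectors = [
    ['Key', 'Plaintext', 'bbf316e8d940af0ad3'],
    ['Wiki', 'pedia', '1021bf0420'],
    ['Secret', 'Attack at dawn', '45a01f645fc35b383552544b9bf5'],
];

$failed = failed_vectors('rc4_decompiled', $vectors);
echo $failed === [] ? "all vectors match\n" : count($failed) . " vector(s) differ\n";
```

For functions without published test vectors, the same harness idea applies if the original code can still be executed: feed identical inputs to both versions and compare the outputs.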
Impressive, but the quality of decompilation by AI chatbots can vary significantly. In some cases, AI can achieve 100% accuracy, flawlessly restoring source code from opcodes. In other cases, the results are less accurate, with issues such as skipped nested loops. Another challenge is the limit on request length: when you send a very long listing for decompilation, a chatbot may fail to complete the task. In such cases, you may need to break the code into smaller fragments or shorten the request, keeping only the fragments that matter for the analysis. Nevertheless, AI-generated results make opcode listings easier to read, so this approach is worth your time.
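Splitting a long listing into request-sized fragments can be sketched as follows. This is a simple line-based splitter under an assumed character budget; the function name `split_listing`, the budget value, and the sample listing are hypothetical, and actual limits depend on the chatbot in use.

```php
<?php
/**
 * Split a listing into chunks of at most $maxChars characters, breaking only at line ends.
 * A single line longer than the budget is kept whole as its own chunk.
 */
function split_listing(string $listing, int $maxChars): array
{
    $chunks = [];
    $current = '';
    foreach (explode("\n", $listing) as $line) {
        $candidate = $current === '' ? $line : $current . "\n" . $line;
        if ($current !== '' && strlen($candidate) > $maxChars) {
            // Adding this line would exceed the budget: close the current chunk.
            $chunks[] = $current;
            $current = $line;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

// Made-up listing notation, for illustration only.
$listing = "RECV 1 -> \$name\nCONCAT 'Hello, ', \$name -> T1\nRETURN T1";

foreach (split_listing($listing, 40) as $n => $chunk) {
    echo "--- fragment ", $n + 1, " ---\n", $chunk, "\n";
}
```

Breaking at function boundaries rather than at arbitrary lines is likely to give the chatbot more usable context, since jump targets and temporaries usually stay within one function; the line-based split above is only the simplest starting point.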
Originally published by Nikita Petrov: From opcode to code: how AI chatbots can help with decompilation