New in Python 3.10: Putting Structural Pattern Matching to Work in VM Reverse Engineering



I

Preface

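I first saw this style in the 2022 GoogleCTF eldar writeup by hgarrereyn of DiceGang: https://ctf.harrisongreen.me/2022/googlectf/eldar/. It is used there to parse a virtual machine too, although that one is an ELF metadata-driven Turing weird machine.

Later, deeprev in the 2022 Qiangwang Cup (强网杯) brought back the same ELF metadata-driven Turing weird machine. I used this style to write the parser for that relocation machine, and it worked very well; it is no exaggeration to say it felt like magic.

At the time I added an item to my Todolist: use Structural Pattern Matching to parse an ordinary virtual machine, which should be effortless. Then work got in the way and the item gathered dust in my Todolist for almost a year. That whole year I was pushed along by work, executing like a robot the instructions I had written for myself the day before. My memory seems to have gotten worse and I often forget things. Only at the end of the year, once part of the project had been delivered, did I find time for my own things. Starting a company really is hard.

Back to the topic. The technique was also applied to a conventional virtual machine in the writeup for dicectf-2022-breach: https://github.com/reductor/dice-ctf-2022-breach-writeup.

Only today did I come back to write this up. I have posted quite a few VM-parsing articles before. In short, this approach is an upgraded disassembler, far better than the disassemblers I posted earlier. Is it better than a decompiler? I cannot give a definite answer; a decompiler follows a different idea, abstracting the code into a high-level language.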

II

Python 3.10 Structural Pattern Matching

Learn Structural Pattern Matching

Introduction to Structural Pattern Matching

PEP 634 – Structural Pattern Matching: Specification (https://peps.python.org/pep-0634/): specifies the match syntax and the supported patterns

PEP 635 – Structural Pattern Matching: Motivation and Rationale (https://peps.python.org/pep-0635/): explains why the syntax was designed this way

PEP 636 – Structural Pattern Matching: Tutorial (https://peps.python.org/pep-0636/): a tutorial introducing the concepts, syntax, and semantics

Match patterns:

Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.

Capture patterns

A capture pattern is a bare name: it matches any value and binds that value to the name:

def sum_list(numbers):
    match numbers:
        case []:  # matches an empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)

def average(*args):
    match args:
        case [x, y]:           # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:              # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:                # captures the entire sequence
            return sum(a) / len(a)

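Guards (adding a condition to a pattern)

A guard is an if clause after the pattern. The case is selected only when the pattern matches and the guard is true; otherwise matching moves on to the next case:

# sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:   # matches an empty sequence [] or a one-element sequence [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])     # sort the elements not greater than p
            b = sort([x for x in rest if p < x])      # sort the elements greater than p
            return a + [p] + b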

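AS Patterns

An AS pattern gives a subpattern a name: when the subpattern matches, the matched value is bound to the name after as, so a restriction such as a type check can be combined with a binding.

Subpatterns can be combined freely inside a match statement.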

In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
...:

In : as_pattern('sss')
Got str: s='sss'

In : as_pattern([0, 1])
Got int: i=1

In : as_pattern([(1,)])
Got tuple: tu=(1,)

In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]

In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}

def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)

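OR Patterns

An OR pattern separates its alternatives with "|". The next three spellings are sometimes presented as OR patterns, but none of them is one in Python 3.10.

The first, separated by commas, is valid syntax but is a sequence pattern: it matches a three-element sequence (401, 403, 404), not any one of the three values:

case 401, 403, 404:
    print("Some HTTP error")

The second, stacked cases as in C, is a SyntaxError, because every case needs its own block:

case 401:
case 403:
case 404:
    print("Some HTTP error")

The third is also a SyntaxError:

case in 401, 403, 404:
    print("Some HTTP error")

The valid spelling uses "|":

case ("a"|"b"|"c"):

It can be combined with an AS pattern to bind whichever alternative matched:

case ("a"|"b"|"c") as letter: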

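Literal Patterns

Literal patterns match Python's built-in literals such as strings, numbers, booleans, and None. Numbers and strings are compared with ==, while True, False, and None are compared with is.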

match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')

def simplify(expr):
    match expr:
        case ('+', 0, x):   # 0 + x
            return x
        case ('+' | '-', x, 0):  # x +- 0
            return x
        case ('and', True, x):   # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr

Wildcard Pattern

The wildcard pattern _ behaves like a capture pattern that accepts any value but binds it to nothing; it simply ignores the positions you do not care about.

def is_closed(sequence):
    match sequence:
        case [_]:               # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:                 # anything
            return False

Value Patterns

This pattern mainly matches constants or enumeration values from the enum module:

In : from enum import Enum

In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:

In : class NewColor:
...:     YELLOW = 4
...:

In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:

In : constant_value(Color.RED)  # matches the first case
Red

In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow

In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN

In : constant_value(4)  # an equal constant also matches the second case
Yellow

In : constant_value(10)  # any other constant
10

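Note that a bare name in a case is a capture pattern, so a constant such as YELLOW cannot be used directly, as in the following:
YELLOW = 4

def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This does not compare color with 4. YELLOW is a capture pattern here, so the
# case matches any value and binds it to a local name YELLOW. If more case
# clauses followed it, Python would reject the code with a SyntaxError.

So how does a pattern tell a variable whose value should be compared apart from the binding name of a capture pattern? By the ".": a dotted name is looked up and compared by value.

For now, only constants written as dotted names can be used this way.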

class Codes:
    SUCCESS = 200
    NOT_FOUND = 404

def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')

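Sequence Patterns

A case can match list-shaped or tuple-shaped subjects.

[a, b, c], (a, b, c), and a, b, c are not distinguished; they are equivalent. To also check the type, write list([a, b, c]).

A starred subpattern matches any number of elements, but a sequence pattern may contain at most one of them, so (*_, 3, *_) is not allowed. To match any sequence that contains 3, capture the elements and test them in a guard.
Iterators are not matched and are never consumed: the subject must be a sequence (str, bytes, and bytearray are excluded), and its elements are accessed by index and slice.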

In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:

In : sequence([1])

In : sequence([1, 2])
Got 1 and 2

In : sequence([1, 2, 3])
x=1, y=2, z=3

In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]

In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]

In : sequence([2, 3])

In : sequence((1, 2))
Got 1 and 2

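Mapping Patterns

For efficiency, keys must be constants (literals or value patterns).

In other words, a case can match against a dictionary. Keys that the pattern does not mention are ignored.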

In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:

In : mapping({})

In : mapping({'route': '/auth/login'})
ROUTE: /auth/login

# Matches a dict that has a 'sub' key: its value is bound to sub_config and the rest of the dict to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)

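Class Patterns

Class patterns serve two purposes: checking that an object is an instance of a class, and extracting data from specific attributes of the object.

# A case can match objects of any class. Let's start with an example that fails: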

In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))

Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')

TypeError: Point() accepts 0 positional sub-patterns (2 given)

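# This happens because the class does not say which attribute each position refers to, so the attributes have to be named with keyword patterns: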

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:

In : class_pattern(Point(1, 2))
match

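# Another way to let a custom class be matched with positional patterns is to define __match_args__, a tuple listing the attribute names in positional order,
# like this: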
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:

# A dataclass works as well. Point2 here uses the standard library's dataclasses.dataclass decorator,
# which generates the __match_args__ attribute, so positional patterns work directly
In : from dataclasses import dataclass

In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:

In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:

In : class_pattern(Point(1, 2))
Point(x=1,y=2)

In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)

def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

Another example:

match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")

A namedtuple example, which is also a class pattern:

from collections import namedtuple
Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
match op:
    case Mov(dst, src, 8, ridx):
        pass

Type Unions, Aliases, and Guards

numbers is annotated as a list whose elements may be float or int.

def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)

Type aliases can be defined so that both the type checker and the programmer recognize the pattern:

from typing import TypeAlias

Card: TypeAlias = tuple[str, str]          # ('', '')
Deck: TypeAlias = list[Card]               # [('', '')]

Type guards are used to narrow a type union.

III

A new disassembler for 2020 GKCTF EzMachine

A disassembler like this is usually refined step by step, until its output is clean enough to be passed to pwntools' make_elf_from_assembly (https://docs.pwntools.com/en/stable/asm.html#pwnlib.asm.make_elf_from_assembly) and assembled directly into an ELF.

1: Define the instruction types and write parse

◆Ezmachine-disassembler-parsefunc.py

from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])

AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg

Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr

# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])

InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)

MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]

MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]

Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []

    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]

        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
                break

    return instructions


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)

◆Ezmachine-disassembler-parsefunc.out

MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)

The purpose of producing parsefunc.out is to check that parse and the instruction type definitions are sensible.

2: Write a first version of dump

◆Ezmachine-disassembler-version0.py

from collections import namedtuple
from dataclasses import dataclass


@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)


Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])

AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg

Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr

# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])

InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"])  # case 18: memset(mem_addr, 0, sz)

MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"])  # case 19: mov reg, [ebp-src]

MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"])  # case 20: mov reg, mem[src]

Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)


def parse(buffer):
    instructions = []

    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]

        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
                break

    return instructions


def dump(instructions):
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
                break


if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    dump(instructions)

◆Ezmachine-disassembler-dumpfunc-version0.out

_0x0000: mov edx, 0x03
_0x0003: print_str
_0x0006: input_str
_0x0009: mov ebx, 0x11
_0x000c: cmp eax, ebx
_0x000f: jz _0x001b
_0x0012: mov edx, 0x01
_0x0015: print_str
_0x0018: exit(0)
_0x001b: mov ecx, 0x00
_0x001e: mov eax, 0x11
_0x0021: cmp eax, ecx
_0x0024: jz _0x007e
_0x0027: mov eax, mem[2]
_0x002a: mov ebx, 0x61
_0x002d: cmp eax, ebx
_0x0030: jl _0x004b
_0x0033: mov ebx, 0x7a
_0x0036: cmp eax, ebx
_0x0039: jg _0x004b
_0x003c: mov ebx, 0x47
_0x003f: xor eax, ebx
_0x0042: mov ebx, 0x01
_0x0045: add eax, ebx
_0x0048: jmp _0x0069
_0x004b: mov ebx, 0x41
_0x004e: cmp eax, ebx
_0x0051: jl _0x0069
_0x0054: mov ebx, 0x5a
_0x0057: cmp eax, ebx
_0x005a: jg _0x0069
_0x005d: mov ebx, 0x4b
_0x0060: xor eax, ebx
_0x0063: mov ebx, 0x01
_0x0066: sub eax, ebx
_0x0069: mov ebx, 0x10
_0x006c: div eax, ebx
_0x006f: push ebx
_0x0072: push eax
_0x0075: mov ebx, 0x01
_0x0078: add ecx, ebx
_0x007b: jmp _0x001e
_0x007e: push 0x07
_0x0081: push 0x0d
_0x0084: push 0x00
_0x0087: push 0x05
_0x008a: push 0x01
_0x008d: push 0x0c
_0x0090: push 0x01
_0x0093: push 0x00
_0x0096: push 0x00
_0x0099: push 0x0d
_0x009c: push 0x05
_0x009f: push 0x0f
_0x00a2: push 0x00
_0x00a5: push 0x09
_0x00a8: push 0x05
_0x00ab: push 0x0f
_0x00ae: push 0x03
_0x00b1: push 0x00
_0x00b4: push 0x02
_0x00b7: push 0x05
_0x00ba: push 0x03
_0x00bd: push 0x03
_0x00c0: push 0x01
_0x00c3: push 0x07
_0x00c6: push 0x07
_0x00c9: push 0x0b
_0x00cc: push 0x02
_0x00cf: push 0x01
_0x00d2: push 0x02
_0x00d5: push 0x07
_0x00d8: push 0x02
_0x00db: push 0x0c
_0x00de: push 0x02
_0x00e1: push 0x02
_0x00e4: mov ecx, 0x01
_0x00e7: mov ebx, [ebp-2]
_0x00ea: pop eax
_0x00ed: cmp eax, ebx
_0x00f0: jnz _0x010e
_0x00f3: mov ebx, 0x22
_0x00f6: cmp ecx, ebx
_0x00f9: jz _0x0108
_0x00fc: mov ebx, 0x01
_0x00ff: add ecx, ebx
_0x0102: jmp _0x00e7
_0x0105: mov edx, 0x00
_0x0108: print_str
_0x010b: exit(0)
_0x010e: mov edx, 0x01
_0x0111: print_str
_0x0114: exit(0)
_0x0117: nop

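The Ezmachine-disassembler-dumpfunc-version0.out we get here is essentially the same as what our earlier disassemblers produced.

The point of producing dumpfunc-version0.out is to have a baseline to refer to while optimizing.

3: Optimization

– (1) Add a function prologue and epilogue

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)

The program starts and ends directly with instructions and has no stack frame, so we add one: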

```python
from collections import namedtuple
from dataclasses import dataclass

......

# Optimization (1): add a prologue and epilogue for main
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])
def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print("push ebp")
                print("mov ebp, esp")
            case epilogue():
                print("mov esp, ebp")
                print("pop ebp")
                print("ret")
            ......
            case _:
                raise Exception(f"unknown instruction: {ins}")

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```

– (2) Handle the VM's mem and strings

```python
.....

# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)

if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
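The labels and string contents can also live in a single table, so that the data section is generated rather than written out by hand. The following is only a minimal sketch: the STRINGS table and the helper render_data_section are hypothetical names introduced here (the strings themselves are the ones from dump_data above), and the helper returns the text instead of printing it.

```python
# Hypothetical table: label -> string contents, copied from dump_data() above.
STRINGS = {
    "right": "right",
    "wrong": "wrong",
    "plz_input": "plz input:",
    "hacker": "hacker",
}

def render_data_section(strings, mem_size=0x100):
    """Return GNU-as style definitions: one .asciz per string, then the VM memory."""
    lines = []
    for label, text in strings.items():
        lines.append(f"{label}:")
        lines.append(f'    .asciz "{text}"')
    lines.append("mem:")
    lines.append(f"    .space {mem_size:#x}")
    return "\n".join(lines)

data_section = render_data_section(STRINGS)
print(data_section)
```

Returning a string makes it easy to append the data section to the generated code before handing everything to the assembler.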

– (3) Handle print_str

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)

The assembly we generated contains statements like this:

```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```

It prints a different string depending on the value of edx.

A function call is unavoidable here. We can borrow pwntools' shellcraft to generate it: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux

```python
from collections import namedtuple
from dataclasses import dataclass

.....
write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])
# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str

    _0x0012: mov edx, 0x01
    _0x0015: print_str

    _0x0105: mov edx, 0x00
    _0x0108: print_str

    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+2]:
            case [
                MovReg(addr1, Regs(3), imm),
                PrintStr(addr2)
            ] if imm in (0x00, 0x01, 0x03, 0x04):
                instructions[idx: idx+2] = [write_func_call(addr2, imm)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, right
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, wrong
    push 5
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, plz_input
    push 10
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
    push 1
    pop ebx
    mov ecx, hacker
    push 6
    pop edx
    push SYS_write  /* 4 */
    pop eax
    int 0x80
    popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

......
```

– (4) Handle input_str

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)

```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)

```python
from collections import namedtuple
from dataclasses import dataclass

......

read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])
# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+1]:
            case [
                InputStr(addr)
            ]:
                instructions[idx: idx+1] = [read_strlen_func_call(addr)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, mem
    push 0x100
    pop edx
    push SYS_read  /* 3 */
    pop eax
    int 0x80

    /* strlen(mem) */
    mov edi, mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* save the length (ecx) in edi; it is copied to eax after the registers are restored */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)

if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```
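The emitted sequence replaces the VM's gets(mem); eax=strlen(mem) with a raw read syscall followed by an inlined strlen. One difference to keep in mind is that read keeps a trailing newline, whereas gets drops it, so the length left in eax can be one larger than expected. The small Python model below illustrates this; it assumes mem starts zero-filled, as .space provides, and model_input_str is a hypothetical helper name.

```python
MEM_SIZE = 0x100

def model_input_str(stdin_bytes, mem_size=MEM_SIZE):
    """Model read(0, mem, mem_size) followed by strlen(mem) on zero-filled memory."""
    mem = bytearray(mem_size)        # .space reserves zero-filled bytes
    data = stdin_bytes[:mem_size]    # read() copies at most mem_size bytes
    mem[:len(data)] = data
    nul = mem.find(b"\x00")          # strlen stops at the first NUL byte
    # Simplification: if the buffer is completely full, report its size.
    return len(mem) if nul == -1 else nul

print(model_input_str(b"A" * 17))          # 17
print(model_input_str(b"A" * 17 + b"\n"))  # 18: the newline is counted
```

Since the program compares the length against 0x11, input piped without a trailing newline behaves closest to the original gets.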

– (5) Handle exit(0)

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)
```python
            case Exit(addr):
                print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit  /* 1 */
    pop eax
    int 0x80
""")
```

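– (6) Fix mov ebx, [ebp-ecx]

This form fails to assemble, because an x86 memory operand cannot subtract a register:

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)

Replace it with the following: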

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)

```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f"    sub {dst}, {src}")
    print(f"    mov {dst}, [{dst}]")
```
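The three-instruction replacement uses dst as scratch space, so it only works when dst and src are different registers: mov dst, ebp would otherwise overwrite the index before it is subtracted. A minimal sketch that makes this precondition explicit, assuming registers are passed as plain name strings (emit_mov_reg_stack is a hypothetical helper, not part of the original script):

```python
def emit_mov_reg_stack(addr, dst, src):
    """Return the three-line replacement for `mov dst, [ebp-src]`."""
    if dst == src:
        # mov dst, ebp would destroy the index held in src before the sub
        raise ValueError("dst and src must be different registers")
    return [
        f"_0x{addr:04x}: mov {dst}, ebp",
        f"    sub {dst}, {src}",
        f"    mov {dst}, [{dst}]",
    ]

lines = emit_mov_reg_stack(0x00E7, "ebx", "ecx")
print("\n".join(lines))
```

In the program analyzed here the only such instruction is at _0x00e7 with ebx as destination and ecx as index, so the check never fires; it is there to catch other bytecode.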

– (7) Fix _0x006c: div eax, ebx

![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_2T8PCPWA695R3CV.webp)

After a normal div ebx, the quotient is stored in eax and the remainder in edx.

The VM's div is different: it stores its results in eax and ebx.

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)

So we also need to add a mov ebx, edx after div eax, ebx.

 ![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)
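On x86, the one-operand div ebx divides the 64-bit value edx:eax by ebx. The sketch below spells out the fix as an emitter plus a tiny Python model of the intended result (quotient in eax, remainder in ebx). The xor edx, edx line is an assumption of this sketch rather than something taken from the original script, and emit_vm_div and model_vm_div are hypothetical names.

```python
def emit_vm_div(addr):
    """Return x86 lines mimicking the VM's div: quotient -> eax, remainder -> ebx."""
    return [
        f"_0x{addr:04x}: xor edx, edx",  # assumption: clear the high half of edx:eax
        "    div ebx",                   # eax = quotient, edx = remainder
        "    mov ebx, edx",              # the VM expects the remainder in ebx
    ]

def model_vm_div(eax, ebx):
    """Python model of the VM semantics: returns (new_eax, new_ebx)."""
    quotient, remainder = divmod(eax, ebx)
    return quotient, remainder

print("\n".join(emit_vm_div(0x006C)))
print(model_vm_div(0x27, 0x10))  # (2, 7): high and low nibble of 0x27
```

In the program, the divisor is 0x10 and both results are pushed afterwards, so each processed byte ends up on the stack as its two nibbles.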

Ezmachine-disassembler.py（https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/0c2d246f-a2d4-484c-8671-4d65e9ac8fa1/Ezmachine-disassembler.py）

Ezmachine-disassembler-out.asm（https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/1edf8fac-54c2-42ba-84ac-1e46937eaf1e/Ezmachine-disassembler-out.asm）

4: Call pwntools make_elf

Ezmachine-asm_compile.py（https://prod-files-secure.s3.us-west-2.amazonaws.com/461378ca-73a1-498e-b83e-fbb0244aa01b/a6df480c-d615-4ab5-a2bd-dee5da074416/Ezmachine-asm_compile.py）

from pwn import *

code = """
push ebp
mov ebp, esp
.....
ret

right:
    .asciz "right" 
wrong:
    .asciz "wrong" 
plz_input:
    .asciz "plz input:" 
hacker:
    .asciz "hacker" 
mem:
    .space 0x100 
"""

elf = make_elf_from_assembly(code)
print(elf)

Result:

[Image]

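Kanxue (看雪) ID: SYJ-Re

https://bbs.kanxue.com/user-home-921830.htm

*This is a featured article from the Kanxue (看雪) forum, written by SYJ-Re. Please credit the Kanxue community when reposting.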


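Originally published on the WeChat official account 看雪学苑 (Kanxue Academy): New Python 3.10 feature: a clever use of Structural Pattern Matching in VM reverse engineering

Copyright notice: posted by admin on February 12, 2024 at 6:00 PM.
When reposting, please credit: New Python 3.10 feature: a clever use of Structural Pattern Matching in VM reverse engineering | CTF导航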
