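I
Preface
Later on, the deeprev challenge in China's 2022 Qiangwang Cup (强网杯) brought back the ELF metadata-driven Turing weird machine. I also used it to write a parser for that relocation machine, and it worked really well; without exaggeration, it felt like magic.
At the time I wrote in my to-do list that using the new Structural Pattern Matching feature to parse an ordinary virtual machine would surely be effortless. Then work got in the way and I never finished it, so the item gathered dust in my to-do list for nearly a year. All year I was pushed along by work, executing like a robot the instructions I had written for myself the day before. My memory seems to have gotten worse too, and I often forget things. Only at the end of the year, once some projects had been delivered, did I find time for my own things. Running a startup really is hard.
Back to the topic: the feature was later formally used to parse a conventional virtual machine in the writeup for the dicectf-2022-breach challenge: https://github.com/reductor/dice-ctf-2022-breach-writeup
I kept putting it off until today, and only now am I coming back to write it up. I have in fact posted quite a few virtual machine analyses before. In short, this method is an upgraded disassembler, far better than the disassemblers I posted previously. Is it better than a decompiler? I cannot give a definite answer; after all, a decompiler takes the approach of abstracting the code into a high-level language.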
II
Python 3.10 Structural Pattern Matching
Learn Structural Pattern Matching
Introduction to Structural Pattern Matching
Mapping patterns: match mapping structures like dictionaries.
Sequence patterns: match sequence structures like tuples and lists.
Capture patterns: bind values to names.
AS patterns: bind the value of subpatterns to names.
OR patterns: match one of several different subpatterns.
Wildcard patterns: match anything.
Class patterns: match class structures.
Value patterns: match values stored in attributes.
Literal patterns: match literal values.
Capture patterns
def sum_list(numbers):
    match numbers:
        case []:  # matches the empty list
            return 0
        case [first, *rest]:  # a sequence pattern made of two capture patterns; *rest matches all remaining elements
            return first + sum_list(rest)

def average(*args):
    match args:
        case [x, y]:  # captures the two elements of a sequence
            return (x + y) / 2
        case [x]:  # captures the only element of a sequence
            return x
        case []:
            return 0
        case a:  # captures the entire sequence
            return sum(a) / len(a)
Guards (adding conditions to patterns)
# Sort in ascending order
def sort(seq):
    match seq:
        case [] | [_]:  # matches the empty sequence [] or any single-element sequence [_]
            return seq
        case [x, y] if x <= y:
            return seq
        case [x, y]:
            return [y, x]
        case [x, y, z] if x <= y <= z:
            return seq
        case [x, y, z] if x >= y >= z:
            return [z, y, x]
        case [p, *rest]:
            a = sort([x for x in rest if x <= p])  # sort the elements not greater than p
            b = sort([x for x in rest if p < x])  # sort the elements greater than p
            return a + [p] + b
AS Patterns
In : def as_pattern(obj):
...:     match obj:
...:         case str() as s:
...:             print(f'Got str: {s=}')
...:         case [0, int() as i]:
...:             print(f'Got int: {i=}')
...:         case [tuple() as tu]:
...:             print(f'Got tuple: {tu=}')
...:         case list() | set() | dict() as iterable:
...:             print(f'Got iterable: {iterable=}')
...:
In : as_pattern('sss')
Got str: s='sss'
In : as_pattern([0, 1])
Got int: i=1
In : as_pattern([(1,)])
Got tuple: tu=(1,)
In : as_pattern([1, 2, 3])
Got iterable: iterable=[1, 2, 3]
In : as_pattern({'a': 1})
Got iterable: iterable={'a': 1}

def simplify_expr(tokens):
    match tokens:
        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
            return simplify_expr(expr)
        case [0, ('+'|'-') as op, right]:
            return UnaryOp(op, right)
        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
            return Num(left + right)
        case [(int() | float()) as value]:
            return Num(value)
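OR Patterns
# Not an OR pattern: this is a sequence pattern that matches a 3-element sequence
case 401, 403, 404:
    print("Some HTTP error")
# Not valid Python: there is no C-style fall-through between case clauses
case 401:
case 403:
case 404:
    print("Some HTTP error")
# Not valid Python either: there is no `case in` form
case in 401, 403, 404:
    print("Some HTTP error")
# The actual OR pattern joins the alternatives with `|`
case 401 | 403 | 404:
    print("Some HTTP error")
# Alternatives can be parenthesized, and the matched value can be bound with `as`
case ("a"|"b"|"c"):
case ("a"|"b"|"c") as letter: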
Literal Patterns
match number:
    case 0:
        print('zero')
    case 1:
        print('one')
    case 2:
        print('two')

def simplify(expr):
    match expr:
        case ('+', 0, x):  # 0 + x
            return x
        case ('+' | '-', x, 0):  # x +- 0
            return x
        case ('and', True, x):  # True and x
            return x
        case ('and', False, x):
            return False
        case ('or', False, x):
            return x
        case ('or', True, x):
            return True
        case ('not', ('not', x)):
            return x
    return expr
Wildcard Pattern
def is_closed(sequence):
    match sequence:
        case [_]:  # any sequence with a single element
            return True
        case [start, *_, end]:  # a sequence with at least two elements
            return start == end
        case _:  # anything
            return False
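Value Patterns
In : from enum import Enum
In : class Color(Enum):
...:     RED = 1
...:     GREEN = 2
...:     BLUE = 3
...:
In : class NewColor:
...:     YELLOW = 4
...:
In : def constant_value(color):
...:     match color:
...:         case Color.RED:
...:             print('Red')
...:         case NewColor.YELLOW:
...:             print('Yellow')
...:         case new_color:
...:             print(new_color)
...:
In : constant_value(Color.RED)  # matches the first case
Red
In : constant_value(NewColor.YELLOW)  # matches the second case
Yellow
In : constant_value(Color.GREEN)  # matches the third case
Color.GREEN
In : constant_value(4)  # an equal constant value also matches the second case
Yellow
In : constant_value(10)  # any other constant
10
Note that because `case` binds names, a bare constant name such as YELLOW cannot be used directly as a pattern, for example like this:
YELLOW = 4
def constant_value(color):
    match color:
        case YELLOW:
            print('Yellow')
# This does not do what it looks like: a bare name is a capture pattern, so it
# matches any value and binds it to YELLOW instead of comparing against 4.
# If further case clauses followed it, Python would reject the code with a SyntaxError.
# Use a dotted name instead:
class Codes:
    SUCCESS = 200
    NOT_FOUND = 404
def handle(retcode):
    match retcode:
        case Codes.SUCCESS:
            print('success')
        case Codes.NOT_FOUND:
            print('not found')
        case _:
            print('unknown')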
Sequence Patterns
A sequence pattern does not iterate over the subject; all elements are accessed by index and slice.
In : def sequence(collection):
...:     match collection:
...:         case 1, [x, *others]:
...:             print(f"Got 1 and a nested sequence: {x=}, {others=}")
...:         case (1, x):
...:             print(f"Got 1 and {x}")
...:         case [x, y, z]:
...:             print(f"{x=}, {y=}, {z=}")
...:
In : sequence([1])
In : sequence([1, 2])
Got 1 and 2
In : sequence([1, 2, 3])
x=1, y=2, z=3
In : sequence([1, [2, 3]])
Got 1 and a nested sequence: x=2, others=[3]
In : sequence([1, [2, 3, 4]])
Got 1 and a nested sequence: x=2, others=[3, 4]
In : sequence([2, 3])
In : sequence((1, 2))
Got 1 and 2
Mapping Patterns
In : def mapping(config):
...:     match config:
...:         case {'sub': sub_config, **rest}:
...:             print(f'Sub: {sub_config}')
...:             print(f'OTHERS: {rest}')
...:         case {'route': route}:
...:             print(f'ROUTE: {route}')
...:
In : mapping({})
In : mapping({'route': '/auth/login'})
ROUTE: /auth/login
# Matches a dict that has a 'sub' key: its value is bound to sub_config and the rest of the dict is bound to rest
In : mapping({'route': '/auth/login', 'sub': {'a': 1}})
Sub: {'a': 1}
OTHERS: {'route': '/auth/login'}

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)
Class Patterns
# Any object can be matched after `case`. Let's start with an example that fails:
In : class Point:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:
In : class_pattern(Point(1, 2))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [], in <cell line: 1>()
----> 1 class_pattern(Point(1, 2))
Input In [], in class_pattern(obj)
      1 def class_pattern(obj):
      2     match obj:
----> 3         case Point(x, y):
      4             print(f'Point({x=},{y=})')
TypeError: Point() accepts 0 positional sub-patterns (2 given)
# This is because positional sub-patterns need a defined mapping from position to attribute.
# Without one, name the attributes with keyword patterns instead:
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x=1, y=2):
...:             print(f'match')
...:
In : class_pattern(Point(1, 2))
match
# The other way to let a custom class be matched positionally is to define __match_args__,
# a tuple of attribute names in positional order, like this:
In : class Point:
...:     __match_args__ = ('x', 'y')
...:
...:     def __init__(self, x, y):
...:         self.x = x
...:         self.y = y
...:
# A dataclass works too. Point2 here uses the standard library's dataclasses.dataclass decorator,
# which provides __match_args__ automatically, so it can be used directly
In : from dataclasses import dataclass
In : @dataclass
...: class Point2:
...:     x: int
...:     y: int
...:
In : def class_pattern(obj):
...:     match obj:
...:         case Point(x, y):
...:             print(f'Point({x=},{y=})')
...:         case Point2(x, y):
...:             print(f'Point2({x=},{y=})')
...:
In : class_pattern(Point(1, 2))
Point(x=1,y=2)
In : class_pattern(Point2(1, 2))
Point2(x=1,y=2)
def eval_expr(expr):
    """Evaluate an expression and return the result."""
    match expr:
        case BinaryOp('+', left, right):
            return eval_expr(left) + eval_expr(right)
        case BinaryOp('-', left, right):
            return eval_expr(left) - eval_expr(right)
        case BinaryOp('*', left, right):
            return eval_expr(left) * eval_expr(right)
        case BinaryOp('/', left, right):
            return eval_expr(left) / eval_expr(right)
        case UnaryOp('+', arg):
            return eval_expr(arg)
        case UnaryOp('-', arg):
            return -eval_expr(arg)
        case VarExpr(name):
            raise ValueError(f"Unknown value of: {name}")
        case float() | int():
            return expr
        case _:
            raise ValueError(f"Invalid expression value: {repr(expr)}")

match media_object:
    case Image(type="jpg"):
        return media_object
    case Image(type="png") | Image(type="gif"):
        return render_as(media_object, "jpg")
    case Video():
        raise ValueError("Can't extract frames from video yet")
    case other_type:
        raise Exception(f"Media type {media_object} can't be handled yet")

from collections import namedtuple
Mov = namedtuple('mov', ['dst', 'src', 'sz', 'ridx'])
match op:
    case Mov(dst, src, 8, ridx):
        pass
Type Unions, Aliases, and Guards
def mean(numbers: list[float | int]) -> float:
    return sum(numbers) / len(numbers)
from typing import TypeAlias
Card: TypeAlias = tuple[str, str] # ('', '')
Deck: TypeAlias = list[Card] # [('', '')]
III
A new disassembler for 2020GKCTF-EzMachine
1: Define the instruction types and write parse
from collections import namedtuple
from dataclasses import dataclass

@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)

Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
)  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"]
)  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"]
)  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)

def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), Regs(src)))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), Regs(src)))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    for ins in instructions:
        print(ins)
MovReg(addr=0, dst=edx, imm=3)
PrintStr(addr=3)
InputStr(addr=6)
MovReg(addr=9, dst=ebx, imm=17)
Cmp(addr=12, dst=eax, src=ebx)
Jz(addr=15, target=27)
MovReg(addr=18, dst=edx, imm=1)
PrintStr(addr=21)
Exit(addr=24)
MovReg(addr=27, dst=ecx, imm=0)
MovReg(addr=30, dst=eax, imm=17)
Cmp(addr=33, dst=eax, src=ecx)
Jz(addr=36, target=126)
MovRegMem(addr=39, dst=eax, src=ecx)
MovReg(addr=42, dst=ebx, imm=97)
Cmp(addr=45, dst=eax, src=ebx)
Jl(addr=48, target=75)
MovReg(addr=51, dst=ebx, imm=122)
Cmp(addr=54, dst=eax, src=ebx)
Jg(addr=57, target=75)
MovReg(addr=60, dst=ebx, imm=71)
XorReg(addr=63, dst=eax, src=ebx)
MovReg(addr=66, dst=ebx, imm=1)
AddReg(addr=69, dst=eax, src=ebx)
Jmp(addr=72, target=105)
MovReg(addr=75, dst=ebx, imm=65)
Cmp(addr=78, dst=eax, src=ebx)
Jl(addr=81, target=105)
MovReg(addr=84, dst=ebx, imm=90)
Cmp(addr=87, dst=eax, src=ebx)
Jg(addr=90, target=105)
MovReg(addr=93, dst=ebx, imm=75)
XorReg(addr=96, dst=eax, src=ebx)
MovReg(addr=99, dst=ebx, imm=1)
SubReg(addr=102, dst=eax, src=ebx)
MovReg(addr=105, dst=ebx, imm=16)
DivReg(addr=108, dst=eax, src=ebx)
PushReg(addr=111, reg=ebx)
PushReg(addr=114, reg=eax)
MovReg(addr=117, dst=ebx, imm=1)
AddReg(addr=120, dst=ecx, src=ebx)
Jmp(addr=123, target=30)
PushImm(addr=126, imm=7)
PushImm(addr=129, imm=13)
PushImm(addr=132, imm=0)
PushImm(addr=135, imm=5)
PushImm(addr=138, imm=1)
PushImm(addr=141, imm=12)
PushImm(addr=144, imm=1)
PushImm(addr=147, imm=0)
PushImm(addr=150, imm=0)
PushImm(addr=153, imm=13)
PushImm(addr=156, imm=5)
PushImm(addr=159, imm=15)
PushImm(addr=162, imm=0)
PushImm(addr=165, imm=9)
PushImm(addr=168, imm=5)
PushImm(addr=171, imm=15)
PushImm(addr=174, imm=3)
PushImm(addr=177, imm=0)
PushImm(addr=180, imm=2)
PushImm(addr=183, imm=5)
PushImm(addr=186, imm=3)
PushImm(addr=189, imm=3)
PushImm(addr=192, imm=1)
PushImm(addr=195, imm=7)
PushImm(addr=198, imm=7)
PushImm(addr=201, imm=11)
PushImm(addr=204, imm=2)
PushImm(addr=207, imm=1)
PushImm(addr=210, imm=2)
PushImm(addr=213, imm=7)
PushImm(addr=216, imm=2)
PushImm(addr=219, imm=12)
PushImm(addr=222, imm=2)
PushImm(addr=225, imm=2)
MovReg(addr=228, dst=ecx, imm=1)
MovRegStack(addr=231, dst=ebx, src=ecx)
PopReg(addr=234, reg=eax)
Cmp(addr=237, dst=eax, src=ebx)
Jnz(addr=240, target=270)
MovReg(addr=243, dst=ebx, imm=34)
Cmp(addr=246, dst=ecx, src=ebx)
Jz(addr=249, target=264)
MovReg(addr=252, dst=ebx, imm=1)
AddReg(addr=255, dst=ecx, src=ebx)
Jmp(addr=258, target=231)
MovReg(addr=261, dst=edx, imm=0)
PrintStr(addr=264)
Exit(addr=267)
MovReg(addr=270, dst=edx, imm=1)
PrintStr(addr=273)
Exit(addr=276)
Nop(addr=279)
2: Write a first-pass dump
from collections import namedtuple
from dataclasses import dataclass

@dataclass
class Regs(object):
    idx: int

    def __repr__(self):
        if self.idx == 0:
            return "eax"
        elif self.idx == 1:
            return "ebx"
        elif self.idx == 2:
            return "ecx"
        elif self.idx == 3:
            return "edx"
        else:
            return "unknown reg {}".format(self.idx)

Nop = namedtuple("Nop", ["addr"])  # case 0: nop
MovReg = namedtuple("MovReg", ["addr", "dst", "imm"])  # case 1: mov reg, imm
PushImm = namedtuple("PushImm", ["addr", "imm"])  # case 2: push imm
PushReg = namedtuple("PushReg", ["addr", "reg"])  # case 3: push reg
PopReg = namedtuple("PopReg", ["addr", "reg"])  # case 4: pop reg
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
AddReg = namedtuple("AddReg", ["addr", "dst", "src"])  # case 6: add reg, reg
SubReg = namedtuple("SubReg", ["addr", "dst", "src"])  # case 7: sub reg, reg
MulReg = namedtuple("MulReg", ["addr", "dst", "src"])  # case 8: mul reg, reg
DivReg = namedtuple("DivReg", ["addr", "dst", "src"])  # case 9: div reg, reg
XorReg = namedtuple("XorReg", ["addr", "dst", "src"])  # case 10: xor reg, reg
Jmp = namedtuple("Jmp", ["addr", "target"])  # case 11: jmp addr
Cmp = namedtuple("Cmp", ["addr", "dst", "src"])  # case 12: cmp reg, reg
Jz = namedtuple("Jz", ["addr", "target"])  # case 13: jz addr
Jnz = namedtuple("Jnz", ["addr", "target"])  # case 14: jnz addr
Jg = namedtuple("Jg", ["addr", "target"])  # case 15: jg addr
Jl = namedtuple("Jl", ["addr", "target"])  # case 16: jl addr
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
InitMem = namedtuple(
    "InitMem", ["addr", "mem_addr", "sz"]
)  # case 18: memset(mem_addr, 0, sz)
MovRegStack = namedtuple(
    "MovRegStack", ["addr", "dst", "src"]
)  # case 19: mov reg, [ebp-src]
MovRegMem = namedtuple(
    "MovRegMem", ["addr", "dst", "src"]
)  # case 20: mov reg, mem[src]
Exit = namedtuple("Exit", ["addr"])  # case 0xff: exit(0)

def parse(buffer):
    instructions = []
    pc = 0
    while pc < len(buffer):
        opcode = buffer[pc]
        match opcode:
            case 0:
                instructions.append(Nop(pc))
                pc += 1
            case 1:
                dst = buffer[pc + 1]
                imm = buffer[pc + 2]
                instructions.append(MovReg(pc, Regs(dst), imm))
                pc += 3
            case 2:
                imm = buffer[pc + 1]
                instructions.append(PushImm(pc, imm))
                pc += 3
            case 3:
                reg = buffer[pc + 1]
                instructions.append(PushReg(pc, Regs(reg)))
                pc += 3
            case 4:
                reg = buffer[pc + 1]
                instructions.append(PopReg(pc, Regs(reg)))
                pc += 3
            case 5:
                instructions.append(PrintStr(pc))
                pc += 3
            case 6:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(AddReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 7:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(SubReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 8:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MulReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 9:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(DivReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 10:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(XorReg(pc, Regs(dst), Regs(src)))
                pc += 3
            case 11:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jmp(pc, target))
                pc += 3
            case 12:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(Cmp(pc, Regs(dst), Regs(src)))
                pc += 3
            case 13:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jz(pc, target))
                pc += 3
            case 14:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jnz(pc, target))
                pc += 3
            case 15:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jg(pc, target))
                pc += 3
            case 16:
                target = 3 * buffer[pc + 1] - 3
                instructions.append(Jl(pc, target))
                pc += 3
            case 17:
                instructions.append(InputStr(pc))
                pc += 3
            case 18:
                mem_addr = buffer[pc + 1]
                sz = buffer[pc + 2]
                instructions.append(InitMem(pc, mem_addr, sz))
                pc += 3
            case 19:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegStack(pc, Regs(dst), src))
                pc += 3
            case 20:
                dst = buffer[pc + 1]
                src = buffer[pc + 2]
                instructions.append(MovRegMem(pc, Regs(dst), src))
                pc += 3
            case 255:
                instructions.append(Exit(pc))
                pc += 3
            case _:
                raise Exception(f"unknown opcode: {opcode} at {pc}")
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    dump(instructions)
_0x0000: mov edx, 0x03
_0x0003: print_str
_0x0006: input_str
_0x0009: mov ebx, 0x11
_0x000c: cmp eax, ebx
_0x000f: jz _0x001b
_0x0012: mov edx, 0x01
_0x0015: print_str
_0x0018: exit(0)
_0x001b: mov ecx, 0x00
_0x001e: mov eax, 0x11
_0x0021: cmp eax, ecx
_0x0024: jz _0x007e
_0x0027: mov eax, mem[2]
_0x002a: mov ebx, 0x61
_0x002d: cmp eax, ebx
_0x0030: jl _0x004b
_0x0033: mov ebx, 0x7a
_0x0036: cmp eax, ebx
_0x0039: jg _0x004b
_0x003c: mov ebx, 0x47
_0x003f: xor eax, ebx
_0x0042: mov ebx, 0x01
_0x0045: add eax, ebx
_0x0048: jmp _0x0069
_0x004b: mov ebx, 0x41
_0x004e: cmp eax, ebx
_0x0051: jl _0x0069
_0x0054: mov ebx, 0x5a
_0x0057: cmp eax, ebx
_0x005a: jg _0x0069
_0x005d: mov ebx, 0x4b
_0x0060: xor eax, ebx
_0x0063: mov ebx, 0x01
_0x0066: sub eax, ebx
_0x0069: mov ebx, 0x10
_0x006c: div eax, ebx
_0x006f: push ebx
_0x0072: push eax
_0x0075: mov ebx, 0x01
_0x0078: add ecx, ebx
_0x007b: jmp _0x001e
_0x007e: push 0x07
_0x0081: push 0x0d
_0x0084: push 0x00
_0x0087: push 0x05
_0x008a: push 0x01
_0x008d: push 0x0c
_0x0090: push 0x01
_0x0093: push 0x00
_0x0096: push 0x00
_0x0099: push 0x0d
_0x009c: push 0x05
_0x009f: push 0x0f
_0x00a2: push 0x00
_0x00a5: push 0x09
_0x00a8: push 0x05
_0x00ab: push 0x0f
_0x00ae: push 0x03
_0x00b1: push 0x00
_0x00b4: push 0x02
_0x00b7: push 0x05
_0x00ba: push 0x03
_0x00bd: push 0x03
_0x00c0: push 0x01
_0x00c3: push 0x07
_0x00c6: push 0x07
_0x00c9: push 0x0b
_0x00cc: push 0x02
_0x00cf: push 0x01
_0x00d2: push 0x02
_0x00d5: push 0x07
_0x00d8: push 0x02
_0x00db: push 0x0c
_0x00de: push 0x02
_0x00e1: push 0x02
_0x00e4: mov ecx, 0x01
_0x00e7: mov ebx, [ebp-2]
_0x00ea: pop eax
_0x00ed: cmp eax, ebx
_0x00f0: jnz _0x010e
_0x00f3: mov ebx, 0x22
_0x00f6: cmp ecx, ebx
_0x00f9: jz _0x0108
_0x00fc: mov ebx, 0x01
_0x00ff: add ecx, ebx
_0x0102: jmp _0x00e7
_0x0105: mov edx, 0x00
_0x0108: print_str
_0x010b: exit(0)
_0x010e: mov edx, 0x01
_0x0111: print_str
_0x0114: exit(0)
_0x0117: nop
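3: Optimization
– (1) Add a function prologue and epilogue
![Image description](https://bbs.kanxue.com/upload/attach/202311/921830_QUTPQFYHWRH7AJ4.webp)
The listing begins and ends directly with instructions and has no stack frame, so we add one.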
```python
from collections import namedtuple
from dataclasses import dataclass
......
# Optimization (1): add a prologue and epilogue for the main function
prologue = namedtuple("prologue", [])
epilogue = namedtuple("epilogue", [])

def add_main_prologue_epilogue(instructions):
    instructions.insert(0, prologue())
    instructions.append(epilogue())
    return instructions

def dump(instructions):
    for ins in instructions:
        match ins:
            case prologue():
                print(f"push ebp")
                print(f"mov ebp, esp")
            case epilogue():
                print(f"mov esp, ebp")
                print(f"pop ebp")
                print(f"ret")
            ......
            case _:
                raise Exception(f"unknown instruction: {ins}")

if __name__ == '__main__':
    opcode = [0x01, 0x03, 0x03, 0x05, 0x00, 0x00, 0x11, 0x00, 0x00, 0x01, 0x01, 0x11, 0x0C, 0x00, 0x01, 0x0D, 0x0A, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x02, 0x00, 0x01, 0x00, 0x11, 0x0C, 0x00, 0x02, 0x0D, 0x2B, 0x00, 0x14, 0x00, 0x02, 0x01, 0x01, 0x61, 0x0C, 0x00, 0x01, 0x10, 0x1A, 0x00, 0x01, 0x01, 0x7A, 0x0C, 0x00, 0x01, 0x0F, 0x1A, 0x00, 0x01, 0x01, 0x47, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x06, 0x00, 0x01, 0x0B, 0x24, 0x00, 0x01, 0x01, 0x41, 0x0C, 0x00, 0x01, 0x10, 0x24, 0x00, 0x01, 0x01, 0x5A, 0x0C, 0x00, 0x01, 0x0F, 0x24, 0x00, 0x01, 0x01, 0x4B, 0x0A, 0x00, 0x01, 0x01, 0x01, 0x01, 0x07, 0x00, 0x01, 0x01, 0x01, 0x10, 0x09, 0x00, 0x01, 0x03, 0x01, 0x00, 0x03, 0x00, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x0B, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x00, 0x00, 0x02, 0x05, 0x00, 0x02, 0x01, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x01, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00, 0x00, 0x02, 0x0D, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x00, 0x00, 0x02, 0x09, 0x00, 0x02, 0x05, 0x00, 0x02, 0x0F, 0x00, 0x02, 0x03, 0x00, 0x02, 0x00, 0x00, 0x02, 0x02, 0x00, 0x02, 0x05, 0x00, 0x02, 0x03, 0x00, 0x02, 0x03, 0x00, 0x02, 0x01, 0x00, 0x02, 0x07, 0x00, 0x02, 0x07, 0x00, 0x02, 0x0B, 0x00, 0x02, 0x02, 0x00, 0x02, 0x01, 0x00, 0x02, 0x02, 0x00, 0x02, 0x07, 0x00, 0x02, 0x02, 0x00, 0x02, 0x0C, 0x00, 0x02, 0x02, 0x00, 0x02, 0x02, 0x00, 0x01, 0x02, 0x01, 0x13, 0x01, 0x02, 0x04, 0x00, 0x00, 0x0C, 0x00, 0x01, 0x0E, 0x5B, 0x00, 0x01, 0x01, 0x22, 0x0C, 0x02, 0x01, 0x0D, 0x59, 0x00, 0x01, 0x01, 0x01, 0x06, 0x02, 0x01, 0x0B, 0x4E, 0x00, 0x01, 0x03, 0x00, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x01, 0x03, 0x01, 0x05, 0x00, 0x00, 0xFF, 0x00, 0x00, 0x00]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
```
– (2) Handle the VM's mem and strings
```python
.....
# Memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n .asciz "right" """)
    print("""wrong:\n .asciz "wrong" """)
    print("""plz_input:\n .asciz "plz input:" """)
    print("""hacker:\n .asciz "hacker" """)
    print("""mem:\n .space 0x100 """)

if __name__ == '__main__':
    opcode = [...]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    dump(instructions)
    dump_data()
```
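Since the same four strings show up again below (with their lengths) in the `write` stubs, one option is to keep them in a single table and generate the data section from it. This is only a sketch of that idea; `STRINGS` and `render_data` are names I made up, and the index-to-string mapping is the one from the `PrintStr` comment (0, 1, 3, 4).

```python
# edx value -> (assembly label, string contents), per the PrintStr comment
STRINGS = {
    0: ("right", "right"),
    1: ("wrong", "wrong"),
    3: ("plz_input", "plz input:"),
    4: ("hacker", "hacker"),
}


def render_data(strings=STRINGS, mem_size=0x100):
    """Return the data section as text instead of printing it directly."""
    lines = []
    for label, text in strings.values():
        lines.append(f"{label}:")
        lines.append(f'    .asciz "{text}"')
    lines.append("mem:")
    lines.append(f"    .space {mem_size:#x}")
    return "\n".join(lines)


data_section = render_data()
plz_input_len = len(STRINGS[3][1])  # the n argument a write() stub would need
```

Returning a string rather than printing makes it easy to write the whole listing to a `.s` file in one go.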
– (3) Handle print_str
![Image description](https://bbs.kanxue.com/upload/attach/202311/921830_4U5PDW3GE26ETHF.webp)
The assembly we produced contains statements like this:
```python
# case 5: print str by edx: 0:'right', 1:'wrong', 3:'plz input:', 4:'hacker'
PrintStr = namedtuple("PrintStr", ["addr"])
```
It simply prints a different string depending on the value of edx.
A function call is hard to avoid here, so we can borrow pwntools' shellcraft to generate one: https://docs.pwntools.com/en/stable/shellcraft/i386.html#module-pwnlib.shellcraft.i386.linux
```python
from collections import namedtuple
from dataclasses import dataclass
.....
write_func_call = namedtuple("write_func_call", ["addr", "str_idx"])

# Optimization (3): handle print_str
def handle_print_str(instructions):
    """
    _0x0000: mov edx, 0x03
    _0x0003: print_str
    _0x0012: mov edx, 0x01
    _0x0015: print_str
    _0x0105: mov edx, 0x00
    _0x0108: print_str
    _0x010e: mov edx, 0x01
    _0x0111: print_str
    """
    idx = 0
    while idx < len(instructions):
        # Look at a two-instruction window: `mov edx, imm` followed by print_str
        match instructions[idx: idx+2]:
            case [
                MovReg(addr1, Regs(3), imm),
                PrintStr(addr2)
            ] if imm in (0x00, 0x01, 0x03, 0x04):
                instructions[idx: idx+2] = [write_func_call(addr2, imm)]
        idx += 1

def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case write_func_call(addr, str_idx):
                if str_idx == 0:
                    print_right = f"""/* write(fd=1, buf='right', n=5) */
_0x{addr:04x}: pushad
push 1
pop ebx
mov ecx, right
push 5
pop edx
push SYS_write /* 4 */
pop eax
int 0x80
popad
"""
                    print(print_right)
                elif str_idx == 1:
                    print_wrong = f"""/* write(fd=1, buf='wrong', n=5) */
_0x{addr:04x}: pushad
push 1
pop ebx
mov ecx, wrong
push 5
pop edx
push SYS_write /* 4 */
pop eax
int 0x80
popad
"""
                    print(print_wrong)
                elif str_idx == 3:
                    print_plz_input = f"""/* write(fd=1, buf='plz input:', n=10) */
_0x{addr:04x}: pushad
push 1
pop ebx
mov ecx, plz_input
push 10
pop edx
push SYS_write /* 4 */
pop eax
int 0x80
popad
"""
                    print(print_plz_input)
                elif str_idx == 4:
                    print_hacker = f"""/* write(fd=1, buf='hacker', n=6) */
_0x{addr:04x}: pushad
push 1
pop ebx
mov ecx, hacker
push 6
pop edx
push SYS_write /* 4 */
pop eax
int 0x80
popad
"""
                    print(print_hacker)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
......
```
– (4) Handling input_str
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_X3P96QF84NKZ4PR.webp)
```python
# case 17: gets(mem); eax=strlen(mem);
InputStr = namedtuple("InputStr", ["addr"])
```
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_JAXQMQ975NMVWXV.webp)
```python
from collections import namedtuple
from dataclasses import dataclass
......
read_strlen_func_call = namedtuple("read_strlen_func_call", ["addr"])
# Optimization (4): handle input_str
def handle_input_str(instructions):
    """
    _0x0006: input_str
    """
    idx = 0
    while idx < len(instructions):
        match instructions[idx: idx+1]:
            case [
                InputStr(addr)
            ]:
                instructions[idx: idx+1] = [read_strlen_func_call(addr)]
        idx += 1
def dump(instructions):
    for ins in instructions:
        match ins:
            ......
            case read_strlen_func_call(addr):
                print_read_strlen = f"""/* read(fd=0, buf=mem, n=0x100) */
_0x{addr:04x}: push eax
    push ebx
    push ecx
    push edx
    xor ebx, ebx
    mov ecx, mem
    push 0x100
    pop edx
    push SYS_read /* 3 */
    pop eax
    int 0x80
    /* strlen(mem) */
    mov edi, mem
    xor eax, eax
    push -1
    pop ecx
    repnz scas al, BYTE PTR [edi]
    inc ecx
    inc ecx
    neg ecx
    /* keep the length in edi so it survives the register restores below */
    mov edi, ecx
    pop edx
    pop ecx
    pop ebx
    pop eax
    mov eax, edi
"""
                print(print_read_strlen)
            case Nop(addr):
                print(f"_0x{addr:04x}: nop")
            case MovReg(addr, dst, imm):
                print(f"_0x{addr:04x}: mov {dst}, 0x{imm:02x}")
            case PushImm(addr, imm):
                print(f"_0x{addr:04x}: push 0x{imm:02x}")
            case PushReg(addr, reg):
                print(f"_0x{addr:04x}: push {reg}")
            case PopReg(addr, reg):
                print(f"_0x{addr:04x}: pop {reg}")
            case PrintStr(addr):
                print(f"_0x{addr:04x}: print_str")
            case AddReg(addr, dst, src):
                print(f"_0x{addr:04x}: add {dst}, {src}")
            case SubReg(addr, dst, src):
                print(f"_0x{addr:04x}: sub {dst}, {src}")
            case MulReg(addr, dst, src):
                print(f"_0x{addr:04x}: mul {dst}, {src}")
            case DivReg(addr, dst, src):
                print(f"_0x{addr:04x}: div {dst}, {src}")
            case XorReg(addr, dst, src):
                print(f"_0x{addr:04x}: xor {dst}, {src}")
            case Jmp(addr, target):
                print(f"_0x{addr:04x}: jmp _0x{target:04x}")
            case Cmp(addr, dst, src):
                print(f"_0x{addr:04x}: cmp {dst}, {src}")
            case Jz(addr, target):
                print(f"_0x{addr:04x}: jz _0x{target:04x}")
            case Jnz(addr, target):
                print(f"_0x{addr:04x}: jnz _0x{target:04x}")
            case Jg(addr, target):
                print(f"_0x{addr:04x}: jg _0x{target:04x}")
            case Jl(addr, target):
                print(f"_0x{addr:04x}: jl _0x{target:04x}")
            case InputStr(addr):
                print(f"_0x{addr:04x}: input_str")
            case InitMem(addr, mem_addr, sz):
                print(f"_0x{addr:04x}: memset(0x{mem_addr:02x},0,{sz})")
            case MovRegStack(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
            case MovRegMem(addr, dst, src):
                print(f"_0x{addr:04x}: mov {dst}, mem[{src}]")
            case Exit(addr):
                print(f"_0x{addr:04x}: exit(0)")
            case _:
                raise Exception(f"unknown instruction: {ins}")
# Optimization (2): memory used by the VM
def dump_data():
    print("\n")
    print("""right:\n    .asciz "right" """)
    print("""wrong:\n    .asciz "wrong" """)
    print("""plz_input:\n    .asciz "plz input:" """)
    print("""hacker:\n    .asciz "hacker" """)
    print("""mem:\n    .space 0x100 """)
if __name__ == '__main__':
    opcode = [.....]
    instructions = parse(opcode)
    instructions = add_main_prologue_epilogue(instructions)
    handle_print_str(instructions)
    handle_input_str(instructions)
    dump(instructions)
    dump_data()
```
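The `/* strlen(mem) */` part of the emitted assembly is easier to follow with the ecx arithmetic spelled out. The sketch below is a plain-Python model, not code from the article: `strlen_via_scasb` is a hypothetical helper that mimics `repnz scas` with `al = 0` and `ecx = -1`, followed by `inc ecx; inc ecx; neg ecx`, using 32-bit wraparound. It assumes the buffer contains a NUL byte.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit register wraparound


def strlen_via_scasb(mem: bytes) -> int:
    """Model `repnz scasb` with al = 0 and ecx = -1, then `inc ecx; inc ecx; neg ecx`."""
    ecx = MASK  # push -1; pop ecx
    edi = 0     # index into mem instead of a real pointer
    while ecx != 0:
        byte = mem[edi]
        edi += 1
        ecx = (ecx - 1) & MASK  # every scasb iteration decrements ecx
        if byte == 0:           # repnz stops once the scanned byte equals al
            break
    ecx = (ecx + 2) & MASK      # inc ecx; inc ecx
    return (-ecx) & MASK        # neg ecx


length = strlen_via_scasb(b"flag{x}\x00")
```

For a string of length n the scan runs n + 1 times (the NUL is scanned too), so ecx ends at -1 - (n + 1) = -(n + 2). The two `inc` instructions bring it to -n, and `neg` yields n.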
– (5) Handling exit(0)
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_XYKFP696Y5UC9UJ.webp)
```python
case Exit(addr):
    print(f"""/* exit(status=0) */
_0x{addr:04x}: xor ebx, ebx
    push SYS_exit /* 1 */
    pop eax
    int 0x80
""")
```
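– (6) Fixing mov ebx, [ebp-ecx]
The assembler rejects this form, because an x86 memory operand can add registers and a displacement but cannot subtract a register: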
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_MVYBK4BXJK84HFF.webp)
Replace it with the following sequence:
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_9RHVZ9B8HUX8NJE.webp)
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_4BK2G6J8HPXACSM.webp)
```python
case MovRegStack(addr, dst, src):
    # print(f"_0x{addr:04x}: mov {dst}, [ebp-{src}]")
    print(f"_0x{addr:04x}: mov {dst}, ebp")
    print(f" sub {dst}, {src}")
    print(f" mov {dst}, [{dst}]")
```
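One caveat of this three-instruction expansion is that it only works when the destination and the offset register differ: `mov dst, ebp` would otherwise overwrite the offset before it is subtracted. A minimal sketch of the expansion as a standalone helper follows. `emit_mov_reg_stack` and the address `0x40` are hypothetical, and register names are passed as plain strings rather than the article's register type.

```python
def emit_mov_reg_stack(addr: int, dst: str, src: str) -> list[str]:
    """Expand the VM's `mov dst, [ebp-src]` into three encodable x86 instructions."""
    if dst == src:
        # `mov dst, ebp` would clobber the offset held in the same register.
        raise ValueError("dst and src must be different registers")
    return [
        f"_0x{addr:04x}: mov {dst}, ebp",  # dst = ebp
        f"    sub {dst}, {src}",           # dst = ebp - src
        f"    mov {dst}, [{dst}]",         # dst = dword at [ebp - src]
    ]


lines = emit_mov_reg_stack(0x40, "ebx", "ecx")
```

Returning the lines instead of printing them keeps the helper easy to check. The caller can still `print("\n".join(lines))` inside `dump`.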
– (7) Fixing _0x006c: div eax, ebx
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_2T8PCPWA695R3CV.webp)
After a normal `div ebx`, the quotient is stored in eax and the remainder in edx.
The VM's div behaves differently: it stores its results in eax and ebx.
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_P25BCYVPP4BEK4M.webp)
So we also need to add a `mov ebx, edx` after the `div eax, ebx`.
![图片描述](https://bbs.kanxue.com/upload/attach/202311/921830_PV5TZ5K9T466DWK.webp)
4: Calling pwntools make_elf
```python
from pwn import *
code = """
    push ebp
    mov ebp, esp
    .....
    ret
right:
    .asciz "right"
wrong:
    .asciz "wrong"
plz_input:
    .asciz "plz input:"
hacker:
    .asciz "hacker"
mem:
    .space 0x100
"""
elf = make_elf_from_assembly(code)
print(elf)  # path of the generated ELF file
```
Result:
Kanxue ID: SYJ-Re
https://bbs.kanxue.com/user-home-921830.htm
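Originally published on the WeChat official account 看雪学苑 (Kanxue Academy): Python 3.10 new feature: the clever use of Structural Pattern Matching in VM reverse engineering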