Fuzzing Like a Caveman: Improving Performance





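Introduction

In this episode of "Fuzzing Like a Caveman", we'll focus on improving the performance of our previous fuzzer. That means there won't be any massive changes; we're simply looking to improve on what we already had. So by the end of this post we'll still have a very basic mutation fuzzer (hopefully a faster one!), and hopefully some more bugs found on a different target. We won't touch multi-threading or multi-processing in this post; those are saved for later fuzzing posts.

I need to add a disclaimer here: I am not a professional developer, far from it. At this point I simply don't have enough programming experience to spot performance opportunities the way a more seasoned programmer would. I'm going to use my crude skills and limited programming knowledge to improve our previous fuzzer, and that's it. The resulting code won't be pretty and it won't be perfect, but it will be better than what we had in the last post. It should also be mentioned that all testing was done on an x86 Kali VM with 1 CPU and 1 core, running in VMWare Workstation.

We also need to define what "better" means in the context of this post. By "better" I mean that we can get through n fuzzing iterations faster, nothing more. We'll take the time later to rewrite the fuzzer completely, use a cool language, pick a hardened target, and employ more advanced fuzzing techniques.

Obviously, if you haven't read the previous post, you'll be lost!



Analyzing Our Fuzzer

Our last fuzzer was pretty simple, but it worked! We found some bugs in our target. But we knew we left some optimizations on the table when we turned in our homework. Let's take another look at the fuzzer from the last post (with a few small changes for testing purposes):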


#!/usr/bin/env python3
import sys
import random
from pexpect import run
from shlex import quote  # same function as pipes.quote; the pipes module was removed in Python 3.13

# read bytes from our valid JPEG and return them in a mutable bytearray
def get_bytes(filename):
    f = open(filename, "rb").read()
    return bytearray(f)

def bit_flip(data):
    num_of_flips = int((len(data) - 4) * .01)
    indexes = range(4, (len(data) - 4))
    chosen_indexes = []

    # iterate selecting indexes until we've hit our num_of_flips number
    counter = 0
    while counter < num_of_flips:
        chosen_indexes.append(random.choice(indexes))
        counter += 1

    for x in chosen_indexes:
        current = data[x]
        current = (bin(current).replace("0b",""))
        current = "0" * (8 - len(current)) + current

        indexes = range(0,8)
        picked_index = random.choice(indexes)

        new_number = []

        # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
        for i in current:
            new_number.append(i)

        # if the number at our randomly selected index is a 1, make it a 0, and vice versa
        if new_number[picked_index] == "1":
            new_number[picked_index] = "0"
        else:
            new_number[picked_index] = "1"

        # create our new binary string of our bit-flipped number
        current = ''
        for i in new_number:
            current += i

        # convert that string to an integer
        current = int(current,2)

        # change the number in our byte array to our new number we just constructed
        data[x] = current

    return data

def magic(data):
    magic_vals = [
        (1, 255),
        (1, 255),
        (1, 127),
        (1, 0),
        (2, 255),
        (2, 0),
        (4, 255),
        (4, 0),
        (4, 128),
        (4, 64),
        (4, 127)
    ]

    picked_magic = random.choice(magic_vals)

    length = len(data) - 8
    index = range(0, length)
    picked_index = random.choice(index)

    # here we are hardcoding all the byte overwrites for all of the tuples that begin (1, )
    if picked_magic[0] == 1:
        if picked_magic[1] == 255:      # 0xFF
            data[picked_index] = 255
        elif picked_magic[1] == 127:    # 0x7F
            data[picked_index] = 127
        elif picked_magic[1] == 0:      # 0x00
            data[picked_index] = 0

    # here we are hardcoding all the byte overwrites for all of the tuples that begin (2, )
    elif picked_magic[0] == 2:
        if picked_magic[1] == 255:      # 0xFFFF
            data[picked_index] = 255
            data[picked_index + 1] = 255
        elif picked_magic[1] == 0:      # 0x0000
            data[picked_index] = 0
            data[picked_index + 1] = 0

    # here we are hardcoding all of the byte overwrites for all of the tuples that begin (4, )
    elif picked_magic[0] == 4:
        if picked_magic[1] == 255:      # 0xFFFFFFFF
            data[picked_index] = 255
            data[picked_index + 1] = 255
            data[picked_index + 2] = 255
            data[picked_index + 3] = 255
        elif picked_magic[1] == 0:      # 0x00000000
            data[picked_index] = 0
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 128:    # 0x80000000
            data[picked_index] = 128
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 64:     # 0x40000000
            data[picked_index] = 64
            data[picked_index + 1] = 0
            data[picked_index + 2] = 0
            data[picked_index + 3] = 0
        elif picked_magic[1] == 127:    # 0x7FFFFFFF
            data[picked_index] = 127
            data[picked_index + 1] = 255
            data[picked_index + 2] = 255
            data[picked_index + 3] = 255

    return data

# create new jpg with mutated data
def create_new(data):
    f = open("mutated.jpg", "wb+")
    f.write(data)
    f.close()

def exif(counter,data):
    command = "exif mutated.jpg -verbose"

    out, returncode = run("sh -c " + quote(command), withexitstatus=1)

    if b"Segmentation" in out:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        print("Segfault!")

    #if counter % 100 == 0:
    #    print(counter, end="\r")

if len(sys.argv) < 2:
    print("Usage: JPEGfuzz.py <valid_jpg>")

else:
    filename = sys.argv[1]
    counter = 0
    while counter < 1000:
        data = get_bytes(filename)
        functions = [0, 1]
        picked_function = random.choice(functions)
        picked_function = 1
        if picked_function == 0:
            mutated = magic(data)
            create_new(mutated)
            exif(counter,mutated)
        else:
            mutated = bit_flip(data)
            create_new(mutated)
            exif(counter,mutated)

        counter += 1


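You may have noticed a few changes. We have:

◆ Commented out the statement that prints the counter every 100 iterations.

◆ Added a print statement to notify us of any segfaults.

◆ Hardcoded 1,000 iterations.

◆ Temporarily added the line picked_function = 1 so that we eliminate any randomness from testing and stick to a single mutation method (bit_flip()).

Let's run this version of the fuzzer under a profiling tool so we can really analyze where the program spends most of its time during execution.

We can use Python's cProfile module to see where we spend our time over 1,000 fuzzing iterations. If you remember, the program takes a path to a valid JPEG file as its argument, so the full command line will be: python3 -m cProfile -s cumtime JPEGfuzzer.py ~/jpegs/Canon_40D.jpg

It should also be noted that adding cProfile instrumentation can slow things down. I tested without it, and for the iteration counts we use in this post it didn't seem to make a significant difference.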
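The same kind of report can also be produced from inside a script, which is handy when only one function is of interest. The following is a minimal sketch, not part of the original fuzzer: busy_work() and profile_call() are hypothetical names, and busy_work() merely stands in for a fuzzing loop.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Hypothetical stand-in for a fuzzing loop; it just burns some CPU.
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" is the same ordering that `-s cumtime` selects on the command line.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_call(busy_work, 100_000)
print(report)
```

The report has the same columns (ncalls, tottime, percall, cumtime) as the command-line output, limited here to the ten most expensive entries.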


After running the fuzzer this way, we get the program's output and can see where the most time was spent during execution.


2476093 function calls (2474812 primitive calls) in 122.084 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
33/1 0.000 0.000 122.084 122.084 {built-in method builtins.exec}
1 0.108 0.108 122.084 122.084 blog.py:3(<module>)
1000 0.090 0.000 118.622 0.119 blog.py:140(exif)
1000 0.080 0.000 118.452 0.118 run.py:7(run)
5432 103.761 0.019 103.761 0.019 {built-in method time.sleep}
1000 0.028 0.000 100.923 0.101 pty_spawn.py:316(close)
1000 0.025 0.000 100.816 0.101 ptyprocess.py:387(close)
1000 0.061 0.000 9.949 0.010 pty_spawn.py:36(__init__)
1000 0.074 0.000 9.764 0.010 pty_spawn.py:239(_spawn)
1000 0.041 0.000 8.682 0.009 pty_spawn.py:312(_spawnpty)
1000 0.266 0.000 8.641 0.009 ptyprocess.py:178(spawn)
1000 0.011 0.000 7.491 0.007 spawnbase.py:240(expect)
1000 0.036 0.000 7.479 0.007 spawnbase.py:343(expect_list)
1000 0.128 0.000 7.409 0.007 expect.py:91(expect_loop)
6432 6.473 0.001 6.473 0.001 {built-in method posix.read}
5432 0.089 0.000 3.818 0.001 pty_spawn.py:415(read_nonblocking)
7348 0.029 0.000 3.162 0.000 utils.py:130(select_ignore_interrupts)
7348 3.127 0.000 3.127 0.000 {built-in method select.select}
1000 0.790 0.001 1.777 0.002 blog.py:15(bit_flip)
1000 0.015 0.000 1.311 0.001 blog.py:134(create_new)
1000 0.100 0.000 1.101 0.001 pty.py:79(fork)
1000 1.000 0.001 1.000 0.001 {built-in method posix.forkpty}
-----SNIP-----


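For this type of analysis, we don't really care how many segfaults occurred, since we're not tinkering much with the mutation methods or comparing different methods. There will be some randomness here, of course, since crashes require additional processing, but this is good enough for now.

I snipped the output to show only the entries where cumulative time exceeded 1.0 seconds. You can see that we spent by far the most time in blog.py:140(exif): 118 of the 122 total seconds. Clearly our exif() function is the main performance problem.

Looking further down, most of the time under that function comes from what it calls. We see a lot of calls into the pty module, which come from our use of pexpect, and roughly 104 seconds are spent in time.sleep alone while each pty is being closed. Let's rewrite the function using Popen from the subprocess module and see if we can improve performance here!

Here is our redefined exif() function: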


from subprocess import Popen, PIPE

def exif(counter,data):
    p = Popen(["exif", "mutated.jpg", "-verbose"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    # a returncode of -11 means the child was killed by signal 11 (SIGSEGV)
    if p.returncode == -11:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        print("Segfault!")

    #if counter % 100 == 0:
    #    print(counter, end="\r")


And here is the performance report:


2065580 function calls (2065443 primitive calls) in 2.756 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
15/1 0.000 0.000 2.756 2.756 {built-in method builtins.exec}
1 0.038 0.038 2.756 2.756 subpro.py:3(<module>)
1000 0.020 0.000 1.917 0.002 subpro.py:139(exif)
1000 0.026 0.000 1.121 0.001 subprocess.py:681(__init__)
1000 0.099 0.000 1.045 0.001 subprocess.py:1412(_execute_child)
-----SNIP-----


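What a difference. With the redefined exif() function, the fuzzer did the same amount of work in under 3 seconds! Incredible! The old fuzzer needed 122 seconds; the new one needs 2.7.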
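The Popen-based exif() detects a crash by comparing p.returncode to -11. On POSIX systems, Python reports a child killed by signal N as a returncode of -N, so -11 corresponds to SIGSEGV. The sketch below illustrates that mapping; describe_exit() is a hypothetical helper, and the "target" is just a child Python process that sends itself SIGSEGV, used here as an assumption in place of a real crashing binary.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a Popen returncode into a short label."""
    if returncode < 0:
        # On POSIX, a negative value -N means the child was killed by signal N.
        return signal.Signals(-returncode).name
    return "exit {}".format(returncode)


# Hypothetical stand-in target: a child process that sends itself SIGSEGV.
crasher = [sys.executable, "-c",
           "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
result = subprocess.run(crasher, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(describe_exit(result.returncode))
```

Checking the return code this way avoids parsing the shell's "Segmentation fault" message, which only exists when a shell sits between the fuzzer and the target. The sketch assumes a POSIX system such as the Kali VM used in this post.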





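Further Optimizing in Python

Let's keep trying to optimize our fuzzer in Python. First, let's get a good benchmark to compare against. We'll have the optimized Python fuzzer run 50,000 iterations and again use the cProfile module to get some fine-grained statistics on where we spend our time.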


102981395 function calls (102981258 primitive calls) in 141.488 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
15/1 0.000 0.000 141.488 141.488 {built-in method builtins.exec}
1 1.724 1.724 141.488 141.488 subpro.py:3(<module>)
50000 0.992 0.000 102.588 0.002 subpro.py:139(exif)
50000 1.248 0.000 61.562 0.001 subprocess.py:681(__init__)
50000 5.034 0.000 57.826 0.001 subprocess.py:1412(_execute_child)
50000 0.437 0.000 39.586 0.001 subprocess.py:920(communicate)
50000 2.527 0.000 39.064 0.001 subprocess.py:1662(_communicate)
208254 37.508 0.000 37.508 0.000 {built-in method posix.read}
158238 0.577 0.000 28.809 0.000 selectors.py:402(select)
158238 28.131 0.000 28.131 0.000 {method 'poll' of 'select.poll' objects}
50000 11.784 0.000 25.819 0.001 subpro.py:14(bit_flip)
7950000 3.666 0.000 10.431 0.000 random.py:256(choice)
50000 8.421 0.000 8.421 0.000 {built-in method _posixsubprocess.fork_exec}
50000 0.162 0.000 7.358 0.000 subpro.py:133(create_new)
7950000 4.096 0.000 6.130 0.000 random.py:224(_randbelow)
203090 5.016 0.000 5.016 0.000 {built-in method io.open}
50000 4.211 0.000 4.211 0.000 {method 'close' of '_io.BufferedRandom' objects}
50000 1.643 0.000 4.194 0.000 os.py:617(get_exec_path)
50000 1.733 0.000 3.356 0.000 subpro.py:8(get_bytes)
35866791 2.635 0.000 2.635 0.000 {method 'append' of 'list' objects}
100000 0.070 0.000 1.960 0.000 subprocess.py:1014(wait)
100000 0.252 0.000 1.902 0.000 selectors.py:351(register)
100000 0.444 0.000 1.890 0.000 subprocess.py:1621(_wait)
100000 0.675 0.000 1.583 0.000 selectors.py:234(register)
350000 0.432 0.000 1.501 0.000 subprocess.py:1471(<genexpr>)
12074141 1.434 0.000 1.434 0.000 {method 'getrandbits' of '_random.Random' objects}
50000 0.059 0.000 1.358 0.000 subprocess.py:1608(_try_wait)
50000 1.299 0.000 1.299 0.000 {built-in method posix.waitpid}
100000 0.488 0.000 1.058 0.000 os.py:674(__getitem__)
100000 1.017 0.000 1.017 0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----


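50,000 iterations took 141 seconds in total, which is great compared to what we had before: we previously needed 122 seconds for just 1,000 iterations! Again filtering for entries that took more than 1.0 seconds, we see that most of our time is still spent in exif(), but we also see some performance problems in bit_flip(), where we spent 25 cumulative seconds. Let's try to optimize that function.

As a refresher, here is what the old bit_flip() function looked like: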


def bit_flip(data):
    num_of_flips = int((len(data) - 4) * .01)
    indexes = range(4, (len(data) - 4))
    chosen_indexes = []

    # iterate selecting indexes until we've hit our num_of_flips number
    counter = 0
    while counter < num_of_flips:
        chosen_indexes.append(random.choice(indexes))
        counter += 1

    for x in chosen_indexes:
        current = data[x]
        current = (bin(current).replace("0b",""))
        current = "0" * (8 - len(current)) + current

        indexes = range(0,8)
        picked_index = random.choice(indexes)

        new_number = []

        # our new_number list now has all the digits, example: ['1', '0', '1', '0', '1', '0', '1', '0']
        for i in current:
            new_number.append(i)

        # if the number at our randomly selected index is a 1, make it a 0, and vice versa
        if new_number[picked_index] == "1":
            new_number[picked_index] = "0"
        else:
            new_number[picked_index] = "1"

        # create our new binary string of our bit-flipped number
        current = ''
        for i in new_number:
            current += i

        # convert that string to an integer
        current = int(current,2)

        # change the number in our byte array to our new number we just constructed
        data[x] = current

    return data


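This function is admittedly a bit clumsy. With better logic we can simplify it a lot. In my limited programming experience I've found this happens often: you can have all the fancy, esoteric programming knowledge in the world, but if the logic behind the program is unsound, performance will suffer.

Let's reduce the number of type conversions, for instance from integer to string and back, and cut down the amount of code. We can accomplish this by redefining bit_flip() as follows: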


def bit_flip(data):
    length = len(data) - 4
    num_of_flips = int(length * .01)
    picked_indexes = []

    # each entry is a mask with exactly one bit set
    flip_array = [1,2,4,8,16,32,64,128]

    counter = 0
    while counter < num_of_flips:
        picked_indexes.append(random.choice(range(0,length)))
        counter += 1

    for x in picked_indexes:
        mask = random.choice(flip_array)
        # XOR with a one-bit mask flips exactly that bit
        data[x] = data[x] ^ mask

    return data


If we use this new function and monitor the results, we get the following performance report:


59376275 function calls (59376138 primitive calls) in 135.582 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
15/1 0.000 0.000 135.582 135.582 {built-in method builtins.exec}
1 1.940 1.940 135.582 135.582 subpro.py:3(<module>)
50000 0.978 0.000 107.857 0.002 subpro.py:111(exif)
50000 1.450 0.000 64.236 0.001 subprocess.py:681(__init__)
50000 5.566 0.000 60.141 0.001 subprocess.py:1412(_execute_child)
50000 0.534 0.000 42.259 0.001 subprocess.py:920(communicate)
50000 2.827 0.000 41.637 0.001 subprocess.py:1662(_communicate)
199549 38.249 0.000 38.249 0.000 {built-in method posix.read}
149537 0.555 0.000 30.376 0.000 selectors.py:402(select)
149537 29.722 0.000 29.722 0.000 {method 'poll' of 'select.poll' objects}
50000 3.993 0.000 14.471 0.000 subpro.py:14(bit_flip)
7950000 3.741 0.000 10.316 0.000 random.py:256(choice)
50000 9.973 0.000 9.973 0.000 {built-in method _posixsubprocess.fork_exec}
50000 0.163 0.000 7.034 0.000 subpro.py:105(create_new)
7950000 3.987 0.000 5.952 0.000 random.py:224(_randbelow)
202567 4.966 0.000 4.966 0.000 {built-in method io.open}
50000 4.042 0.000 4.042 0.000 {method 'close' of '_io.BufferedRandom' objects}
50000 1.539 0.000 3.828 0.000 os.py:617(get_exec_path)
50000 1.843 0.000 3.607 0.000 subpro.py:8(get_bytes)
100000 0.074 0.000 2.133 0.000 subprocess.py:1014(wait)
100000 0.463 0.000 2.059 0.000 subprocess.py:1621(_wait)
100000 0.274 0.000 2.046 0.000 selectors.py:351(register)
100000 0.782 0.000 1.702 0.000 selectors.py:234(register)
50000 0.055 0.000 1.507 0.000 subprocess.py:1608(_try_wait)
50000 1.452 0.000 1.452 0.000 {built-in method posix.waitpid}
350000 0.424 0.000 1.436 0.000 subprocess.py:1471(<genexpr>)
12066317 1.339 0.000 1.339 0.000 {method 'getrandbits' of '_random.Random' objects}
100000 0.466 0.000 1.048 0.000 os.py:674(__getitem__)
100000 1.014 0.000 1.014 0.000 {method 'close' of '_io.BufferedReader' objects}
-----SNIP-----


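As the metrics show, we now spend only 14 cumulative seconds in bit_flip()! In the last run we spent 25 seconds there, so it is almost twice as fast. I'd say we did a good job optimizing here.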
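To see why the XOR version does the same job as the string-based one, the two flips can be compared directly. This is a minimal sketch with hypothetical helper names (flip_bit_string, flip_bit_xor); the last function is a variant of bit_flip() that draws indexes with random.randrange() instead of random.choice(range(...)). That swap is an assumption of this sketch rather than something the fuzzer in this post does, and whether it is measurably faster would need profiling.

```python
import random


def flip_bit_string(value, picked_index):
    """String-based flip, as in the old bit_flip(); index 0 is the leftmost (most significant) bit."""
    bits = list(bin(value).replace("0b", "").zfill(8))
    bits[picked_index] = "0" if bits[picked_index] == "1" else "1"
    return int("".join(bits), 2)


def flip_bit_xor(value, picked_index):
    """The same flip as a single XOR; string index i corresponds to bit 7 - i."""
    return value ^ (1 << (7 - picked_index))


def bit_flip(data, rate=0.01):
    """Variant of the new bit_flip() that avoids building a range object for every pick."""
    length = len(data) - 4
    for _ in range(int(length * rate)):
        index = random.randrange(length)
        data[index] ^= 1 << random.randrange(8)
    return data
```

Both helpers agree for every byte value and bit position, which is why the conversions to and from strings could be dropped without changing what the mutation does.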


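Now that we have a solid Python benchmark (keep in mind there may still be opportunities for multi-processing or multi-threading, but we're saving that idea for later), let's port the fuzzer to a new language, C++, and test its performance.



A New Fuzzer in C++

First, let's run our newly optimized Python fuzzer for 100,000 iterations and see how long it takes in total.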


118749892 function calls (118749755 primitive calls) in 256.881 seconds


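100,000 iterations in only 256 seconds! That is much faster than our previous fuzzer.

This will be the benchmark we try to beat in C++. Now, as unfamiliar as I am with the nuances of Python development, multiply that by 10 and you have my unfamiliarity with C++. This code may be laughable to some, but it's the best I could manage at the moment, and we can explain how each function relates to our earlier Python code.

Let's go through the implementation function by function.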


// headers used by the functions in this walkthrough
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

//
// this function simply creates a stream by opening a file in binary mode;
// finds the end of file, creates a string 'data', resizes data to be the same
// size as the file, moves the file pointer back to the beginning of the file;
// reads the data from the file into the data string;
//
std::string get_bytes(std::string filename)
{
    std::ifstream fin(filename, std::ios::binary);

    if (fin.is_open())
    {
        fin.seekg(0, std::ios::end);
        std::string data;
        data.resize(fin.tellg());
        fin.seekg(0, std::ios::beg);
        fin.read(&data[0], data.size());

        return data;
    }
    else
    {
        std::cout << "Failed to open " << filename << ".\n";
        exit(1);
    }
}


As my comments say, this function simply retrieves a string of bytes from the target file, which in our testing is still Canon_40D.jpg.


//
// this will take 1% of the bytes from our valid jpeg and
// flip a random bit in the byte and return the altered string
//
std::string bit_flip(std::string data)
{
    int size = (data.length() - 4);
    int num_of_flips = (int)(size * .01);

    // get a vector full of 1% of random byte indexes
    std::vector<int> picked_indexes;
    for (int i = 0; i < num_of_flips; i++)
    {
        int picked_index = rand() % size;
        picked_indexes.push_back(picked_index);
    }

    // iterate through the data string at those indexes and flip a bit
    for (std::size_t i = 0; i < picked_indexes.size(); ++i)
    {
        int index = picked_indexes[i];
        char current = data.at(index);
        int decimal = ((int)current & 0xff);

        int bit_to_flip = rand() % 8;

        decimal ^= 1 << bit_to_flip;
        decimal &= 0xff;

        data[index] = (char)decimal;
    }

    return data;
}


This function is a direct equivalent of the bit_flip() function in our Python script.


//
// takes mutated string and creates new jpeg with it;
//
void create_new(std::string mutated)
{
    std::ofstream fout("mutated.jpg", std::ios::binary);

    if (fout.is_open())
    {
        fout.seekp(0, std::ios::beg);
        fout.write(&mutated[0], mutated.size());
    }
    else
    {
        std::cout << "Failed to create mutated.jpg" << ".\n";
        exit(1);
    }
}


This function simply creates a temporary mutated.jpg file, similar to the create_new() function in our Python script.


//
// function to run a system command and store the output as a string;
// https://www.jeremymorgan.com/tutorials/c-programming/how-to-capture-the-output-of-a-linux-command-in-c/
//
std::string get_output(std::string cmd)
{
    std::string output;
    FILE * stream;
    char buffer[256];

    stream = popen(cmd.c_str(), "r");
    if (stream)
    {
        while (!feof(stream))
        {
            if (fgets(buffer, 256, stream) != NULL) output.append(buffer);
        }
        pclose(stream);
    }

    return output;
}

//
// we actually run our target command via the get_output() func;
// retrieve the output in the form of a string and then we can parse the string;
// we'll save all the outputs that result in a segfault or floating point except;
//
void exif(std::string mutated, int counter)
{
    std::string command = "exif mutated.jpg -verbose 2>&1";

    std::string output = get_output(command);

    std::string segfault = "Segmentation";
    std::string floating_point = "Floating";

    std::size_t pos1 = output.find(segfault);
    std::size_t pos2 = output.find(floating_point);

    // find() returns std::string::npos when the text is not present
    if (pos1 != std::string::npos)
    {
        std::cout << "Segfault!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);

        if (fout.is_open())
        {
            fout.seekp(0, std::ios::beg);
            fout.write(&mutated[0], mutated.size());
        }
        else
        {
            std::cout << "Failed to create " << filename << ".\n";
            exit(1);
        }
    }
    else if (pos2 != std::string::npos)
    {
        std::cout << "Floating Point!\n";
        std::ostringstream oss;
        oss << "/root/cppcrashes/crash." << counter << ".jpg";
        std::string filename = oss.str();
        std::ofstream fout(filename, std::ios::binary);

        if (fout.is_open())
        {
            fout.seekp(0, std::ios::beg);
            fout.write(&mutated[0], mutated.size());
        }
        else
        {
            std::cout << "Failed to create " << filename << ".\n";
            exit(1);
        }
    }
}


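These two functions work together. get_output() takes a C++ string as its parameter, runs that command on the operating system, and captures the output. It then returns the output as a string to the calling function, exif().

exif() takes the output and looks for Segmentation fault or Floating point exception errors; if it finds one, it writes the mutated bytes to a file and saves it as crash.<counter>.jpg. This is very similar to our Python fuzzer.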


//
// simply generates a vector of strings that are our 'magic' values;
//
std::vector<std::string> vector_gen()
{
    std::vector<std::string> magic;

    // the "s" suffix builds a std::string that keeps embedded NUL bytes
    using namespace std::string_literals;

    magic.push_back("\xff");
    magic.push_back("\x7f");
    magic.push_back("\x00"s);
    magic.push_back("\xff\xff");
    magic.push_back("\x7f\xff");
    magic.push_back("\x00\x00"s);
    magic.push_back("\xff\xff\xff\xff");
    magic.push_back("\x80\x00\x00\x00"s);
    magic.push_back("\x40\x00\x00\x00"s);
    magic.push_back("\x7f\xff\xff\xff");

    return magic;
}

//
// randomly picks a magic value from the vector and overwrites that many bytes in the image;
//
std::string magic(std::string data, std::vector<std::string> magic)
{
    int vector_size = magic.size();
    int picked_magic_index = rand() % vector_size;
    std::string picked_magic = magic[picked_magic_index];
    int size = (data.length() - 4);
    int picked_data_index = rand() % size;
    data.replace(picked_data_index, picked_magic.length(), picked_magic);

    return data;
}

//
// returns 0 or 1;
//
int func_pick()
{
    int result = rand() % 2;

    return result;
}


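These functions are also very similar to our Python implementation. vector_gen() basically just creates our vector of "magic values", and then subsequent functions such as magic() use that vector to randomly pick an entry and overwrite the valid JPEG data with it at a random index.

func_pick() is very simple: it just returns 0 or 1, so that our fuzzer can randomly choose between bit_flip() and magic() to mutate our valid JPEG. To stay consistent, let's have the fuzzer choose only bit_flip() for now by adding a temporary line, function = 1, so that we match our Python testing.

Here is our main() function, which executes all of the code we have so far: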


int main(int argc, char** argv)
{
    if (argc < 3)
    {
        std::cout << "Usage: ./cppfuzz <valid jpeg> <number_of_fuzzing_iterations>\n";
        std::cout << "Usage: ./cppfuzz Canon_40D.jpg 10000\n";
        return 1;
    }

    // start timer
    auto start = std::chrono::high_resolution_clock::now();

    // initialize our random seed
    srand((unsigned)time(NULL));

    // generate our vector of magic numbers
    std::vector<std::string> magic_vector = vector_gen();

    std::string filename = argv[1];
    int iterations = atoi(argv[2]);

    int counter = 0;
    while (counter < iterations)
    {
        std::string data = get_bytes(filename);

        int function = func_pick();
        function = 1;
        if (function == 0)
        {
            // utilize the magic mutation method; create new jpg; send to the target
            std::string mutated = magic(data, magic_vector);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
        else
        {
            // utilize the bit flip mutation; create new jpg; send to the target
            std::string mutated = bit_flip(data);
            create_new(mutated);
            exif(mutated,counter);
            counter++;
        }
    }

    // stop timer and print execution time
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start);
    std::cout << "Execution Time: " << duration.count() << "ms\n";

    return 0;
}


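We take a valid JPEG to mutate and the number of fuzzing iterations from the command-line arguments. We then use some timing facilities from the std::chrono namespace to measure how long the program takes to execute.

We're cheating a bit here by only selecting the bit_flip() type of mutation, but that's what we did in Python as well, and we want an apples-to-apples comparison.

Let's go ahead and run 100,000 iterations and compare against the Python fuzzer's benchmark of 256 seconds.

Once the C++ fuzzer finishes, it prints the elapsed time in milliseconds: Execution Time: 172638ms, or about 172 seconds.

So we comfortably beat our Python fuzzer with the new C++ fuzzer! Very exciting. Let's do some math: 172/256 = 67%, so the C++ implementation needs roughly 33% less time. (God, I hope you're not some 200-IQ math genius throwing up on your keyboard reading this.)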
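The "33%" figure describes time saved; the same two measurements can also be read as throughput. Here is a small worked example using only the numbers reported above (256.881 seconds for Python, 172638 ms for C++, 100,000 iterations each); the function and variable names are made up for illustration.

```python
def iterations_per_second(iterations, seconds):
    """Throughput of a fuzzing run."""
    return iterations / seconds


ITERATIONS = 100_000
PYTHON_SECONDS = 256.881   # cProfile total reported for the Python run
CPP_SECONDS = 172.638      # "Execution Time: 172638ms" from the C++ run

python_rate = iterations_per_second(ITERATIONS, PYTHON_SECONDS)
cpp_rate = iterations_per_second(ITERATIONS, CPP_SECONDS)

# Two ways to express the same improvement.
time_saved = 1 - CPP_SECONDS / PYTHON_SECONDS   # fraction of wall time removed
throughput_gain = cpp_rate / python_rate - 1    # extra iterations per second

print("Python: {:.0f} iter/s, C++: {:.0f} iter/s".format(python_rate, cpp_rate))
print("time saved: {:.1%}, throughput gain: {:.1%}".format(time_saved, throughput_gain))
```

So the run takes about a third less time, which is the same thing as completing close to half again as many iterations per second.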


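Let's take our optimized Python and C++ fuzzers on to a new target!



Picking a New Target

Looking at what comes pre-installed on Kali Linux, since that's our operating environment, let's target exiv2, which lives at /usr/bin/exiv2.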


root@kali:~# exiv2 -h
Usage: exiv2 [ options ] [ action ] file ...

Manipulate the Exif metadata of images.

Actions:
ad | adjust Adjust Exif timestamps by the given time. This action
requires at least one of the -a, -Y, -O or -D options.
pr | print Print image metadata.
rm | delete Delete image metadata from the files.
in | insert Insert metadata from corresponding *.exv files.
Use option -S to change the suffix of the input files.
ex | extract Extract metadata to *.exv, *.xmp and thumbnail image files.
mv | rename Rename files and/or set file timestamps according to the
Exif create timestamp. The filename format can be set with
-r format, timestamp options are controlled with -t and -T.
mo | modify Apply commands to modify (add, set, delete) the Exif and
IPTC metadata of image files or set the JPEG comment.
Requires option -c, -m or -M.
fi | fixiso Copy ISO setting from the Nikon Makernote to the regular
Exif tag.
fc | fixcom Convert the UNICODE Exif user comment to UCS-2. Its current
character encoding can be specified with the -n option.

Options:
-h Display this help and exit.
-V Show the program version and exit.
-v Be verbose during the program run.
-q Silence warnings and error messages during the program run (quiet).
-Q lvl Set log-level to d(ebug), i(nfo), w(arning), e(rror) or m(ute).
-b Show large binary values.
-u Show unknown tags.
-g key Only output info for this key (grep).
-K key Only output info for this key (exact match).
-n enc Charset to use to decode UNICODE Exif user comments.
-k Preserve file timestamps (keep).
-t Also set the file timestamp in 'rename' action (overrides -k).
-T Only set the file timestamp in 'rename' action, do not rename
the file (overrides -k).
-f Do not prompt before overwriting existing files (force).
-F Do not prompt before renaming files (Force).
-a time Time adjustment in the format [-]HH[:MM[:SS]]. This option
is only used with the 'adjust' action.
-Y yrs Year adjustment with the 'adjust' action.
-O mon Month adjustment with the 'adjust' action.
-D day Day adjustment with the 'adjust' action.
-p mode Print mode for the 'print' action. Possible modes are:
s : print a summary of the Exif metadata (the default)
a : print Exif, IPTC and XMP metadata (shortcut for -Pkyct)
t : interpreted (translated) Exif data (-PEkyct)
v : plain Exif data values (-PExgnycv)
h : hexdump of the Exif data (-PExgnycsh)
i : IPTC data values (-PIkyct)
x : XMP properties (-PXkyct)
c : JPEG comment
p : list available previews
S : print structure of image
X : extract XMP from image
-P flgs Print flags for fine control of tag lists ('print' action):
E : include Exif tags in the list
I : IPTC datasets
X : XMP properties
x : print a column with the tag number
g : group name
k : key
l : tag label
n : tag name
y : type
c : number of components (count)
s : size in bytes
v : plain data value
t : interpreted (translated) data
h : hexdump of the data
-d tgt Delete target(s) for the 'delete' action. Possible targets are:
a : all supported metadata (the default)
e : Exif section
t : Exif thumbnail only
i : IPTC data
x : XMP packet
c : JPEG comment
-i tgt Insert target(s) for the 'insert' action. Possible targets are
the same as those for the -d option, plus a modifier:
X : Insert metadata from an XMP sidecar file <file>.xmp
Only JPEG thumbnails can be inserted, they need to be named
<file>-thumb.jpg
-e tgt Extract target(s) for the 'extract' action. Possible targets
are the same as those for the -d option, plus a target to extract
preview images and a modifier to generate an XMP sidecar file:
p[<n>[,<m> ...]] : Extract preview images.
X : Extract metadata to an XMP sidecar file <file>.xmp
-r fmt Filename format for the 'rename' action. The format string
follows strftime(3). The following keywords are supported:
:basename: - original filename without extension
:dirname: - name of the directory holding the original file
:parentname: - name of parent directory
Default filename format is %Y%m%d_%H%M%S.
-c txt JPEG comment string to set in the image.
-m file Command file for the modify action. The format for commands is
set|add|del <key> [[<type>] <value>].
-M cmd Command line for the modify action. The format for the
commands is the same as that of the lines of a command file.
-l dir Location (directory) for files to be inserted from or extracted to.
-S .suf Use suffix .suf for source files for insert command.


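After reading the help output, let's take a more or less random stab at pr (print image metadata) with -v (be verbose during the program run). As the help shows, there is a lot of attack surface to explore here, but let's keep it simple for now.

Our command string in the fuzzer will now look something like exiv2 pr -v mutated.jpg.

Let's go ahead and update our fuzzers and see if we can find more bugs on this harder target. It's worth mentioning that this target is currently supported, unlike our last target, which was an unsupported 7-year-old project on Github; this target is no pushover.

This target has already been fuzzed by more advanced fuzzers. You can simply search Google for "ASan exiv2" and find plenty of fuzzers producing segfaults in this binary and forwarding the ASan output to the Github repository as bug reports. This is a significant step up from our last target.

exiv2 on Github (https://github.com/Exiv2/exiv2)

exiv2 website (https://www.exiv2.org/)


Fuzzing Our New Target

Let's start with our new and improved Python fuzzer and monitor its performance over 50,000 iterations. In addition to detecting segfaults, let's add some code to watch for floating point exceptions (call it a hunch!). Our new exif() function will look like this: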


def exif(counter,data):
    p = Popen(["exiv2", "pr", "-v", "mutated.jpg"], stdout=PIPE, stderr=PIPE)
    (out,err) = p.communicate()

    # -11: killed by SIGSEGV
    if p.returncode == -11:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        print("Segfault!")

    # -8: killed by SIGFPE
    elif p.returncode == -8:
        f = open("crashes2/crash.{}.jpg".format(str(counter)), "ab+")
        f.write(data)
        print("Floating Point!")


Looking at the output from python3 -m cProfile -s cumtime subpro.py ~/jpegs/Canon_40D.jpg:


75780446 function calls (75780309 primitive calls) in 213.595 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
15/1 0.000 0.000 213.595 213.595 {built-in method builtins.exec}
1 1.481 1.481 213.595 213.595 subpro.py:3(<module>)
50000 0.818 0.000 187.205 0.004 subpro.py:111(exif)
50000 0.543 0.000 143.499 0.003 subprocess.py:920(communicate)
50000 6.773 0.000 142.873 0.003 subprocess.py:1662(_communicate)
1641352 3.186 0.000 122.668 0.000 selectors.py:402(select)
1641352 118.799 0.000 118.799 0.000 {method 'poll' of 'select.poll' objects}
50000 1.220 0.000 42.888 0.001 subprocess.py:681(__init__)
50000 4.400 0.000 39.364 0.001 subprocess.py:1412(_execute_child)
1691919 25.759 0.000 25.759 0.000 {built-in method posix.read}
50000 3.863 0.000 13.938 0.000 subpro.py:14(bit_flip)
7950000 3.587 0.000 9.991 0.000 random.py:256(choice)
50000 7.495 0.000 7.495 0.000 {built-in method _posixsubprocess.fork_exec}
50000 0.148 0.000 7.081 0.000 subpro.py:105(create_new)
7950000 3.884 0.000 5.764 0.000 random.py:224(_randbelow)
200000 4.582 0.000 4.582 0.000 {built-in method io.open}
50000 4.192 0.000 4.192 0.000 {method 'close' of '_io.BufferedRandom' objects}
50000 1.339 0.000 3.612 0.000 os.py:617(get_exec_path)
50000 1.641 0.000 3.309 0.000 subpro.py:8(get_bytes)
100000 0.077 0.000 1.822 0.000 subprocess.py:1014(wait)
100000 0.432 0.000 1.746 0.000 subprocess.py:1621(_wait)
100000 0.256 0.000 1.735 0.000 selectors.py:351(register)
100000 0.619 0.000 1.422 0.000 selectors.py:234(register)
350000 0.380 0.000 1.402 0.000 subprocess.py:1471(<genexpr>)
12066004 1.335 0.000 1.335 0.000 {method 'getrandbits' of '_random.Random' objects}
50000 0.063 0.000 1.222 0.000 subprocess.py:1608(_try_wait)
50000 1.160 0.000 1.160 0.000 {built-in method posix.waitpid}
100000 0.519 0.000 1.143 0.000 os.py:674(__getitem__)
1691352 0.902 0.000 1.097 0.000 selectors.py:66(__len__)
7234121 1.023 0.000 1.023 0.000 {method 'append' of 'list' objects}
-----SNIP-----


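It looks like we spent 213 seconds in total and didn't find any bugs, which is a bummer, but it could just be bad luck. Let's run our C++ fuzzer under the same conditions and monitor the output.

Here we go: times in the same ballpark, but a big improvement: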


root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 50000
Execution Time: 170829ms


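That's a pretty significant improvement of 43 seconds, about 20% off our Python time. (Again, apologies to the math people.)

Let's keep our C++ fuzzer running for a while and see if it finds any bugs :).


Bugs on the New Target

About 10 seconds after starting the fuzzer again, I got this terminal output: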


root@kali:~# ./blogcpp ~/jpegs/Canon_40D.jpg 1000000
Floating Point!


Looks like we hit our floating point exception check. We should have a nice jpg waiting for us in the cppcrashes directory.


root@kali:~/cppcrashes# ls
crash.522.jpg


Let's confirm the bug by running exiv2 against this sample:


root@kali:~/cppcrashes# exiv2 pr -v crash.522.jpg
File 1/1: crash.522.jpg
Error: Offset of directory Image, entry 0x011b is out of bounds: Offset = 0x080000ae; truncating the entry
Warning: Directory Image, entry 0x8825 has unknown Exif (TIFF) type 68; setting type size 1.
Warning: Directory Image, entry 0x8825 doesn't look like a sub-IFD.
File name : crash.522.jpg
File size : 7958 Bytes
MIME type : image/jpeg
Image size : 100 x 68
Camera make : Aanon
Camera model : Canon EOS 40D
Image timestamp : 2008:05:30 15:56:01
Image number :
Exposure time : 1/160 s
Aperture : F7.1
Floating point exception


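We did indeed find a new bug! Very exciting. We should file a bug report with the exiv2 developers on Github.



Conclusion

We first optimized our fuzzer in Python and then rewrote it in C++. We obtained massive performance gains and even found a new bug on a new, harder target.

For fun, let's compare against the performance of our original fuzzer over 50,000 iterations: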


123052109 function calls (123001828 primitive calls) in 6243.939 seconds


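As you can see, 6,243 seconds is significantly slower than the 170-second benchmark of our C++ fuzzer.


Addendum, May 15, 2020

While porting the C++ fuzzer to C, I made some modest improvements of my own. One logic change was to read the data from the original valid image only once, then on each fuzzing iteration copy that data into a newly allocated buffer and perform the mutation on the new buffer. This C version of the fuzzer holds up quite well against the C++ fuzzer. Here is a comparison of the two over 200,000 iterations (you can ignore the crash findings, since this fuzzer is extremely simple and completely random):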


h0mbre:~$ time ./cppfuzz Canon_40D.jpg 200000
<snipped_results>

real 10m45.371s
user 7m14.561s
sys 3m10.529s

h0mbre:~$ time ./cfuzz Canon_40D.jpg 200000
<snipped_results>

real 10m7.686s
user 7m27.503s
sys 2m20.843s


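So over 200,000 iterations we end up saving about 35-40 seconds, which was pretty typical in my testing. Just by making a few logic changes and using fewer of the abstractions C++ provides, we saved a lot of sys time and improved our speed by about 5%.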
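The "read once, copy per iteration" change is not specific to C. As a rough illustration in Python (the C source itself is not shown in this post, so this is only a sketch of the idea), the seed file can be loaded a single time and copied into a fresh bytearray for each mutation. The names load_seed, flip_one_bit and mutated_inputs are hypothetical.

```python
import random


def load_seed(filename):
    """Read the valid input once; bytes are immutable, so the seed cannot be mutated by accident."""
    with open(filename, "rb") as f:
        return f.read()


def flip_one_bit(data):
    """Tiny stand-in mutator (hypothetical): flip one random bit in place."""
    index = random.randrange(len(data))
    data[index] ^= 1 << random.randrange(8)
    return data


def mutated_inputs(seed, iterations, mutate):
    """Yield (counter, mutated_copy) pairs without re-reading the file."""
    for counter in range(iterations):
        # bytearray(seed) allocates a fresh, mutable copy on every iteration.
        yield counter, mutate(bytearray(seed))
```

A loop would then call seed = load_seed("Canon_40D.jpg") once and iterate over mutated_inputs(seed, n, flip_one_bit), which removes one open/read/close of the seed file from every iteration.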


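Monitoring Child Process Exit Status

After finishing the C translation, I asked on Twitter for suggestions on improving performance. @lcamtuf, the creator of AFL, explained to me that I shouldn't be using popen() in my code, since it spawns a shell and performs very poorly. Here is the code snippet I asked for help with: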


void exif(int iteration) {

    FILE *fileptr;

    //fileptr = popen("exif_bin target.jpeg -verbose >/dev/null 2>&1", "r");
    fileptr = popen("exiv2 pr -v mutated.jpeg >/dev/null 2>&1", "r");

    int status = WEXITSTATUS(pclose(fileptr));
    switch(status) {
        case 253:
            break;
        case 0:
            break;
        case 1:
            break;
        default:
            crashes++;
            printf("\r[>] Crashes: %d", crashes);
            fflush(stdout);
            char command[50];
            snprintf(command, sizeof(command), "cp mutated.jpeg ccrashes/crash.%d.%d",
                     iteration, status);
            system(command);
            break;
    }
}


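As you can see, we call popen() to run a shell command, then close the file pointer to the child process and use the WEXITSTATUS macro to return the exit status for monitoring. I filtered out some exit codes I didn't care about, namely 253, 0, and 1, and hoped to see some codes related to the floating point errors we had already found with the C++ fuzzer, or maybe even a segfault. @lcamtuf suggested that instead of popen() I call fork() to spawn a child process, use execvp() to have the child execute the command, and finally use waitpid() to wait for the child to terminate and return its exit status.

Since there is no real shell on this syscall path, I also had to open a handle to /dev/null and call dup2() to redirect both stdout and stderr there, as we don't care about the command's output. I also used the WTERMSIG macro to retrieve the signal that terminated the child whenever the WIFSIGNALED macro returned true, which would indicate we got a segfault, a floating point exception, or the like. So now our updated function looks like this: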


void exif(int iteration) {

    char* file = "exiv2";
    char* argv[5];
    argv[0] = "exiv2";          // argv[0] is conventionally the program name
    argv[1] = "pr";
    argv[2] = "-v";
    argv[3] = "mutated.jpeg";
    argv[4] = NULL;
    pid_t child_pid;
    int child_status;

    child_pid = fork();
    if (child_pid == 0) {
        // this means we're the child process
        int fd = open("/dev/null", O_WRONLY);

        // dup both stdout and stderr and send them to /dev/null
        dup2(fd, 1);
        dup2(fd, 2);
        close(fd);

        execvp(file, argv);
        // shouldn't return, if it does, we have an error with the command
        printf("[!] Unknown command for execvp, exiting...\n");
        exit(1);
    }
    else {
        // this is run by the parent process
        do {
            pid_t tpid = waitpid(child_pid, &child_status, WUNTRACED |
                WCONTINUED);
            if (tpid == -1) {
                printf("[!] Waitpid failed!\n");
                perror("waitpid");
                break;          // child_status is not valid if waitpid failed
            }
            if (WIFEXITED(child_status)) {
                //printf("WIFEXITED: Exit Status: %d\n", WEXITSTATUS(child_status));
            } else if (WIFSIGNALED(child_status)) {
                crashes++;
                int exit_status = WTERMSIG(child_status);
                printf("\r[>] Crashes: %d", crashes);
                fflush(stdout);
                char command[50];
                snprintf(command, sizeof(command), "cp mutated.jpeg ccrashes/%d.%d",
                         iteration, exit_status);
                system(command);
            } else if (WIFSTOPPED(child_status)) {
                printf("WIFSTOPPED: Exit Status: %d\n", WSTOPSIG(child_status));
            } else if (WIFCONTINUED(child_status)) {
                printf("WIFCONTINUED: Exit Status: Continued.\n");
            }
        } while (!WIFEXITED(child_status) && !WIFSIGNALED(child_status));
    }
}
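For readers following along with the Python fuzzer, the same fork(), execvp(), waitpid() sequence is available through Python's os module. The following is a minimal sketch under stated assumptions, not the code that produced the benchmark numbers in this post: run_target() is a hypothetical name, and it expects a POSIX system.

```python
import os
import signal
import sys


def run_target(argv):
    """Fork, exec argv with its output sent to /dev/null, and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: no shell is involved, so stdout and stderr are redirected by hand.
        try:
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, 1)
            os.dup2(devnull, 2)
            os.close(devnull)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if something above failed, e.g. the command was not found.
            os._exit(127)

    # Parent: block until the child terminates, then decode the wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    return ("exit", os.WEXITSTATUS(status))


# Harmless usage example: a child that exits normally.
print(run_target([sys.executable, "-c", "pass"]))
```

A fuzzing loop could call run_target(["exiv2", "pr", "-v", "mutated.jpg"]) and save the input whenever the first element of the result is "signal"; the second element is then the signal number, for example 8 for SIGFPE or 11 for SIGSEGV.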


You can see that this greatly improves performance on our 200,000-iteration benchmark:


h0mbre:~$ time ./cfuzz2 Canon_40D.jpg 200000
<snipped_results>

real 8m30.371s
user 6m10.219s
sys 2m2.098s


Summary of Results

C++ fuzzer – 310 iterations per second
C fuzzer – 329 iterations per second (+6%)
C fuzzer 2.0 – 392 iterations per second (+26%)
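These rates follow directly from the "real" times printed by time for the three 200,000-iteration runs. A short sketch that reproduces them; parse_real_time() is a hypothetical helper that only handles the XmY.ZZZs format shown above.

```python
import re


def parse_real_time(text):
    """Convert a `time` duration such as '10m45.371s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError("unexpected duration format: {!r}".format(text))
    return int(match.group(1)) * 60 + float(match.group(2))


ITERATIONS = 200_000
real_times = {
    "C++ fuzzer": "10m45.371s",
    "C fuzzer": "10m7.686s",
    "C fuzzer 2.0": "8m30.371s",
}

rates = {name: ITERATIONS / parse_real_time(t) for name, t in real_times.items()}
baseline = rates["C++ fuzzer"]
for name, rate in rates.items():
    print("{}: {:.0f} iterations/s ({:+.0%} vs C++)".format(name, rate, rate / baseline - 1))
```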
Thanks to @lcamtuf and @carste1n for the help.

I've uploaded the code here: https://github.com/h0mbre/Fuzzing/tree/master/JPEGMutation





Kanxue ID: pureGavin

https://bbs.kanxue.com/user-home-960900.htm

*This article is from the Kanxue forum, translated by pureGavin; please credit the Kanxue community when reposting.



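Originally published on the WeChat official account 看雪学苑 (Kanxue Academy): Fuzzing Like a Caveman: Improving Performance

Copyright notice: posted by admin on October 26, 2024 at 6:00 PM.
When reposting, please credit: Fuzzing Like a Caveman: Improving Performance | CTF导航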
