Originally published on breachproof: Lethal Injection: How We Hacked Microsoft’s Healthcare Chat Bot
We have discovered multiple security vulnerabilities in the Azure Health Bot service, a patient-facing chatbot that handles medical information. The vulnerabilities, if exploited, could allow access to sensitive infrastructure and confidential medical data.
All vulnerabilities have been fixed quickly following our report to Microsoft. Microsoft has not detected any sign of abuse of these vulnerabilities. We want to thank the people from Microsoft for their cooperation in remediating these issues: Dhawal, Kirupa, Gaurav, Madeline, and the engineering team behind the service.
The first vulnerability allowed access to authentication credentials belonging to the customers. With continued research, we’ve found vulnerabilities allowing us to take control of a backend server of the service. That server is shared across multiple customers and has access to several databases that contain information belonging to multiple tenants.
Vulnerabilities Reported
- Multiple sandbox escapes, unrestricted code execution as root on the bot backend
- Unrestricted access to authentication secrets & integration auth providers
- Unrestricted memory read in the bot backend, exposing sensitive secrets & cross tenant data
- Unrestricted deletion of other tenants’ public resources
The Discovery
The initial research started at the Azure Health Bot management portal website. Skimming through the features available, we saw that it’s possible to connect your bot to remote data sources, and also provide authentication details.
Since customers would likely connect their bot to 3rd party data, such as patient databases, appointment calendars, and so forth, it’s a very interesting target for an attacker. It’s hard to imagine a scenario where a customer wouldn’t want to connect the bot to their data.
After fiddling with this feature, we noticed something interesting in the request that retrieves our data connection details and auth secrets. This is what a regular request looks like:
In this URL, “test-301x6x6” is our unique health bot instance ID, and “1679070537717” is the ID of the unique data connection we created.
The response to this request was the following JSON:
People familiar with Azure will recognize this as an Azure Table API response. That makes sense: the service stores our connection data in the Azure Table service and pulls it directly from there.
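To illustrate why such a response is informative, here is a minimal sketch that pulls the lookup values out of a Table-style entity. The entity below is hypothetical: `exampleaccount`, `exampletable`, and the row key are placeholders, not values from the real response. Only the general shape (an `odata.metadata` URL plus `PartitionKey` and `RowKey` fields) follows the Azure Table JSON format.

```javascript
// Hypothetical entity in the shape of an Azure Table JSON response.
// All values are placeholders for illustration.
const response = {
  'odata.metadata':
    'https://exampleaccount.table.core.windows.net/$metadata#exampletable/@Element',
  PartitionKey: 'DataConnection',
  RowKey: '1234567890123',
};

// Extract the storage account, table name, and entity keys.
function describeEntity(entity) {
  const pattern =
    /^https:\/\/([^.]+)\.table\.core\.windows\.net\/\$metadata#([^/]+)\/@Element$/;
  const match = pattern.exec(entity['odata.metadata']);
  if (!match) {
    throw new Error('unrecognized odata.metadata value');
  }
  return {
    account: match[1],
    table: match[2],
    partitionKey: entity.PartitionKey,
    rowKey: entity.RowKey,
  };
}

const info = describeEntity(response);
console.log(info);
```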
Our first instinct was to start toying with the ID number of our data connection. We suspected that the data connections of all customers were stored in the same table, and that if we could query any ID we wanted from that table, we could view the data connections of other customers.
Per the Azure Table API documentation, here’s what a request to retrieve data from a table looks like:
So here we have 3 variables we must fill:
- table name
- partition key
- row key
We had all the required variables, since the previous Table API response disclosed all of that information. Our guess was that this is the URL the backend server uses behind the scenes to fetch the information:
Here you can see:
- hbstenant2steausprod – the account name Microsoft used for storing the data.
- test301x6x6 – our Azure health bot instance ID. This is not a secret.
- (PartitionKey='DataConnection',RowKey='1679070537717') – pulls the DataConnection entity whose row key is the ID from the request.
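The pieces listed above can be assembled as in the following sketch. The `tableEntityUrl` helper is our own illustrative name, and the `table.core.windows.net` host is the standard Azure Table endpoint; that the backend built its URL in exactly this form is an assumption.

```javascript
// Builds an Azure Table "get entity" URL from its parts.
// Assumption: the backend targets the standard table.core.windows.net endpoint.
function tableEntityUrl(account, table, partitionKey, rowKey) {
  return `https://${account}.table.core.windows.net/${table}` +
    `(PartitionKey='${partitionKey}',RowKey='${rowKey}')`;
}

const url = tableEntityUrl(
  'hbstenant2steausprod', // storage account name
  'test301x6x6',          // table named after the bot instance ID
  'DataConnection',       // partition key
  '1679070537717'         // row key: the data connection ID from the request
);
console.log(url);
```

Only the last argument comes from the client, which is why the ID is the interesting input.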
The input under our control is the ID. The idea was to send an ID that would allow us to “break out” of our tenant and read other tenants’ data. How do we do that?
Since it’s all appended to a URL, the idea was to leverage URL traversal to cancel out the prepended information added by the server, and then add our own:
As you can see, we encoded the slashes (%2F) which were injected into the URL, effectively turning the request into:
And voila! This request successfully returned the connection data of the other tenant.
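The general mechanism can be sketched in isolation. Everything here is hypothetical: `storage.example`, the table names, the crafted ID, and the `buildLookupUrl` helper are placeholders and do not reproduce the actual request. The sketch assumes a backend that decodes the ID and concatenates it into a URL that is then normalized.

```javascript
// Hypothetical backend helper: concatenates a caller-supplied ID into the path.
function buildLookupUrl(table, rawId) {
  const id = decodeURIComponent(rawId); // %2F becomes "/"
  return `https://storage.example/${table}` +
    `(PartitionKey='DataConnection',RowKey='${id}')`;
}

// A normal ID stays inside the caller's own table.
const normal = new URL(buildLookupUrl('mytable', '1679070537717'));

// An ID carrying encoded slashes and a ".." segment. After decoding, URL
// normalization drops the server-supplied prefix and keeps the injected tail.
const crafted =
  "x')%2F..%2Fothertable(PartitionKey='DataConnection',RowKey='42";
const traversed = new URL(buildLookupUrl('mytable', crafted));

console.log(normal.pathname);
console.log(traversed.pathname);
```

Rejecting IDs that contain anything other than the expected characters (digits, in our case) before building the URL would stop this class of input.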
Hacking The Bot Backend – 3 ways to pwn the Node.js vm2 sandbox
Exploring further into the service, we saw that you can execute your JavaScript code in an isolated environment. This feature lets you process data coming from the chat as part of the conversation with the end customer.
We started by doing simple JS recon inside the sandbox – looking at global variables, we figured we were running inside a vm2 sandbox, a popular Node.js sandboxing library that has since been discontinued due to multiple, unrelated security flaws.
The goal was simple: to be able to execute shell commands and try to find a way to access cross-tenant data.
How do you usually execute shell commands with Node.js? Simple, you import the child_process module and call exec/execSync:
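Outside any sandbox, a minimal version looks like this. The `echo hello` command is just a harmless placeholder.

```javascript
const { execSync } = require('child_process');

// Run a harmless shell command and capture its standard output as a string.
const output = execSync('echo hello', { encoding: 'utf8' });
console.log(output.trim());
```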
But you didn’t think it’d be that easy, did you? In general, require inside the vm2 sandbox is a patched version that doesn’t let you import anything harmful. However, Microsoft wanted to provide a few standard modules to make your life easier. So what we have is a custom require function, which has a very specific whitelist of boring modules.
But we wanted to understand what’s going on under the hood. Lucky for us, JavaScript lets you view the source code of any user-defined function. You call .toString() on the function, and voila, you get the source code:
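A small standalone illustration of the technique, using a made-up `greet` function rather than the service's own code:

```javascript
// A user-defined function: toString() returns its source text.
function greet(name) {
  return 'hello ' + name;
}
const source = greet.toString();
console.log(source);

// Built-in functions do not expose their implementation.
const nativeSource = Math.max.toString();
console.log(nativeSource);
```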
Looks pretty harmless at first glance. It simply checks whether the requested module is in the whitelist array, and if it is, the original Node.js require function is called.
Well, if you look closer, they called _.indexOf() instead of the native array indexOf function for some reason. And _.indexOf() is a function from the underscore module. Which is whitelisted. Can you see where we’re going with this?
Bypassing the whitelist and achieving remote code execution is no problem when you can just override the indexOf function. Underscore is conveniently already present as a global, so you don’t even need to import it.
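The pattern can be reproduced with a harmless simulation. This is a hypothetical reconstruction, not the service's code: `_` is a minimal stand-in for the underscore global, `ALLOWED` is an illustrative list, and `loader` is a fake that replaces the real Node.js require so that nothing is actually imported.

```javascript
// Minimal stand-in for the underscore global; only indexOf is modelled.
const _ = {
  indexOf(array, value) {
    return array.indexOf(value);
  },
};

const ALLOWED = ['underscore', 'buffer']; // illustrative allowlist

// Hypothetical allowlist-guarded require. `loader` stands in for the real
// Node.js require function.
function makeSandboxRequire(loader) {
  return function sandboxRequire(name) {
    if (_.indexOf(ALLOWED, name) === -1) {
      throw new Error(`module "${name}" is not allowed`);
    }
    return loader(name);
  };
}

const sandboxRequire = makeSandboxRequire((name) => ({ name }));

// Before tampering: a module outside the allowlist is rejected.
let blocked = false;
try {
  sandboxRequire('child_process');
} catch (err) {
  blocked = true;
}

// Sandboxed code can reassign the shared helper, so the check always passes.
_.indexOf = () => 0;
const bypassed = sandboxRequire('child_process');
console.log(blocked, bypassed.name);
```

The check is only as trustworthy as the function it calls, and here that function lives on an object the sandboxed code is free to modify.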
And then:
Since that backend is shared, we were running as root inside a server that processed the chats of other customers. All research was done in the “debug” environment and was done carefully to not expose any sensitive information.
Microsoft patched the bug within 24 hours, but we weren’t done with this sandbox yet.
Underscore strikes again
After Microsoft patched the require() flaw, we dove deeper into the mechanics of the vm2 sandbox. We knew that the whitelisted modules are part of the unisolated Node.js root context, so the idea was to look into each module individually and try to find interesting functionality that could be abused.
We spent a few hours reading the documentation and code of all the whitelisted modules. Most of them were just boring data parsing libraries that didn’t help. But then something in Underscore.js caught our attention:
Hmm, a function that compiles JavaScript templates, with an arbitrary code execution feature. We’re sensing a pattern here.
To understand why it’s interesting, you need to understand a simple concept of how the vm2 sandboxing works. In simple terms, they create a “bridge” between the sandbox and the host, and everything you execute inside the sandbox goes through proxy functions which restrict what you can do to a very limited set of features.
For example, if we try to access the Node.js global “process” variable from within the sandbox, the variable won’t be found as it’s not part of the sandboxed context.
However, when you pass down functions from the root context to the sandbox, the code is already “compiled”. It’s usually pretty dangerous since code inside the sandbox can tamper with the modules and cause unexpected behavior outside the sandbox.
Back to the template function: since the underscore module was passed down from outside the sandbox, the code is compiled in the non-sandboxed context. Therefore, we can achieve code execution simply:
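The underlying principle can be shown with Node's built-in `vm` module, which we use here as a simplified stand-in for vm2 (vm2 adds proxying on top, so this is not a vm2 exploit). `hostLib.compile` is a hypothetical host-side helper that turns a string into a function, the way a template compiler does internally.

```javascript
const vm = require('vm');

// Stand-in for a host-side library that compiles strings into functions.
const hostLib = {
  compile(body) {
    return new Function(body);
  },
};

// The host library is handed to the sandboxed context as a global.
const sandbox = vm.createContext({ hostLib });

// Code evaluated directly in the context has no `process` global.
const direct = vm.runInContext('typeof process', sandbox);

// A function compiled by the host library is created in the host realm,
// so it resolves globals against the host, where `process` exists.
const viaHost = vm.runInContext(
  "hostLib.compile('return typeof process')()",
  sandbox
);

console.log(direct, viaHost);
```

The sandboxed code never touches `process` itself; it only asks a host-owned function to compile a string on its behalf.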
Microsoft quickly patched this as well, so we moved on to the final flaw.
A Distant Memory
This time we had to think a little bit “outside the box” since we were running out of interesting features in the whitelisted modules. We looked into the “buffer” module which is a built-in Node.js module.
The thing that caught our attention was “Buffer.allocUnsafe”. This function lets you allocate an uninitialized memory buffer. To explain what it means in simple terms, let’s compare Buffer.alloc and Buffer.allocUnsafe:
- Buffer.alloc: will provide a memory buffer that is zeroed out. If we try to read from the allocated buffer, we’ll get a bunch of zeroes.
- Buffer.allocUnsafe: faster than alloc, will provide a memory buffer that hasn’t been zeroed out. That means that if the memory allocated was previously used for an HTTP request for example, we will be able to see the HTTP request by reading from the newly allocated buffer.
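The difference between the two can be seen in a short sketch. What an uninitialized buffer contains depends entirely on what the process did earlier, so the example makes no claim about those bytes.

```javascript
// Buffer.alloc returns zero-filled memory.
const zeroed = Buffer.alloc(16);
console.log(zeroed.every((byte) => byte === 0));

// Buffer.allocUnsafe skips the zero-fill, so its contents are whatever was
// left in that memory; they may or may not be zeros on any given run.
const raw = Buffer.allocUnsafe(16);
console.log(raw.length);

// Overwriting the buffer before use removes any stale bytes.
raw.fill(0);
console.log(raw.every((byte) => byte === 0));
```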
This is pretty dangerous since if we can use allocUnsafe inside the sandbox, we might be able to access sensitive info from the memory of the application. The vm2 developers were aware of this and restricted the use of Buffer.allocUnsafe.
Since the entire buffer module was whitelisted, we had access to SlowBuffer, which, like allocUnsafe, returns uninitialized memory. This one was not restricted by the sandbox, since it’s not supposed to be there by default:
Running this code a few times yielded interesting data from the application, for example, a few JWT secrets for internal Azure identities, Kubernetes API calls, cross-tenant data, and more.
After that, Microsoft made multiple important security changes:
- They changed the service architecture to run a completely separate ACI instance per customer, making any future sandbox breach irrelevant.
- They changed the sandboxing from vm2 to the isolated-vm library, which uses V8 isolates, a much better and more secure solution.