What really is the Entry Point of a .NET Module?


public static void Main() is what most people associate with the entry point of a .NET module. However, as it turns out, this is not the place where it all begins. In this post, we will review the different types of entry points that are available to us, and go on a quest to find the holy grail: the actual place where user code really starts.

[Figure: Where did this extra line come from?]

The full source code for this post can be found on my GitHub:

 Full Source Code

Program::Main()

In most major programming languages, an application starts at its Main function. Most people are taught this when they learn a new language, and .NET applications are no exception to this rule. When you create a new C# project, you typically start with a file called Program.cs that has a public static void Main defined. It should look similar to the following:

using System;

internal static class Program
{
    public static void Main()
    {
        Console.WriteLine("Program::Main()");
    }
}

It should not come as a surprise that this indeed prints what we expect and nothing else:

Z:\> EntryPoints.exe
Program::Main()

Static Class Constructors (Program::.cctor)

Intermediate C# programmers will know that Main is actually not the first method that is called by the runtime. Like any other method, Main resides in a class (in our case Program). Since classes can define static class constructors (sometimes also referred to as class initializers) that are invoked upon the first access of the class itself or one of its members, we can trigger the execution of some code while the runtime is accessing our Program class to invoke the main entry point:

using System;

internal static class Program
{
    static Program()
    {
        Console.WriteLine("Program::.cctor()");
    }

    public static void Main()
    {
        Console.WriteLine("Program::Main()");
    }
}

Indeed, when we try this, we get the following output:

Z:\> EntryPoints.exe
Program::.cctor()
Program::Main()

However, this is rather obvious. Anyone who opens the application in a decompiler will immediately see it.

We can do better.

Static Module Constructors (<Module>::.cctor)

More advanced users of the .NET Framework, especially the ones that have some experience with reverse engineering and/or are writing .NET obfuscators and deobfuscators, will know that there is a special hidden type defined in every .NET assembly that is out there. This type has the name <Module> and is often referred to as the Global Type or Module Type of the .NET module. Typically, it is used to define either global methods (as can be done in e.g., C++/CLI), or to initialize some global variables in a DLL.

Upon startup of the application, when the .NET module is accessed for the first time by the runtime, the <Module> type is accessed and initialized first. As we have seen in the previous section, this means the static class constructor of <Module> is invoked as well. In simple terms, if we define a static constructor in the <Module> type of our application, it will be invoked before both our Program::.cctor() and Program::Main(), or any other class defined in our program for that matter.

Unfortunately, before C# 9 the language offered no way to define module initializers (newer compilers expose them through the [ModuleInitializer] attribute). Either way, we can inject the constructor into our application with libraries such as AsmResolver after our program has been compiled:

using AsmResolver.DotNet;
using AsmResolver.DotNet.Signatures;
using AsmResolver.PE.DotNet.Cil;

// Open file.
var module = ModuleDefinition.FromFile(@"Z:\EntryPoints.exe");

// Inject <Module>::.cctor() that writes `<Module>::.cctor()` to the standard output.
var cctor = module.GetOrCreateModuleConstructor();
cctor.CilMethodBody!.Instructions.InsertRange(0, new[]
{
    new CilInstruction(CilOpCodes.Ldstr, "<Module>::.cctor()"),
    new CilInstruction(CilOpCodes.Call, module.CorLibTypeFactory.CorLibScope
        .CreateTypeReference("System", "Console")
        .CreateMemberReference("WriteLine", MethodSignature.CreateStatic(
            module.CorLibTypeFactory.Void,
            module.CorLibTypeFactory.String))
        .ImportWith(module.DefaultImporter))
});

// Save
module.Write(@"Z:\EntryPoints.exe");

If we look at it in a decompiler, we can see it has the following effect:

[Figure: <Module> constructor.]

And indeed, if we run the patched program, we can see <Module>::.cctor() is called before anything else is:

Z:\> EntryPoints.exe
<Module>::.cctor()
Program::.cctor()
Program::Main()

However, as it turns out, we can go even earlier than that.

External Module Constructors

The .NET runtime is a virtual machine. This means that methods defined in a .NET module are typically not implemented using machine code (such as x86), but rather consist of code written in the Common Intermediate Language (CIL). Whenever a method is invoked for the first time, the runtime’s Just-In-Time (JIT) compiler reads this CIL code and generates machine code on the fly, designed to run natively and efficiently on your processor. This way, similar to the JVM, it can achieve high portability across platforms (compile once, run anywhere).

[Figure: A(n over-)simplified sketch of the JIT compilation process.]

When the JIT compiler generates code for a method, it needs to know the addresses of all the external symbols (e.g., types, methods and fields) that the method’s body uses. Indeed, the processor needs to know where to jump to when it is trying to call a method defined in an external module.

This has an interesting implication. If we write some specific code into the <Module>::.cctor() method that we constructed earlier, such that it references a method from an external module, then this external module will be accessed for the first time during the compilation of the method. We do not even need to implement anything substantial in this method; the reference alone is enough to make the runtime initialize the external module by calling its <Module>::.cctor(). Therefore, we can use it to run code even before the first instruction of our main application’s module constructor is executed:

[Figure: <Module> constructor referencing a class in an external library, triggering the invocation of an external <Module> constructor during JIT compilation.]

For clarity, I started tagging the Console.WriteLine(string) calls with the name of the module it originates from:

Z:\> EntryPoints.exe
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()

This is already much less obvious, but let’s go even deeper.

COR20 Native Entry Point

If you have dabbled with the .NET module file format a bit, you may have come across the .NET Data Directory structure. This data directory contains a header (also known as IMAGE_COR20_HEADER) that specifies some basic information about the .NET module, such as where to find the metadata directories in the file, as well as the .NET runtime version required to run the application.

One of the other things it lists is a set of bit-flags that describe the general nature of the .NET module. In particular, one of these flags specifies that a Native Entry Point can be used:

[Figure: CFF Explorer showing the Native Entry-Point flag in the IMAGE_COR20_HEADER structure.]

The official source code of the runtime has some interesting comments for this flag:

typedef struct IMAGE_COR20_HEADER
{
    /* ... */

    // The main program if it is an EXE (not used if a DLL?)
    // If COMIMAGE_FLAGS_NATIVE_ENTRYPOINT is not set, EntryPointToken represents a managed entrypoint.
    // If COMIMAGE_FLAGS_NATIVE_ENTRYPOINT is set, EntryPointRVA represents an RVA to a native entrypoint
    // (deprecated for DLLs, use modules constructors instead).
    union {
        uint32_t            EntryPointToken;
        uint32_t            EntryPointRVA;
    };

    /* ... */
} IMAGE_COR20_HEADER, *PIMAGE_COR20_HEADER;

Side note: I like how, according to this comment, the .NET runtime developers themselves aren’t even sure whether it is used for DLL files or not :^).
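To make the comment in the header a bit more concrete, here is a minimal C++ sketch of how a reader of this structure could interpret the union. Only the flag value 0x10 for COMIMAGE_FLAGS_NATIVE_ENTRYPOINT is taken from the runtime headers; the EntryPointInfo type and the interpret_entry_point function are hypothetical names made up for illustration.

```cpp
#include <cstdint>

// Flag value as defined in the runtime's corhdr.h.
constexpr std::uint32_t COMIMAGE_FLAGS_NATIVE_ENTRYPOINT = 0x00000010;

// Hypothetical helper type for this sketch (not part of any real API).
struct EntryPointInfo
{
    bool is_native;      // true: the field holds an RVA to native code
    std::uint32_t table; // managed only: metadata table index (top byte of the token)
    std::uint32_t row;   // managed only: row index (lower 24 bits of the token)
    std::uint32_t rva;   // native only: RVA of the native entry point
};

// Interprets the EntryPointToken/EntryPointRVA union based on the flags field.
inline EntryPointInfo interpret_entry_point(std::uint32_t flags, std::uint32_t value)
{
    if (flags & COMIMAGE_FLAGS_NATIVE_ENTRYPOINT)
        return {true, 0, 0, value};
    return {false, value >> 24, value & 0x00FFFFFFu, 0};
}
```

For example, with the flag cleared, a value of 0x06000001 decodes to table 0x06 (the method definition table), row 1, whereas with the flag set the same four bytes are simply treated as an RVA.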

This leads us to believe the entry point field (EntryPointRVA in the union above) can actually be used to reference a native function, similar to how a DllMain works in a native library. Let’s build one using AsmResolver that prints the string "Unmanaged Entry Point from CLR directory" to the output using puts:

// Turn our ModuleDefinition into a PEImage such that we can access lower level structures.
var image = module.ToPEImage();

// Import `ucrtbase!puts`.
var ucrtbase = new ImportedModule("ucrtbase.dll");
var puts = new ImportedSymbol(0, "puts");
ucrtbase.Symbols.Add(puts);
image.Imports.Add(ucrtbase);

// Set the native entry point flag.
image.DotNetDirectory!.Flags |= DotNetDirectoryFlags.NativeEntryPoint;

// Write some code.
var code = new DataSegment(new byte[]
  {
    /* 00000000: */ 0x68, 0x00, 0x00, 0x00, 0x00,       // push  &message
    /* 00000005: */ 0xFF, 0x15, 0x00, 0x00, 0x00, 0x00, // call  [&puts]
    /* 0000000B: */ 0x83, 0xC4, 0x04,                   // add   esp, 4
    /* 0000000E: */ 0xB8, 0x01, 0x00, 0x00, 0x00,       // mov   eax, 1
    /* 00000013: */ 0xC2, 0x0c, 0x00,                   // ret   0xc
    
    /* 00000016: */                                     // message:
  }.Concat(Encoding.ASCII.GetBytes($"[{prefix}]: Unmanaged Entry Point from CLR directory")).ToArray())
  .AsPatchedSegment()
  .Patch(relativeOffset: 0x1, AddressFixupType.Absolute32BitAddress, symbolOffset: +0x16 /* &message */) 
  .Patch(relativeOffset: 0x7, AddressFixupType.Absolute32BitAddress, puts)
;

/* ... add to section ... */

// Update entry point field in COR20 header.
image.DotNetDirectory!.EntryPoint = nativeClrEntryPoint.Rva;
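If the two Patch calls look a bit magical: conceptually, an absolute 32-bit fixup amounts to overwriting the placeholder zeros with a little-endian virtual address once the final layout is known. The following portable C++ sketch mimics that step by hand. The function names and the addresses passed in are hypothetical, and, unlike the listing above, it appends an explicit NUL terminator to the message rather than relying on whatever follows the segment.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Writes a 32-bit value in little-endian byte order at the given offset.
inline void patch_u32(std::vector<std::uint8_t>& code, std::size_t offset, std::uint32_t value)
{
    for (std::size_t i = 0; i < 4; ++i)
        code.at(offset + i) = static_cast<std::uint8_t>(value >> (8 * i));
}

// Builds the 32-bit x86 stub from the listing above and applies both fixups.
// stub_va and puts_iat_va are hypothetical virtual addresses supplied by the caller.
inline std::vector<std::uint8_t> build_stub(std::uint32_t stub_va,
                                            std::uint32_t puts_iat_va,
                                            const std::string& message)
{
    std::vector<std::uint8_t> code = {
        0x68, 0x00, 0x00, 0x00, 0x00,       // push  &message
        0xFF, 0x15, 0x00, 0x00, 0x00, 0x00, // call  [&puts]
        0x83, 0xC4, 0x04,                   // add   esp, 4
        0xB8, 0x01, 0x00, 0x00, 0x00,       // mov   eax, 1
        0xC2, 0x0C, 0x00,                   // ret   0xc
    };

    // The message starts right after the code, at offset 0x16.
    const std::uint32_t message_offset = static_cast<std::uint32_t>(code.size());
    code.insert(code.end(), message.begin(), message.end());
    code.push_back(0); // puts expects a NUL-terminated string

    patch_u32(code, 0x1, stub_va + message_offset); // operand of push
    patch_u32(code, 0x7, puts_iat_va);              // operand of call [...]
    return code;
}
```

The ret 0xc at the end pops three 4-byte arguments off the stack, which matches the stdcall shape of a DllMain-style function on 32-bit x86.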

This gives us a file that looks a bit like the following in Ghidra:

[Figure: A .NET class library assigned a native entry point.]

When we run the file, we get the following output:

Z:\> EntryPoints.exe
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory

Interestingly enough, this results in a file that invokes this native entry point before the <Module>::.cctor() of our library. Note how this is different from the managed entry point of the main application, which was executed after the module constructor. We can also see that it is not just called once at startup, but also at program exit, further confirming this native entry point really behaves like a DllMain function of a native library (i.e., DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH).

PE Native Entry Point

This last test led me to think: what about an actual DllMain function, like the one in languages such as C/C++?

Typically, .NET PE files have a very simple DllMain function that merely transfers control to mscoree!_CorDllMain using a single jmp instruction. But nobody is stopping us really from extending this and letting it do some additional work before this jump is made. Using a similar approach as before, we can let the AddressOfEntryPoint field in the optional header of our PE file point to a chunk of code that first prints some text to the output before calling _CorDllMain:

[Figure: A .NET class library assigned a native PE entry point.]
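For reference, the AddressOfEntryPoint field mentioned above sits at a fixed position in the PE headers, for both PE32 and PE32+ files. The C++ sketch below locates it in a raw file buffer. It is a simplified reader with hypothetical function names, it performs only minimal validation, and it is not how AsmResolver does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value; returns false if it does not fit in the buffer.
inline bool read_u32(const std::vector<std::uint8_t>& data, std::size_t offset, std::uint32_t& value)
{
    if (offset > data.size() || data.size() - offset < 4)
        return false;
    value = static_cast<std::uint32_t>(data[offset])
          | static_cast<std::uint32_t>(data[offset + 1]) << 8
          | static_cast<std::uint32_t>(data[offset + 2]) << 16
          | static_cast<std::uint32_t>(data[offset + 3]) << 24;
    return true;
}

// Extracts the AddressOfEntryPoint RVA from a raw PE file.
inline bool address_of_entry_point(const std::vector<std::uint8_t>& pe, std::uint32_t& rva)
{
    // DOS header: "MZ" magic, e_lfanew at offset 0x3C points to the PE signature.
    if (pe.size() < 0x40 || pe[0] != 'M' || pe[1] != 'Z')
        return false;
    std::uint32_t e_lfanew = 0;
    if (!read_u32(pe, 0x3C, e_lfanew))
        return false;

    // "PE\0\0" signature.
    std::uint32_t signature = 0;
    if (!read_u32(pe, e_lfanew, signature) || signature != 0x00004550u)
        return false;

    // Skip the 4-byte signature and the 20-byte file header; AddressOfEntryPoint
    // is at offset 16 of the optional header.
    return read_u32(pe, static_cast<std::size_t>(e_lfanew) + 4 + 20 + 16, rva);
}
```

Redirecting the entry point then comes down to writing a different RVA into those four bytes, with the new code eventually jumping to the original target.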

And indeed, it is invoked even before our native entry point as specified by the IMAGE_COR20_HEADER:

Z:\> EntryPoints.exe
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory

Unfortunately, this doesn’t seem to work if we try to inject a new entry point into the main executable file (even if we try it with _CorExeMain). While it does not throw any errors, it seems the Windows PE loader just skips this field altogether.

However, it doesn’t quite stop there just yet…

TLS Callbacks

We’re far beyond the scope of normal .NET modules at this point, but let’s go down the rabbit hole for just a bit longer.

Multithreaded applications sometimes require static (non-stack) memory that is local to a specific thread. The PE file format allows for defining segments of memory within the file that specify what this memory should look like, and how it should be initialized. This information is stored inside the Thread Local Storage (TLS) data directory.

The TLS data directory contains a field that lists a set of functions known as TLS Callbacks. These are functions that are invoked by Windows every time a new thread is created or destroyed. Typically, they are used for initializing or destructing objects stored in the TLS. However, they have also been used by malware to hide malicious code (e.g., see here and here).
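To illustrate what "lists a set of functions" means on the binary level: in a 32-bit image, the TLS directory is a structure of six 32-bit fields, the fourth of which (AddressOfCallBacks) holds the virtual address of a null-terminated array of callback addresses. The C++ sketch below walks that array. It assumes a hypothetical buffer that holds the module as mapped in memory (so buffer offset equals VA minus image base), and the function names are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value; returns false if it does not fit in the buffer.
inline bool read_u32(const std::vector<std::uint8_t>& data, std::size_t offset, std::uint32_t& value)
{
    if (offset > data.size() || data.size() - offset < 4)
        return false;
    value = static_cast<std::uint32_t>(data[offset])
          | static_cast<std::uint32_t>(data[offset + 1]) << 8
          | static_cast<std::uint32_t>(data[offset + 2]) << 16
          | static_cast<std::uint32_t>(data[offset + 3]) << 24;
    return true;
}

// Collects the callback VAs referenced by a 32-bit TLS directory.
inline std::vector<std::uint32_t> tls_callbacks(const std::vector<std::uint8_t>& image,
                                                std::uint32_t image_base,
                                                std::uint32_t tls_directory_rva)
{
    std::vector<std::uint32_t> callbacks;

    // AddressOfCallBacks is the fourth field of the directory (offset 12) and is a VA.
    std::uint32_t array_va = 0;
    if (!read_u32(image, static_cast<std::size_t>(tls_directory_rva) + 12, array_va)
        || array_va < image_base)
        return callbacks;

    // The array ends at the first null entry (or at the end of the buffer).
    std::size_t cursor = array_va - image_base;
    std::uint32_t callback_va = 0;
    while (read_u32(image, cursor, callback_va) && callback_va != 0)
    {
        callbacks.push_back(callback_va);
        cursor += 4;
    }
    return callbacks;
}
```

Every address collected this way is a function the loader will call, which is why this array is worth checking when reverse engineering a suspicious module.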

It so turns out we can do the same for .NET modules. We can inject a new TLS section using AsmResolver as follows:

/* ... */  
var tlsCallbackCode = new DataSegment(/* ... */);

// Construct new TLS data directory.
var templateBlock = new DataSegment(new byte[100]);
var indexBlock = new DataSegment(new byte[8]);
image.TlsDirectory = new TlsDirectory
{
  TemplateData = templateBlock,
  Index = indexBlock.ToReference(),
  CallbackFunctions = { tlsCallbackCode.ToReference() },
  Characteristics = TlsCharacteristics.Align4Bytes
};

/* ... */  

// Add to new .tls section.
file.Sections.Add(new PESection(
  ".tls",
  SectionFlags.ContentInitializedData | SectionFlags.MemoryRead | SectionFlags.MemoryWrite | SectionFlags.MemoryExecute,
  new SegmentBuilder
  {
    { tlsCallbackCode, 8 },
    { templateBlock, 8 },
    { indexBlock, 8 },
    { image.TlsDirectory, 8 },
    { image.TlsDirectory.CallbackFunctions, 8 }
  }));

/* ... */  

// Update data directory.
directories[(int) DataDirectoryIndex.TlsDirectory] = new(image.TlsDirectory.Rva, image.TlsDirectory.GetPhysicalSize());

This will give us the following:

[Figure: A .NET class library with a TLS callback registered.]

… and surprise, a TLS callback is called even before the actual DllMain of the PE file!

Z:\> EntryPoints.exe
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory

Interestingly enough, while the native PE entry point did not work on our main executable file, TLS callbacks actually do work. Since our main executable file is loaded before any of its dependencies, such a TLS callback will thus be invoked even before the TLS callback of our external library:

[Figure: A .NET executable file with a TLS callback registered.]

Z:\> EntryPoints.exe
[EntryPoints.exe]: TLS Callback
[EntryPoints.exe]: TLS Callback
[EntryPoints.exe]: TLS Callback
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[EntryPoints.exe]: TLS Callback

You have to be careful though with putting anything substantial in here. As can be seen from the output, the TLS callbacks are called for every process and thread attach/detach event, meaning the TLS callback has a very high likelihood of being called not once but a few times in sequence.
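One common way to deal with the repeated invocations is to look at the reason argument and act only on process attach. The sketch below shows that guard in portable C++. The constants mirror the values of DLL_PROCESS_ATTACH and friends from the Windows SDK but are redefined here so the example compiles anywhere; the payload counter and the simulate function are hypothetical stand-ins for real work and for the Windows loader, respectively.

```cpp
#include <cstdint>
#include <vector>

// Same values as DLL_PROCESS_DETACH, DLL_PROCESS_ATTACH, DLL_THREAD_ATTACH and
// DLL_THREAD_DETACH in the Windows SDK, redefined to keep the sketch portable.
constexpr std::uint32_t kProcessDetach = 0;
constexpr std::uint32_t kProcessAttach = 1;
constexpr std::uint32_t kThreadAttach  = 2;
constexpr std::uint32_t kThreadDetach  = 3;

// Hypothetical counter standing in for "anything substantial".
inline int& payload_runs()
{
    static int runs = 0;
    return runs;
}

// Same parameter shape as a TLS callback, minus the WINAPI calling convention.
inline void tls_callback(void* /*module*/, std::uint32_t reason, void* /*reserved*/)
{
    if (reason != kProcessAttach)
        return; // ignore thread attach/detach and process detach events
    ++payload_runs();
}

// Simulates a loader delivering a sequence of events to the callback and
// returns how often the guarded payload actually ran.
inline int simulate(const std::vector<std::uint32_t>& reasons)
{
    payload_runs() = 0;
    for (std::uint32_t reason : reasons)
        tls_callback(nullptr, reason, nullptr);
    return payload_runs();
}
```

Feeding it a process attach followed by a few thread events and a process detach runs the payload exactly once, regardless of how many threads come and go.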

External Unmanaged DLL Pre-Injection

OK, one more step down before we will call ourselves too crazy to continue…

The PE file has another interesting data directory known as the imports directory. This directory lists all external modules and functions the Windows PE Loader should load and resolve upon the startup of the executable file. A more elaborate explanation can be found in a previous post.

[Figure: A .NET application with an import directory.]
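As a rough picture of what the loader reads here: the imports directory is an array of 20-byte descriptors, one per imported module, terminated by an all-zero entry, and each descriptor stores the RVA of the module name at offset 12. The C++ sketch below lists those names. It again assumes a hypothetical buffer holding the module as mapped in memory (buffer offset equals RVA), treats a zero name RVA as the terminator, and uses function names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a little-endian 32-bit value; returns false if it does not fit in the buffer.
inline bool read_u32(const std::vector<std::uint8_t>& data, std::size_t offset, std::uint32_t& value)
{
    if (offset > data.size() || data.size() - offset < 4)
        return false;
    value = static_cast<std::uint32_t>(data[offset])
          | static_cast<std::uint32_t>(data[offset + 1]) << 8
          | static_cast<std::uint32_t>(data[offset + 2]) << 16
          | static_cast<std::uint32_t>(data[offset + 3]) << 24;
    return true;
}

// Lists the names of all modules referenced by an import directory.
inline std::vector<std::string> imported_modules(const std::vector<std::uint8_t>& image,
                                                 std::uint32_t import_directory_rva)
{
    std::vector<std::string> names;
    for (std::size_t descriptor = import_directory_rva;; descriptor += 20)
    {
        // The Name field (offset 12) holds the RVA of a NUL-terminated ASCII string.
        std::uint32_t name_rva = 0;
        if (!read_u32(image, descriptor + 12, name_rva) || name_rva == 0)
            break; // truncated table or terminating descriptor

        std::string name;
        for (std::size_t i = name_rva; i < image.size() && image[i] != 0; ++i)
            name.push_back(static_cast<char>(image[i]));
        names.push_back(name);
    }
    return names;
}
```

For a typical 32-bit .NET executable this would yield little more than mscoree.dll, so any additional entry stands out.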

As TLS callbacks are also allowed to invoke symbols from external modules, this resolution is performed before any of the TLS callbacks are invoked. Furthermore, as we have seen before, loading a new PE file triggers a call to its entry point. We can thus make yet another DLL file with some exported function…

#include <cstdio>
#include <windows.h>

extern "C" __declspec(dllexport) void UnmanagedExport()
{
  puts("Unmanaged Export");
}

BOOL APIENTRY DllMain(HMODULE hModule, DWORD  ul_reason_for_call, LPVOID lpReserved)
{
  puts("[DynamicLibrary.dll]: DllMain");
  return TRUE;
}

… and register it as one of the main executable’s dependencies using AsmResolver before writing it to the disk:

var dynamicLibrary = new ImportedModule("DynamicLibrary.dll");
var export = new ImportedSymbol(0, "UnmanagedExport");
dynamicLibrary.Symbols.Add(export);
image.Imports.Add(dynamicLibrary);

[Figure: A .NET application with a reference to an external unmanaged dynamic library.]

Even if we do not end up calling the UnmanagedExport function itself, the Windows PE Loader will happily call the DllMain for us when loading our main executable:

Z:\> EntryPoints.exe
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback

Of course, this DLL can also define its own TLS callbacks. Adding some #pragma magic to the code like the following will instruct the compiler to define one TLS callback in our dynamic library…

VOID WINAPI TlsCallback(PVOID DllHandle, DWORD Reason, PVOID Reserved)
{
    puts("[DynamicLibrary.dll]: TLS Callback");
}

#pragma comment (linker, "/INCLUDE:__tls_used")
#pragma comment (linker, "/INCLUDE:_tls_callback_func1")

#pragma data_seg(".CRT$XLF")
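// EXTERN_C keeps the symbol name unmangled in C++, so the /INCLUDE directive above can find it.
EXTERN_C PIMAGE_TLS_CALLBACK tls_callback_func1 = TlsCallback;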
PIMAGE_TLS_CALLBACK tls_callback_end = NULL;
#pragma data_seg()

… and run this callback before the DllMain again:

Z:\> EntryPoints.exe
[DynamicLibrary.dll]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[DynamicLibrary.dll]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[DynamicLibrary.dll]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[ClassLibrary.dll]: <Module>::.cctor()
[EntryPoints.exe]: <Module>::.cctor()
[EntryPoints.exe]: Program::.cctor()
[EntryPoints.exe]: Program::Main()
[ClassLibrary.dll]: TLS Callback
[ClassLibrary.dll]: DllMain
[ClassLibrary.dll]: Unmanaged Entry Point from CLR directory
[DynamicLibrary.dll]: TLS Callback
[DynamicLibrary.dll]: DllMain
[EntryPoints.exe]: TLS Callback

We can continue this forever, by adding more dependencies to our dependency, and so on…

To keep my own sanity, I decided to call it quits here.

Conclusion

Phew… that was a tumble and a half.

We have seen that even though Main or the class constructor <Module>::.cctor is often considered to be the first piece of user code to be called in a .NET module, this actually is not the case. As .NET modules are a special case of normal PE files, we can exploit some of the features found in PE that allow us to put code in places that are not always considered when reverse engineering a .NET module.

I do not encourage you to do half of the things I just described in production. Especially when you start dabbling with TLS callbacks or beyond, this can actually be quite unstable, since most of these will be invoked even before the .NET runtime itself is properly initialized. However, it is an interesting study anyway of how Windows ends up loading and executing .NET modules.

The full source code for this post can be found on my GitHub:

 Full Source Code

Thanks for reading and happy hacking!

 

Originally published by Washi: What really is the Entry Point of a .NET Module?

Copyright notice: posted by admin on May 9, 2023, 8:30 AM.
When reposting, please credit: What really is the Entry Point of a .NET Module? | CTF导航
