.NET malware: De-obfuscation, decryption and debugging - tips and tricks.

There are a number of advantages to .NET-written malware from a malware writer’s perspective. 

  • You can target multiple platforms (x86, x64) with the same executable. There’s even an open source version of .NET in the works: http://www.dotnetfoundation.org/
  • Development is relatively easy, as a number of frameworks and technologies are readily available
  • Your executable may have a lighter footprint compared to its Win32/64 malware counterparts, since most of the functionality is carried out in the .NET framework (and not in the statically linked binary)

Of course, there are also disadvantages to writing malware for .NET:

  • A successful compromise depends on the presence of the .NET redistributable package on the targeted computer
  • Your executable can be statically analyzed relatively easily, since MSIL is relatively straightforward to decompile (compared to native code compiled from languages such as C or C++)

But the landscape is changing. More and more systems are utilizing .NET applications. The .NET redistributable package is becoming increasingly ubiquitous on targeted systems and thus more often utilized by malware authors.

 

Hiding in plain sight

There are a number of obfuscation techniques becoming popular among .NET malware writers that complicate the static and dynamic analysis of their executables.

 

To make it more difficult to understand and follow the underlying logic of a malicious file, malware authors may obscure the names of the classes, members and methods used in a .NET assembly.

 

As an example, let’s look at a recent .NET malware sample (sha256:8f573a74b71d6db58e129c9ec08c93c151a10c289e22af76886cec412d937e07, VirusTotal reference) using the dotPeek 1.1 decompiler. Detailed examination shows the use of randomized and mangled names, which makes it significantly harder to follow and understand the malware’s logic.

 

 Figure 1.png

 

Figure 1: dotPeek 1.1 decompiler result, malware sample mangled names.

 

When faced with mangled names inside assemblies, I find it useful to push these through the de4dot tool (https://github.com/0xd4d/de4dot), a .NET de-obfuscator and unpacker.

 

Figure 2.png

 

Figure 2: de4dot de-obfuscator command line basic use results

 

 De4dot identified this sample as being processed with the DeepSea 4.1 obfuscator and attempted to de-mangle the names into a somewhat easier-to-follow format.

 

Figure 3.png

 

Figure 3: dotPeek 1.1 decompiler result, malware sample de-obfuscated names after de4dot processing.

 

Here’s a different example of name obfuscation (malware sample sha256:3ad4d3ce337dcb7d736103947ec1f0419760919f2707d4ec64fd78e82f86949c, VirusTotal reference):

 

Figure 4.png

  

Figure 4: dotPeek 1.1 decompiler results, malware sample – empty space obfuscated function name

 

The function appears to have no name. This is possible because a method is identified not only by its name but also by its list of arguments. For instance, a compiler can distinguish between two methods such as <space bar>(a) and <space bar>(a,b) based on the difference between the argument lists used in each method call.
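
The point that a name alone does not identify a method can be illustrated with a minimal Python sketch. It models a method table keyed by name plus parameter count, which is a simplification: .NET metadata distinguishes methods by their full signature. The table contents and the helper name resolve are hypothetical.

```python
# Toy model: a method is identified by (name, parameter count), not by name alone.
# Real .NET metadata uses the full signature; this is only an illustration.
BLANK = " "  # a single space used as the method "name", as in the sample

method_table = {
    (BLANK, 1): "handler taking (a)",
    (BLANK, 2): "handler taking (a, b)",
}

def resolve(name, *args):
    """Return the entry whose name and parameter count match the call."""
    return method_table[(name, len(args))]

print(resolve(BLANK, 10))
print(resolve(BLANK, 10, 20))
```

Both calls use the same blank name, yet each one resolves to a different entry.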

 

Running it through de4dot yields somewhat less mangled and obfuscated code:

 

Figure 5.png

 

Figure 5: dotPeek 1.1 decompiler results, malware sample – function name after de4dot processing

 

Another obfuscation method is to construct or decrypt Intermediate Language (IL) byte code dynamically while the sample is executing. As an example, let’s look at the malware sample sha256:201430d35ea404e30431f4b28c82378bcff2623d0d54aba4871011ab7c425e8a (VirusTotal reference).

 

 Figure 6.png

 

Figure 6: dotPeek 1.1 decompiler results, malware sample – IL code injector

 

The decompiler provides quite detailed information regarding the .NET assembly structure and its methods, yet this sample’s behavioral characteristics tell us that there’s more to it than what we managed to decompile. To find out where this additional code is coming from, let’s fire up the sample under a debugger and step through it.

 

.NET malware analysis 101

Here are a few tips and tricks I find useful when looking at .NET applications using WinDbg.

 

When running malware, make sure you do so in a controlled environment, which means you should be “off the grid”. For that reason, cache the debug symbol files locally while you are still connected to the web. You can define a local cache folder using .sympath from the WinDbg command prompt and then force a reload:

 

.sympath SRV*C:\symbols*http://msdl.microsoft.com/download/symbols/

.reload /f

 

When run from the WinDbg command prompt, these commands set the symbol path to c:\symbols and download symbols into that directory from the online symbol server. (Note that this only takes care of the modules which are already loaded into the application process space.)

 

You can run clean .NET applications, study which modules you might be interested in and have those symbols cached. Alternatively, you can pre-load the symbols selectively, or pre-load all of them, using the symchk.exe command line utility supplied with the Debugging Tools for Windows package. For example:

 

symchk /r c:\windows\system32 /s SRV*c:\symbols\*http://msdl.microsoft.com/download/symbols

 

should create a local cache for all modules found in c:\windows\system32.

 

For .NET applications, we are also interested in DLLs that may correspond to those used by the assembly framework. For .NET 2.0 these could be in the following location:

 

%windir%\Microsoft.NET\Framework\v2.0.50727

 

Loading all Program Database (PDB) symbol files for a given framework is a lengthy process. It’s a better idea to first identify, while off network, which DLLs the malicious .NET assembly uses; then boot or restore a clean image, connect to the internet, build the symbol cache for the DLLs you identified, go off network again and run the .NET malware once more. In the majority of cases, symbols for system32 plus mscoree.dll, mscorwks.dll, mscorlib.dll and mscorjit.dll (for .NET 2.0-3.5) or clrjit.dll (for .NET 4+) should be enough to start with.
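
A short Python sketch of the selective caching step, under stated assumptions: it only builds symchk command lines for a list of DLLs and prints them, so they can be reviewed and run while online. The helper name, the DLL list and the paths are illustrative; only the symchk /s option and the SRV* symbol path syntax are taken from the example above.

```python
import ntpath

# Symbol path in the same SRV*<cache>*<server> form used above.
SYMBOL_PATH = r"SRV*c:\symbols*http://msdl.microsoft.com/download/symbols"

def symchk_commands(dll_paths, symbol_path=SYMBOL_PATH):
    """Build one symchk command line per DLL (nothing is executed here)."""
    return ['symchk "{}" /s {}'.format(ntpath.normpath(p), symbol_path)
            for p in dll_paths]

# Assumed locations; adjust to the DLLs actually observed in the process.
framework = r"C:\Windows\Microsoft.NET\Framework\v2.0.50727"
dlls = [ntpath.join(framework, name)
        for name in ("mscorwks.dll", "mscorlib.dll", "mscorjit.dll")]
dlls.append(r"C:\Windows\System32\mscoree.dll")

for line in symchk_commands(dlls):
    print(line)
```

Printing the commands rather than running them keeps the sketch free of side effects and usable on any machine.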

 

So, coming back to our sample - when first loaded as an executable the .NET assembly has very few DLLs loaded into the process space. Firing the lm (list modules) command in WinDbg gives a list:

 

Figure 7.png 

Figure 7: lm command in windbg on just loaded .NET assembly

 

This is because the system is not aware of the modules the assembly uses until the Common Language Runtime (CLR) environment (the application domain, Garbage Collector (GC) heap, thread pool and so on) has been created. Some DLLs are not loaded until the Common Intermediate Language (CIL) in our .NET assembly starts to be Just In Time (JIT) compiled and executed. Hence, the majority of the modules are loaded and unloaded dynamically as needed.


Note MSCOREE.dll in the list of DLLs. This is the launcher for the default CLR environment and is almost guaranteed to be present for .NET assemblies. If the .NET assembly is not wrapped by unmanaged code, we can assume that nothing interesting will happen until CIL blocks inside the assembly start to be JIT-compiled and executed.


We want to run the application’s initialization up to the point where the DLL responsible for JIT compilation is loaded: mscorjit.dll for .NET 2.0-3.5, or clrjit.dll for .NET 4+.

Using WinDbg we can set a break on a DLL load event and run until the break.

 

sxe ld:mscorjit.dll

g

 

Once stopped, we should have an updated list of the loaded DLLs.

 

Figure 8.png

 

Figure 8: list of DLLs already loaded before mscorjit.dll load event break point

 

This is a good time to set any other breakpoints of interest in the loaded DLLs. It is also helpful to load sos.dll, a WinDbg extension DLL for working with managed code. The sos.dll matching the framework version is located in the same folder as mscorwks.dll (or clr.dll for .NET 4+), so we can use .loadby sos mscorwks (or .loadby sos clr for .NET 4+) to load the extension. Here are some useful SOS commands:

 

!threads - view managed threads

!clrstack - view the managed call stack

!dumpstack - view combined unmanaged & managed call stack

!clrstack -p - view function call arguments

!clrstack -l - view stack (local) variables

!name2ee module class - view addresses associated with a class or method

!dumpmt -md address - view the method table & methods for a class

!dumpmd address - view detailed information about a method

!do address - view information about an object

 

Continuing with our sample: to get a list of the names available in a module, use the x (examine symbols) command, for instance:

 

x  mscorjit!CIL*

 

gives us the symbol names inside mscorjit beginning with CIL. This is very useful, especially if symbols are available.

 

 Figure 9.png

  

Figure 9: list of API names inside mscorjit.dll beginning with CIL


We are interested in mscorjit!CILJit::compileMethod. This function is called every time IL byte code needs to be JIT compiled into native code. Looking at compileMethod in IDA, we see that it takes a number of parameters:

 

Figure 10.png

 

Figure 10: IDA snippet of mscorjit!CILJit::compileMethod

 

What we want to look at is the CORINFO_METHOD_INFO structure. Its definition is found in the clr/src/inc/corinfo.h header file of Microsoft’s Shared Source CLI (Common Language Infrastructure, more commonly known as the Rotor project). Examining the structure members, we find a pointer to the IL block to be JIT compiled for a method, ILCode, and its size, ILCodeSize.

 

Figure 11.png

 

Figure 11: CORINFO_METHOD_INFO definition in clr/src/inc/corinfo.h file from Rotor project

 

Breaking into the mscorjit!CILJit::compileMethod and looking at the call stack we have:

 

Figure 12.png

 

Figure 12: WinDbg – call stack after breaking in to mscorjit!CILJit::compileMethod

 

dd dwo(esp+c) gives us a glimpse into a CORINFO_METHOD_INFO structure:

 

Figure 13.png

 

Figure 13: WinDbg – look at CORINFO_METHOD_INFO when stopped at mscorjit!CILJit::compileMethod


0x01042441 points to the method’s IL code, whereas 0x7 shows its size.
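
To make the layout concrete, here is a minimal Python sketch that unpacks the first four fields of a CORINFO_METHOD_INFO dump. It assumes a 32-bit process and the field order ftn, scope, ILCode, ILCodeSize from the Rotor corinfo.h header. The handle and scope values are placeholders; ILCode and ILCodeSize are the values quoted above.

```python
import struct

def parse_method_info_prefix(raw):
    """Unpack the first four 32-bit fields of CORINFO_METHOD_INFO (x86 layout)."""
    ftn, scope, il_code, il_code_size = struct.unpack_from("<IIII", raw, 0)
    return {"ftn": ftn, "scope": scope,
            "ILCode": il_code, "ILCodeSize": il_code_size}

# Placeholder handle and scope values; ILCode and ILCodeSize are the ones
# quoted in the text (0x01042441 and 0x7).
raw = struct.pack("<IIII", 0x00123456, 0x00654321, 0x01042441, 0x7)
info = parse_method_info_prefix(raw)
print(hex(info["ILCode"]), info["ILCodeSize"])
```

With this layout ILCode sits at offset 8 and ILCodeSize at offset 0xc from the start of the structure.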

 

Looking further into CORINFO_METHOD_INFO, we see that its first member, of type CORINFO_METHOD_HANDLE, is in fact a pointer to an 8-byte aligned data structure, the significance of which is not entirely clear. After looking at many examples of the data, it appears that the first two bytes of the structure identify a token or a number of the method.

 

Figure 14.png

 

Figure 14: CORINFO_METHOD_HANDLE memory listing

 

Drawing from the Rotor project, we see that the unsigned char** and unsigned long* arguments of compileMethod are, respectively, a pointer to a pointer to the resulting native code and a pointer to its size, both filled in after the IL code is JIT-ed.

 

For example, this is CORINFO_METHOD_INFO, passed to compileMethod. Note the location of the IL and its size (0x0104213c and 0x0b):

 

 Figure 15.png

 

Figure 15: CORINFO_METHOD_INFO passed to mscorjit!CILJit::compileMethod


This is actual IL code:

 

Figure 16.png

 

Figure 16: IL code pointed by ILCode from CORINFO_METHOD_INFO structure

 

 and here it is JIT-compiled to native code:

 

Figure 17.png

 

Figure 17: JIT-compiled code from IL code pointed earlier

 

Step-by-step

A breakpoint can be set at the native code for further examination of the malware’s execution. Further, the accessibility of the native code makes it possible for malware to dynamically alter it in order to perform actions that do not correspond to the IL code.

 

To log the IL code, its size and the method number for everything that goes through mscorjit!CILJit::compileMethod, we can set a breakpoint at compileMethod that prints the relevant information and then continues execution until the next hit. The resulting breakpoint command looks like this:

 

bp mscorjit!CILJit::compileMethod "db dwo(dwo(esp+c)+8) Ldwo(dwo(esp+c)+c);dd (dwo(esp+c)+c) L1;dw dwo(dwo(esp+c)) L1; gc"

 

dwo is an operator that returns a double word from the specified address. You can also use poi, which returns pointer-sized data, when dealing with memory addresses.
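
As a rough illustration of how the nested dwo expressions in the breakpoint resolve, the following Python sketch models memory as a dictionary of double words. The stack and structure addresses are made up; ILCode and ILCodeSize reuse the values shown earlier.

```python
# Fake 32-bit memory: address -> double word. The esp and structure
# addresses are invented; ILCode and ILCodeSize reuse values from the text.
memory = {
    0x0012F000 + 0xC: 0x0012F100,   # esp+c  -> pointer to CORINFO_METHOD_INFO
    0x0012F100 + 0x8: 0x01042441,   # info+8 -> ILCode
    0x0012F100 + 0xC: 0x00000007,   # info+c -> ILCodeSize
}

def dwo(address):
    """Model of WinDbg's dwo(): the double word stored at an address."""
    return memory[address]

esp = 0x0012F000
il_code = dwo(dwo(esp + 0xC) + 0x8)   # start address handed to db
il_size = dwo(dwo(esp + 0xC) + 0xC)   # value used after L as the length
print(hex(il_code), il_size)
```

The inner dwo fetches the structure pointer from the stack, and the outer one reads a field at a fixed offset inside that structure.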

 

This breakpoint outputs the IL, its size and the method’s number at every break.

 

 Figure 18.png

Figure 18: WinDbg – output results of a break point set earlier.


If we enable logging using .logopen, this information is collected in a file and can be used for static and dynamic analysis of the decrypted and executed IL code. Note that the file can be closed using .logclose.
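
A minimal Python sketch for post-processing such a log, assuming 32-bit addresses and the usual db line layout (address, two spaces, hex bytes with a dash after the eighth byte, then an ASCII column). The sample log lines and the IL bytes in them are invented for illustration.

```python
import re

# An address, two spaces, then byte pairs each followed by a space or dash.
DB_LINE = re.compile(r"^([0-9a-fA-F]{8})  ((?:[0-9a-fA-F]{2}[ -])+)")

def parse_db_lines(log_text):
    """Return a list of (address, bytes) for every db-style line in the log."""
    rows = []
    for line in log_text.splitlines():
        match = DB_LINE.match(line)
        if match:
            address = int(match.group(1), 16)
            data = bytes.fromhex(match.group(2).replace("-", " "))
            rows.append((address, data))
    return rows

# Invented log excerpt: one db line followed by one dd line (which is skipped).
sample_log = (
    "01042441  02 7b 01 00 00 04 2a                             .{....*\n"
    "0012f10c  00000007\n"
)
for address, data in parse_db_lines(sample_log):
    print(hex(address), data.hex())
```

Lines produced by dd and dw do not match the pattern, so only the IL byte dumps are collected.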

 

Looking at the created log we can determine the locations of the IL code which went through the compileMethod and dump the modules directly from memory using .writemem.

 

Looking at our malware sample, we see that there are two memory ranges the IL code was coming from. The first is the location of the loaded main assembly, 0x01040000-0x01082000, which we can also see through lm (list loaded modules):

 

Figure 19.png 

Figure 19: WinDbg – lm results, memory location of a main assembly module

 

 The second begins at 0x00d80000. Looking at the dump it is apparent that it is an MZ header.

 

Figure 20.png

 

Figure 20: Memory location of a second range of the IL code, MZ header

 

 followed by a PE header where we can clearly identify section names.

 

Figure 21.png

 

Figure 21: Memory location of a second range of the IL code, PE header

 

Figure 22.png

 

Figure 22: Memory location of a second range of the IL code, PE header, section names
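
One way to spot such ranges in the log is to compare each logged IL address against the module ranges reported by lm. The Python sketch below assumes the ranges have already been copied out by hand. The module name is a placeholder and 0x00d82050 is a made-up address inside the second range.

```python
# Module ranges as (start, end, name); the range is the main assembly range
# quoted in the text, and the name is a placeholder.
modules = [(0x01040000, 0x01082000, "main_assembly")]

def owning_module(address, module_ranges):
    """Return the name of the module containing address, or None."""
    for start, end, name in module_ranges:
        if start <= address < end:
            return name
    return None

# 0x01042441 is from the text; 0x00d82050 is an invented address.
for il_address in (0x01042441, 0x00D82050):
    print(hex(il_address), owning_module(il_address, modules))
```

An address that maps to no module is a candidate for dynamically created code and worth dumping.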


This memory range does not appear in the list of loaded modules; it is created dynamically by the application. To write the memory image to a file using .writemem we need to know the size of the image. The size can be read from the SizeOfImage field in the PE optional header.

 

Using:

 

Figure 23.png 

Figure 23: WinDbg - Size of image taken out of the PE header

 

would give us an image size which we can use to dump the image to a file using .writemem:

 

Figure 24.png 

Figure 24: WinDbg – writemem command to dump the second IL memory range to file
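
The SizeOfImage lookup can also be done offline on a dumped header. The following Python sketch follows the standard PE layout: e_lfanew at offset 0x3c, then the 4-byte signature, the 20-byte file header and offset 56 into the optional header. The header bytes are synthetic and the function name is illustrative.

```python
import struct

def size_of_image(image):
    """Read SizeOfImage from a PE image held in a bytes-like object."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte file header, then offset 56 into the
    # optional header (the same offset for PE32 and PE32+).
    (size,) = struct.unpack_from("<I", image, e_lfanew + 4 + 20 + 56)
    return size

# Synthetic header for demonstration: e_lfanew = 0x80, SizeOfImage = 0x6000.
header = bytearray(0x200)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\0\0"
struct.pack_into("<I", header, 0x80 + 0x50, 0x6000)
print(hex(size_of_image(bytes(header))))
```

The same arithmetic places the field 0x50 bytes after the start of the PE signature.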


The resulting file can be statically decompiled and analyzed with a number of tools, such as IDA or JetBrains dotPeek. Once decompiled, it reveals a number of functions that were not apparent in the original assembly and that are used for malicious activity.

Figure 25.png

 

Figure 25: dotPeek 1.1 – injected IL module decompiled


Do note that the techniques and methods of analysis, decryption and de-obfuscation of .NET applications presented in this write-up are not exhaustive. But they should give you an introduction, an initial tool set and an idea of how this can be done. The tools mentioned in this article are, at the time of writing, free and readily available. They provide a powerful set of instruments to manually dissect and study complex .NET obfuscation methods while getting a deeper look into malware behavior by combining dynamic and static analysis of .NET assemblies.

  

 
