Hey again! This is part 2 of my Introduction to x86 Exploit Development. If you haven’t read the first part of this series, I highly recommend doing so before reading this one.
In this post, we look more closely at how a program is laid out in memory, now that we have a better understanding of memory on a computer.
I heavily reference the book Hacking: The Art of Exploitation in this post.
Memory Segmentation Of An ELF:
On Linux and most UNIX-like systems, the standard binary file format is ELF (Executable and Linkable Format). The format is flexible, extensible, and cross-platform: it supports different endiannesses and address sizes. An ELF file has a specific layout. It starts with an ELF header, followed by file data, which can include:
- Program header table, describing zero or more memory segments
- Section header table, describing zero or more sections
- Data referred to by entries in the program header table or section header table
Once the program is loaded, its memory is divided into five segments: text, data, bss, heap, and stack. The text, data, and bss segments are described by the program header table and the section header table, while the heap and stack are set up at run time. Each of these segments serves a specific purpose, described in more depth in the sections that follow.
+----------------+ Lowest Address 0x00000000
|   Read Only    |
|  Text segment  | Text/Code
|                | Segment
+----------------+------------
|  Initialized   | Data
|      Data      | Segment
+----------------+------------
| Uninitialized  | BSS
|      Data      | Segment
+----------------+------------
| The heap grows | HEAP
|  down toward   | Segment
| higher memory  |
+------------+---+------------
|            |   | Room for heap
|            |   | segment to
|            V   | grow
+----------------+------------
|   (libc.so)    | Shared Libraries, if any.
|   (printf.o)   | Dynamically linked functions
|  ^             | Room for stack
|  |             | segment to
|  |             | grow
+--+-------------+------------ top of stack
| The stack grows| Stack frame(s)
| up toward lower| Segment
|     memory     |
+----------------+------------ bottom of stack
| cmd line args  | frame(s)
| env Variables  |
+----------------+ Highest Address 0x7fffffff
The above diagram shows the layout of a 32-bit binary loaded into its virtual address space. A 32-bit address space spans 4GB in total, but part of it is typically reserved for the kernel, so the highest address available to the program depends on the operating system and its configuration. The diagram can be drawn in the opposite direction as well, with higher memory addresses on top and lower memory addresses on the bottom, but I prefer low memory on top and high on bottom, as most people are used to numbered lists that count downward.
The Text Segment:
The text segment (or .text), also known as the code segment, is the part of an object file, or the corresponding region of the program’s virtual address space, that contains executable instructions. It is generally read-only and of fixed size. In other words, this is where the assembled machine language instructions of the program are located.
Execution of the instructions in this segment is nonlinear, because high-level control structures and function calls compile down to branch, jump, and call instructions in assembly language.
As a program begins executing, EIP is set to the first instruction in the text segment. The processor then follows an execution loop that does the following:
1. Read the instruction that EIP is pointing to
2. Add the byte length of the instruction to EIP
3. Execute the instruction that was read in step 1
4. Go back to step 1
Sometimes the instruction will be a jump or a call instruction, which changes EIP to a different address in memory. The processor doesn’t care about the change, because it’s expecting the execution to be nonlinear anyway.
Write permission is disabled in the text segment, as it is used to store only code, not variables. This prevents the program code from being modified while it runs; on Linux, any attempt to write to this segment of memory triggers a segmentation fault, and the program is killed.
The Data and BSS Segments:
A global variable is a variable with global scope, which means that it is accessible throughout the program. Functions can typically only see variables that are within their own scope, but global variables can be accessed from any part of the program at any time.
A static variable is a variable that has been allocated “statically”, which means that it lasts the entire run of the program, as opposed to shorter-lived automatic variables that are stack allocated (created and destroyed within the scope of a single function), as well as variables/objects whose storage is dynamically allocated and deallocated in heap memory.
The data segment is composed of initialized global and static variables, whereas the bss segment is composed of their uninitialized counterparts.
The Heap Segment:
The heap segment is a segment of memory that is dynamically allocated (via a malloc function call) and deallocated (via a free function call). It is memory that a programmer can directly control. Blocks of memory in this segment can be allocated and used for whatever the programmer might need.
One notable point about the heap segment is that it isn’t of fixed size, so it can grow or shrink as needed. All of the memory within the heap is managed by allocator and deallocator algorithms, which respectively reserve a region of heap memory for use and release reservations so that the memory can be reused for later allocations.
Another notable point about the heap segment is that it grows downward in the diagram, toward higher memory addresses. Remember that the drawn direction is arbitrary and depends on how you choose to view memory; I choose to draw low memory addresses on the top and higher memory addresses on the bottom. Either way, the heap grows toward higher memory addresses.
The Stack Segment:
The stack segment is a segment of memory that also has a variable size and is used as a temporary storage area for local function variables and context during function calls. When a program calls a function, that function has its own set of passed-in variables (arguments) and local variables, and the function’s code sits at a different memory location in the text (code) segment. Since EIP (the instruction pointer, which always points to the next instruction to be executed inside the text segment) and the context must change when a function gets called, the computer uses the stack to remember the arguments and local variables, as well as the location that EIP should return to upon exiting the called function. All of this information is stored together on the stack and is collectively known as a ‘stack frame’. The stack can contain many stack frames.
Example of Stack Section Being Used in a Program:
Below is a program that we will disassemble, stepping through its x86 assembly instructions to better understand how function calls work in a C program.
This simple program will perform a function call to ‘test_function’ and pass it four arguments: 1, 2, 3, and 4. We will first use GDB to debug and disassemble the program so that we can see what the computer is actually doing. This will help us get a better understanding of how computers work at a lower level, which will in turn help with our vulnerability research and, when the time comes, with developing an exploit. With this disassembly, we will be exploring how the CDECL calling convention works in particular.