Working with Memory
As previously stated, from a programmer's perspective, memory is abstracted into variables. This hides most of the lower level details. Each variable is characterized by an address (or location in memory), a type and access rights. Some languages require the developer to spell out these attributes explicitly (statically typed languages - notable examples: C/C++, D, Java) whereas others deduce them by analyzing the context (dynamically typed languages - notable examples: Python, JavaScript). Either way, the language compiler or interpreter needs to handle this information and, based on it, generate code that manages memory correctly and efficiently.
Memory Access
Accessing memory means reading a value from a variable or writing a value to it. From a programmer's perspective, this looks pretty straightforward:
#include <stdio.h>

int main(void)
{
    int a;              // declare variable
    a = 42;             // write 42 to variable a
    printf("%d\n", a);  // read variable a and print its contents
    return 0;
}
However, from a lower level perspective, there are other attributes that need to be taken care of.
For instance, variable a needs a corresponding area reserved in memory.
That specific chunk of memory is described by an address and a size.
The address of a is generated automatically by going through multiple layers of abstraction, whereas the size is spelled out indirectly by the programmer through the keyword int.
Another aspect is the access rights for a specific memory area.
In our example, a is defined as plain mutable; however, it is possible to declare constant variables, which may be stored in memory locations with no write permissions.
Using the above information, the compiler and the operating system cooperate to allocate memory that can represent the contents of the variable.
No matter what sort of language you are using, statically or dynamically typed, a variable is always described by the (address, size, access rights) triplet. By using this triplet, the content of a variable is stored, retrieved or rewritten.
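The address and size parts of the triplet can be made visible from Python through the standard ctypes module, which stores values in C-compatible memory. The sketch below is only an illustration and is not part of the lab's support code; the variable names are made up. Access rights are not queried here, since ctypes offers no call for that.

```python
import ctypes
import sys

a = ctypes.c_int(42)             # a C int backed by a real memory buffer
address = ctypes.addressof(a)    # where the value lives
size = ctypes.sizeof(a)          # how many bytes it occupies (same as a C int)

a.value = 7                      # write through the variable

# Read the same bytes back using only (address, size).
raw = ctypes.string_at(address, size)
value = int.from_bytes(raw, sys.byteorder, signed=True)

print(hex(address), size, value)
```

The address differs from run to run, while the size depends only on the type.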
Practice
Navigate to the support/memory-access/ directory.
Inspect the mem_access.c source file.
Describe each variable by completing its (address, size, access rights) tuple.
Try to modify the ca, cp and cp2 variables by assigning some other value to them. Explain the behavior.
Memory Protection
Memory contents (both code and data) are separated into sections or zones. This makes memory easier to manage. More than that, it allows different zones to have different permissions. This follows the principle of least privilege: each section is granted only the permissions it requires.
Code is usually placed in a section (.text) with read and execute permissions; no write permissions.
Variables are placed in different sections or zones (.data, .bss, stack, heap) with read and write permissions; no execute permissions.
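The idea of zones with different permissions can also be sketched from Python with the standard mmap module. This is a hedged illustration, not part of the lab's support code, and it assumes CPython on Linux. It creates two anonymous regions, one writable and one read-only. Unlike the C program, Python checks the access mode itself and raises a TypeError before touching the memory, so no Segmentation fault occurs.

```python
import mmap

# Two anonymous (not file-backed) regions, one page each.
rw_region = mmap.mmap(-1, mmap.PAGESIZE)                           # read + write
ro_region = mmap.mmap(-1, mmap.PAGESIZE, access=mmap.ACCESS_READ)  # read only

rw_region[0] = 42            # allowed: the region is writable
read_back = rw_region[0]

ro_first = ro_region[0]      # reading the read-only region is allowed

write_refused = False
try:
    ro_region[0] = 42        # not allowed: the region is read-only
except TypeError:
    write_refused = True     # Python refuses the write based on the access mode

print(read_back, write_refused)

rw_region.close()
ro_region.close()
```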
Let's navigate to the support/memory-protection/ directory and inspect the mem_prot.c source file.
The file uses different access types for the data variable and the do_nothing function.
Build it:
student@os:~/.../lab/support/memory-protection$ make
gcc -g -Wall -Wextra -Werror -I../../../../../common/makefile/../utils -I../../../../../common/makefile/../utils/log -c -o mem_prot.o mem_prot.c
gcc mem_prot.o ../../../../../common/makefile/../utils/log/log.o -o mem_prot
student@os:~/.../lab/support/memory-protection$ ./mem_prot
reading from .data section
writing to .data section
reading from .text section
executing .text section
All current actions in the program are valid.
Let's uncomment each commented line in the program and try again:
student@os:~/.../lab/support/memory-protection$ ./mem_prot
reading from .data section
writing to .data section
reading from .text section
executing .text section
executing .data section
Segmentation fault (core dumped)
We now receive the dreaded Segmentation fault message, because we try to access a memory section in a way that its permissions do not allow.
Permissions come into play when we control the memory address via pointers. But even for programming languages that don't offer pointers (such as Python) issues may still arise.
In the str.py file, we try to modify str[1], but this fails:
student@os:~/.../lab/support/memory-protection$ ./str.py
n, 110, n
Traceback (most recent call last):
  File "./str.py", line 5, in <module>
    str[1] = 'z'
TypeError: 'str' object does not support item assignment
This fails because, in Python, strings are immutable. Once a string has been created, it cannot be modified; you have to create a new string.
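A minimal sketch of the usual workaround is to build a new string from slices of the old one. The helper name replace_char and the sample value are hypothetical; str.py may use a different string.

```python
def replace_char(text, index, new_char):
    """Return a new string equal to text, with text[index] replaced."""
    return text[:index] + new_char + text[index + 1:]

original = "ana"   # hypothetical value, not taken from str.py
modified = replace_char(original, 1, "z")

print(original, modified)
```

The original string is left untouched; modified refers to a different string object.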
Practice
Go to the support/memory-protection/ folder and solve the practice items below.
Add a variable named ro that you define as const. The variable will be placed in a read-only section (.rodata) such that write and execute access would result in a Segmentation fault.
Access the ro variable and show that, indeed, for write and execute access, a Segmentation fault is issued.
Memory Allocation Strategy
Navigate to the support/memory-alloc/ directory.
It contains 3 implementations of the same program in different languages: C, Python and D.
The program creates a list of entries, each entry storing a name and an id.
The purpose of this exercise is to present the different strategies that programming languages adopt to manage memory.
C
The C implementation manages the memory manually.
You can observe that all allocations are performed via malloc and the memory is freed using free.
Arrays can be defined as static (on the stack) or dynamic (a pointer to some heap memory).
Stack memory need not be freed, hence static arrays are automatically deallocated.
Heap memory, however, is managed by the user, therefore it is the burden of the programmer to find the optimal memory strategy.
This offers the advantage that you can fine-tune the memory usage depending on your application, but it comes at a cost: more often than not, managing memory is a complex, error-prone task.
Python
The Python implementation of the program has no explicit memory allocation. It simply defines variables and the garbage collector takes care of allocating and deallocating memory. Notice how the destructor is called automatically at some point, when the garbage collector deems that the list is not used anymore. Garbage collection lifts the burden of memory management from the user; however, it may be unsuitable for certain scenarios. For example, real-time applications that need to take action immediately once a certain event occurs are often a poor fit for a garbage collector (GC). That is because the GC usually pauses the application to free dead objects.
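The automatic destructor call can be observed with a small sketch like the one below. The Entry class and the destroyed list are made up for illustration and are not the lab's code. The exact moment when __del__ runs is an implementation detail: CPython typically runs it as soon as the last reference disappears, while other interpreters may delay it.

```python
import gc

destroyed = []  # names of entries whose destructor has run

class Entry:
    def __init__(self, name, entry_id):
        self.name = name
        self.entry_id = entry_id

    def __del__(self):
        # Called by the runtime, never explicitly by the program.
        destroyed.append(self.name)

def build_list():
    entries = [Entry("first", 1), Entry("second", 2)]
    return len(entries)  # the list becomes unreachable once we return

count = build_list()
gc.collect()  # explicitly request a collection as well

print(count, sorted(destroyed))
```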
D
The previous 2 examples have showcased extreme situations: fully manual vs fully automatic memory management.
In D, both worlds are combined: variables may be allocated manually on the stack/heap or allocated via the garbage collector (for brevity, malloc-based allocation is not presented in this example).
Arrays that are allocated on the stack behave the same as in C, whereas arrays allocated with the garbage collector mimic Python lists.
Classes are also garbage collected.
Memory Vulnerabilities
The purpose of this exercise is to provide examples of how memory corruption may occur and of the safety guards implemented by different programming languages.
Navigate to the support/memory-vuln/ directory.
It features 3 files, each showcasing what happens in case of actions that may lead to memory corruption.
C
The C implementation showcases how some of the design flaws of the language can lead to memory corruption.
The first example demonstrates how a pointer to an expired stack frame may be leaked to an outer scope. The C language does not implement any guards against such behavior, although data flow analysis could be used to detect such cases.
The second example highlights the fact that C does not check any bounds when performing array operations.
This leads to all sorts of undefined behavior.
In this scenario, some random memory is overwritten with 5.
The third example exhibits a manifestation of the previous design flaw, where the return address of the main function is overwritten with 0, thus leading to a segmentation fault.
Although today it seems obvious that such behavior should not be accepted, we should take into account that the context in which the C language was created was entirely different. At that time, the resource constraints were overwhelming: main memory was around a few KB, operating systems were in their infancy, branch predictors did not exist etc. Moreover, security was not a concern because the internet basically did not exist. As a consequence, the language was not developed with memory safety in mind.
Python
Technically, it is not possible to corrupt memory in Python (that is, as long as you avoid calling C functions from it).
Pointers do not formally exist, and any kind of array access is checked to be within its bounds.
The example simply showcases what happens when an out-of-bounds access is performed: an IndexError is raised and execution halts.
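As a rough sketch of this bounds check (the helper safe_read and the sample list are hypothetical, not taken from the lab files), the exception can be caught instead of letting the program halt:

```python
def safe_read(values, index):
    """Return values[index], or None if the index is out of bounds."""
    try:
        return values[index]
    except IndexError:
        # The interpreter refused the access; no memory was touched.
        return None

numbers = [1, 2, 3]
in_bounds = safe_read(numbers, 2)
out_of_bounds = safe_read(numbers, 3)

print(in_bounds, out_of_bounds)
```

Keep in mind that negative indices are valid in Python and count from the end of the list, so only indices outside the range [-len, len) raise the error.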
D
The D implementation uses almost the same code as the C implementation, with only minor syntax modifications.
In essence, the two implement the same logic.
When compiling this code, it can be observed that the D compiler notices at compile time that an out-of-bounds access is performed.
This makes sense, since a static array cannot modify its length and therefore the compiler has all the information to spot the mistake.
The only way to make the code compile is to comment out the faulting lines or to replace the out-of-bounds index with a correct one.
After doing so, the program compiles and we can see that memory corruption occurs.
D also has safety checks, but these are not performed by default.
To enable such checks, the user must annotate a function with the @safe attribute:
int* bad() @safe
By doing so, the mechanical checks are enabled and a new set of criteria needs to be followed for the code to be accepted.
Taking the address of a local, doing pointer arithmetic, reinterpret casts, calling non-@safe functions etc. are not allowed in @safe code.
If any of these unsafe features are manually proven to be safe, the @trusted attribute may be used to disable the checks while still allowing the function to be called from @safe code.
This allows writing system code, which by its nature is unsafe.
Memory Corruption
For this practice item, you will need to identify the programming mistake that makes it possible to corrupt memory.
Navigate to the support/memory-corruption folder.
Inspect the source file segfault.c.
- What does the program do?
- Compile and run it. What happens?
- Debug the program and find the line that causes the segfault.
  Note: Although using printf() calls is a viable option, we strongly suggest you use GDB.
- Fix the program.
- Analyze the corresponding Python and D implementations.
  What is the expected result in each case? Why? Run the programs and see what happens.