Memory, Registers, and Arithmetic

Inside an idealized computer

Computer memory is composed of a series of memory cells, each uniquely identified by a specific address. While these cells typically store numerical values, the term "number" is used broadly to encompass not only actual numbers but also the instructions and data integral to a program’s functionality.

Given that accessing memory is often slower than executing arithmetic operations, processors are equipped with registers. These registers act as high-speed storage locations that temporarily hold data, thereby significantly accelerating complex computational tasks. Essentially, registers can be thought of as specialized memory cells within the CPU, identified by their unique names, which serve as addresses.

Inside an Intel 64-bit PC

On an Intel 64-bit PC, the addresses of adjacent memory locations that hold integer values typically differ by 4 or 8 bytes, depending on the size of the integer. Among this architecture's registers, we use two: %RAX and %RDX. The lower 32 bits of these registers are named %EAX and %EDX, respectively.
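The spacing between adjacent integer locations can be checked directly. The sketch below is only an illustration: the function names are hypothetical, and the 4-byte and 8-byte results assume a typical x86-64 Linux toolchain where int is 4 bytes and long is 8 bytes.

```c
#include <stddef.h>

/* Distance in bytes between two adjacent int cells. */
size_t stride_of_int(void)
{
    int cells[2] = {0, 0};
    return (size_t)((char *)&cells[1] - (char *)&cells[0]);
}

/* Distance in bytes between two adjacent long cells. */
size_t stride_of_long(void)
{
    long cells[2] = {0, 0};
    return (size_t)((char *)&cells[1] - (char *)&cells[0]);
}
```

On such a toolchain we would expect stride_of_int() to return 4 and stride_of_long() to return 8; other platforms may differ.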

Memory layout and registers

Our example program uses two memory addresses called a and b. We can think of a and b as names for these addresses. We'll use a special notation where (a) means "the contents at memory address a." In C or C++, we declare and define these memory locations as:
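A minimal sketch of such a declaration, assuming both locations are plain int variables with static storage duration (the helper cells_start_at_zero is a hypothetical name added only for illustration):

```c
/* Two memory locations with static storage duration.
   Assumption: both are 32-bit int cells. */
int a, b;

/* Returns 1 if both cells still hold their initial value of zero. */
int cells_start_at_zero(void)
{
    return a == 0 && b == 0;
}
```

Because a and b have static storage duration, C guarantees that they start out as zero, so cells_start_at_zero() returns 1 until something is assigned.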

When we load a program, static memory locations are initially filled with zeros. The figure below shows our initial memory layout after loading the program:

Assigning Numbers to Memory Locations

Remember that a is the name of a storage unit whose address (location) is 000055555555802c, and that (a) refers to the contents (the number) stored at that address.

Assigning Numbers in C and C++

In C and C++, a is a variable. We assign it a number just like any other variable:
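A minimal sketch of such assignments follows. The value 1 is an assumption taken from the $0x1 constant mentioned in the disassembly discussion, and assign_numbers is a hypothetical wrapper function.

```c
int a, b;   /* static locations, zero until assigned */

/* Store the literal 1 at both locations (assumed value). */
void assign_numbers(void)
{
    a = 1;
    b = 1;
}
```

After assign_numbers() returns, both (a) and (b) hold 1.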

Assigning Numbers in Assembly Language

In Intel assembly language, we write:

Assigning Numbers in GDB Disassembly

In the GDB disassembly, we'll see the following code, with the variable a and its address shown in the comment:

The right column of the table below shows how our pseudocode translates to assembler language:

Notice that movl is used instead of mov. This is because a and b can refer to either 32-bit memory cells (the size of the %EAX and %EDX registers) or 64-bit memory cells (the size of the %RAX and %RDX registers). For registers, the name itself tells us whether we're using the 64-bit version (e.g., %RAX) or the 32-bit version (e.g., %EAX). For the memory addresses a and b, however, nothing in the name says whether they refer to 64-bit or 32-bit cells. The l suffix in movl makes it explicit that we're using 32-bit memory cells, which, read as unsigned values, can hold integers from 0 to 4,294,967,295.
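The 32-bit range can be illustrated in C with the fixed-width type uint32_t. This is only a sketch, and both function names are hypothetical.

```c
#include <stdint.h>

/* Largest value an unsigned 32-bit cell can hold: 2^32 - 1. */
uint32_t max_32bit_cell(void)
{
    return UINT32_MAX;
}

/* Storing a wider value into a 32-bit cell keeps only the low 32 bits. */
uint32_t store_in_32bit_cell(uint64_t value)
{
    return (uint32_t)value;
}
```

Here max_32bit_cell() returns 4294967295, and store_in_32bit_cell(0x100000001) returns 1 because the upper 32 bits do not fit.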

The 0x2ef2(%rip) operand is the compiler's way of generating code that calculates the address of a relative to the instruction pointer rather than specifying it directly. The instruction then stores only a 32-bit offset instead of a full 64-bit address, which requires less memory space. Literal constants are prefixed with $. For instance, in $0x1, the 0x prefix indicates that the following number is hexadecimal. In the comments, leading zeroes of addresses are omitted. For example, the variable a has the address 000055555555802c, which is shown as 0x55555555802c. Note that the movement direction is from left to right in both the disassembly output and the pseudocode: the source is on the left and the destination on the right.
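The address calculation can be sketched in C. In RIP-relative addressing, the processor adds the signed displacement to the address of the next instruction. The function name is hypothetical, and the instruction address used in the usage note is not taken from the disassembly; it is simply the value obtained by subtracting 0x2ef2 from the address of a.

```c
#include <stdint.h>

/* RIP-relative addressing: target = address of the NEXT instruction
   plus a signed 32-bit displacement stored in the instruction. */
uint64_t rip_relative_target(uint64_t next_instruction, int32_t displacement)
{
    return next_instruction + (uint64_t)(int64_t)displacement;
}
```

For example, if the next instruction were at 0x55555555513a (an assumed value), rip_relative_target(0x55555555513a, 0x2ef2) would yield 0x55555555802c, the address of a.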

After executing the first two assembly language instructions, we have the memory layout shown in the figure below.

Assigning Numbers to Registers

Assigning numbers to registers is similar to memory assignments. In pseudocode, we can write:

Note that we don't use parentheses when referring to register contents. The second instruction copies the number stored at address a to a register.
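One way to picture these two steps in C is to let a local variable stand in for the register. This is a model only: the variable eax, the function name, and the starting value of a are assumptions, and a C compiler is free to place such a variable wherever it likes.

```c
#include <stdint.h>

int32_t a = 1;   /* memory cell; the starting value 1 is assumed */

/* Models: a literal goes into the register, then (a) is copied into it. */
int32_t load_register(void)
{
    int32_t eax;   /* local variable standing in for %EAX */
    eax = 1;       /* first step: the register receives a literal */
    eax = a;       /* second step: the register receives the contents of a */
    return eax;
}
```

The return value is whatever a holds at the time of the call, which mirrors copying (a) into the register.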

Assigning Numbers in Assembly Language

In assembly language, we write:

Assigning Numbers in GDB Disassembly

In the GDB disassembly output, we see:

Adding Numbers to Memory Cells

Adding Numbers

Let's examine the following pseudocode statement in detail:

Recall that a and b represent the addresses (locations) 000055555555802c and 0000555555558030, respectively. We refer to the contents at these addresses as (a) and (b).

Adding Numbers in C and C++

In C and C++, we express this operation as:
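A minimal sketch of the addition, assuming both cells start at 1 (add_a_to_b is a hypothetical wrapper):

```c
int a = 1, b = 1;   /* assumed starting values */

/* (b) + (a) is stored back at b; a is left unchanged. */
void add_a_to_b(void)
{
    b = b + a;      /* equivalently: b += a; */
}
```

With these starting values, one call leaves (b) equal to 2 and (a) equal to 1.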

Adding Numbers in Assembly Language

Assembly language uses the ADD instruction. However, due to AMD64 and Intel EM64T architecture limitations, we can't use both memory addresses in a single instruction. For instance, add a, b isn't valid. Instead, we must use a register as an intermediary:

Alternatively, we can use two registers:

In assembly language, this translates to:
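The two strategies described above can be modeled in C, with local variables standing in for %EAX and %EDX. This is a sketch of the data flow rather than the actual instruction sequence; the function names and starting values are assumptions.

```c
#include <stdint.h>

int32_t a = 1, b = 1;   /* assumed starting values */

/* One register as intermediary: load (a), then add it to memory at b. */
void add_via_one_register(void)
{
    int32_t eax = a;    /* register receives (a) */
    b += eax;           /* register is added to the cell at b */
}

/* Two registers: load both operands, add in registers, store the result. */
void add_via_two_registers(void)
{
    int32_t eax = a;    /* first register receives (a) */
    int32_t edx = b;    /* second register receives (b) */
    edx += eax;         /* addition happens between registers */
    b = edx;            /* result is stored back at b */
}
```

Both functions have the same effect on memory: (b) grows by (a), and (a) is untouched.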

Adding Numbers in GDB Disassembly

The GDB disassembly output shows:

Here's how our pseudocode translates into assembly language:

The figure below depicts the memory layout after executing the ADD and MOV instructions.

Incrementing and Decrementing Numbers

In pseudocode, incrementing or decrementing the number stored at address a is straightforward:

Increment and Decrement in C and C++

C and C++ offer three ways to increment and decrement:
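The sketch below shows three common forms. Which three the original listing used is an assumption (a += 1 would be another candidate), as are the function names and the starting value.

```c
int a = 1;   /* assumed starting value */

/* Each statement adds 1 to the number stored at a. */
void increment_three_ways(void)
{
    a = a + 1;   /* explicit addition */
    ++a;         /* prefix increment */
    a++;         /* postfix increment */
}

/* Each statement subtracts 1 from the number stored at a. */
void decrement_three_ways(void)
{
    a = a - 1;   /* explicit subtraction */
    --a;         /* prefix decrement */
    a--;         /* postfix decrement */
}
```

Used as standalone statements like this, all three forms have the same effect, so increment_three_ways() raises (a) from 1 to 4 and decrement_three_ways() brings it back to 1.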

Increment and Decrement in Assembly Language

Assembly language uses the INC and DEC instructions:

We use incl to specify a 32-bit memory cell, as a alone is ambiguous between 32-bit and 64-bit. However, %eax implies 32-bit values, so inc suffices.
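Why the operand size matters can be seen at the top of the 32-bit range. The sketch below uses fixed-width C types to stand in for a 32-bit and a 64-bit cell; the function names are hypothetical.

```c
#include <stdint.h>

/* Increment treating the cell as 32 bits wide. */
uint32_t inc_32(uint32_t cell)
{
    return cell + 1u;
}

/* Increment treating the cell as 64 bits wide. */
uint64_t inc_64(uint64_t cell)
{
    return cell + 1u;
}
```

Starting from 0xFFFFFFFF, inc_32 wraps around to 0 while inc_64 produces 0x100000000, so the same bit pattern gives different results depending on the cell width.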

Increment and Decrement in GDB Disassembly

GDB disassembly shows:

Or alternatively:

Here's the assembly language translation of increment:

After executing the INC or ADD instruction, the memory layout changes as shown in the figure below.

Multiplying Numbers

To multiply two numbers in pseudocode, we write:

This instruction multiplies the number at address b by the number at address a, storing the result back at address b.

Multiplication in C and C++

In C and C++, we can express multiplication in two ways:
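A minimal sketch of both forms, assuming the long form and the compound-assignment form are the two meant here; the starting values and function names are hypothetical.

```c
int a = 2, b = 2;   /* assumed starting values */

/* Long form: (b) * (a) is stored back at b. */
void multiply_long_form(void)
{
    b = b * a;
}

/* Short form with compound assignment; same effect. */
void multiply_short_form(void)
{
    b *= a;
}
```

Each call multiplies (b) by (a) once, so calling both in turn takes (b) from 2 to 4 and then to 8.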

Multiplication in Assembly Language

Assembly language uses the IMUL (Integer MULtiply) instruction. Here's a basic implementation:

This sequence loads (a) into %EAX, multiplies %EAX by (b), and stores the result back at b. Alternatively, we can use registers for all operands:
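Both sequences can be modeled in C, again with local variables standing in for %EAX and %EDX. This sketches the data flow only; the function names and starting values are assumptions.

```c
#include <stdint.h>

int32_t a = 2, b = 2;   /* assumed starting values */

/* One register: load (a), multiply by (b) from memory, store at b. */
void multiply_via_one_register(void)
{
    int32_t eax = a;    /* register receives (a) */
    eax *= b;           /* register is multiplied by (b) */
    b = eax;            /* product is stored back at b */
}

/* Two registers: load both operands, multiply in registers, store. */
void multiply_via_two_registers(void)
{
    int32_t eax = a;    /* first register receives (a) */
    int32_t edx = b;    /* second register receives (b) */
    edx *= eax;         /* multiplication happens between registers */
    b = edx;            /* product is stored back at b */
}
```

Either way, the product ends up at b and (a) is left unchanged.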

Multiplication in GDB Disassembly

The GDB disassembly output shows:

These instructions demonstrate how the compiler translates our high-level multiplication into low-level assembly operations.

Following the execution of the IMUL and MOV instructions, the resulting memory layout is depicted in the figure below.