Inside an idealized computer
Computer memory is composed of a series of memory cells, each uniquely identified by a specific address. While these cells typically store numerical values, the term "number" is used broadly to encompass not only actual numbers but also the instructions and data integral to a program’s functionality.
Given that accessing memory is often slower than executing arithmetic operations, processors are equipped with registers. These registers act as high-speed storage locations that temporarily hold data, thereby significantly accelerating complex computational tasks. Essentially, registers can be thought of as specialized memory cells within the CPU, identified by their unique names, which serve as addresses.
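The idealized machine described above can be modeled in a few lines of C. This is only an illustrative sketch: the names Machine, store, load, add_via_register, and MEMORY_CELLS are hypothetical, memory is a small array indexed by address, and two named fields stand in for registers.

```c
#include <stddef.h>
#include <stdint.h>

#define MEMORY_CELLS 16 /* hypothetical size, chosen only for illustration */

/* A toy model: memory cells are selected by a numeric address,
   registers are selected by name. */
typedef struct {
    uint32_t memory[MEMORY_CELLS]; /* each cell holds a "number" */
    uint32_t reg_a;                /* named register (hypothetical name) */
    uint32_t reg_b;                /* named register (hypothetical name) */
} Machine;

/* Store a number into the memory cell at the given address. */
void store(Machine *m, size_t address, uint32_t value) {
    m->memory[address] = value;
}

/* Load the number held in the memory cell at the given address. */
uint32_t load(const Machine *m, size_t address) {
    return m->memory[address];
}

/* Copy a cell into a register, do the arithmetic there,
   and write the result back to the same cell. */
void add_via_register(Machine *m, size_t address, uint32_t amount) {
    m->reg_a = load(m, address);
    m->reg_a += amount;
    store(m, address, m->reg_a);
}
```

The last function reflects the role the text gives to registers: the value is brought into fast storage, changed there, and only then written back to memory.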
Inside an Intel 64-bit PC
On an Intel 64-bit PC, memory addresses for locations containing integer values typically differ by 4 or 8 bytes, depending on the system's architecture. Among the registers this architecture provides are two that we use here: %RAX and %RDX. The lower 32 bits of these registers are referred to as %EAX and %EDX, respectively.

Memory layout and registers
Our example uses two memory addresses called a and b. We can think of a and b as names for these addresses. We'll use a special notation where (a) means "the contents at memory address a." In C or C++, we declare and define these memory locations as static integer variables.

When we load a program, static memory locations are initially filled with zeros. The figure below shows our initial memory layout after loading the program:
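The declaration and zero-initialization described above might look like the following in C. This is a minimal sketch, assuming a and b are static variables of type int; the helper both_initially_zero is a hypothetical name added only to make the initial state checkable.

```c
/* Assumed declaration: two static integers named a and b.
   Static storage without an initializer starts out as zero. */
static int a, b;

/* Returns 1 if both variables still hold their initial value of zero. */
int both_initially_zero(void) {
    return a == 0 && b == 0;
}
```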
Assigning Numbers to Memory Locations
Remember, a represents both the address (location) of the storage unit and the name of the address 000055555555802c. Here, (a) refers to the contents (number) stored at address a.

Assigning Numbers in C and C++
In C and C++, a is a variable. We assign it a number just like any other variable:

Assigning Numbers in Assembly Language
In Intel assembly language, we write:
Assigning Numbers in GDB Disassembly
In the GDB disassembly, we'll see the following code, with the variable a and its address shown in the comment:

The right column of the table below shows how our pseudocode translates to assembly language:
Notice that movl is used instead of mov. The names a and b could refer to either 32-bit memory cells (the size of the %EAX and %EDX registers) or 64-bit memory cells (the size of the %RAX and %RDX registers). A register name states its size directly: %RAX is 64-bit and %EAX is 32-bit. A memory address such as a or b carries no such information, so we write movl to make clear that we are using 32-bit memory cells, which can hold unsigned integers from 0 to 4,294,967,295.

The 0x2ef2(%rip) address is the compiler's way of generating code that calculates the address of a relative to the instruction pointer rather than specifying it directly. This encoding takes less space than embedding the full 64-bit address. Literal constants are prefixed with $. For instance, in $0x1, the 0x prefix indicates that the following number is hexadecimal. In the comments, leading zeroes of addresses are omitted. For example, the variable a has the address 000055555555802c, which is shown as 0x55555555802c. Note that the movement direction is from left to right in both the disassembly output and the pseudocode.

After executing the first two assembly language instructions, we have the memory layout shown in the figure below.
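The %rip-relative calculation and the 32-bit range mentioned above can be checked with a short C sketch. The function names are hypothetical. The next-instruction address 0x55555555513a used in the usage note is not taken from the original listing: it is back-computed from the two values the text does give (the displacement 0x2ef2 and the address 0x55555555802c of a), purely for illustration.

```c
#include <stdint.h>

/* Effective address of a %rip-relative operand: the address of the
   next instruction plus a signed 32-bit displacement. */
uint64_t rip_relative_address(uint64_t next_instruction, int32_t displacement) {
    return next_instruction + (uint64_t)(int64_t)displacement;
}

/* Largest unsigned value that fits in a 32-bit memory cell. */
uint32_t largest_32bit_value(void) {
    return UINT32_MAX;
}
```

Under that assumption, rip_relative_address(0x55555555513a, 0x2ef2) yields 0x55555555802c, the address of a, and largest_32bit_value() yields 4,294,967,295.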
Assigning Numbers to Registers
Assigning numbers to registers is similar to memory assignments. In pseudocode, we can write:
Note that we don't use brackets when referring to register contents. The second instruction copies the number stored at address a into a register.

Assigning Numbers in Assembly Language
In assembly language, we write:
Assigning Numbers in GDB Disassembly
In the GDB disassembly output, we see:
Adding Numbers to Memory Cells
Adding Numbers
Let's examine the following pseudocode statement in detail:
Recall that a and b represent the addresses (locations) 000055555555802c and 0000555555558030, respectively. We refer to the contents at these addresses as (a) and (b).

Adding Numbers in C and C++
In C and C++, we express this operation as:
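A minimal sketch of the addition, assuming a and b are the static int variables described earlier and that the sum is stored back into b. The function names are hypothetical and exist only so the two spellings can be called separately.

```c
/* Assumed declarations, matching the static variables described earlier. */
static int a, b;

/* (b) + (a) -> (b), written as a plain assignment. */
void add_long_form(void) {
    b = b + a;
}

/* The same operation using the compound assignment operator. */
void add_short_form(void) {
    b += a;
}
```

Both forms read a and b, add them, and write the sum to b; a is left unchanged.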
Adding Numbers in Assembly Language
Assembly language uses the ADD instruction. However, due to AMD64 and Intel EM64T architecture limitations, a single ADD instruction cannot take two memory addresses as operands. For instance, add a, b isn't valid. Instead, we must use a register as an intermediary:

Alternatively, we can use two registers:
In assembly language, this translates to:
Adding Numbers in GDB Disassembly
The GDB disassembly output shows:
Here's how our pseudocode translates into assembly language:
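As a rough C model of the register-mediated addition (not the assembly itself), local variables can stand in for the registers. This is a sketch under assumptions: the names eax and edx are ordinary C variables, the cells are modeled as uint32_t, and the instruction forms quoted in the comments are approximate AT&T-style spellings rather than lines from the original listing.

```c
#include <stdint.h>

/* Stand-ins for the two 32-bit memory cells (assumed names and width). */
static uint32_t a, b;

/* One register as intermediary: load a into a register, then add
   that register to the memory cell b (roughly: mov a, %eax then
   add %eax, b). */
void add_with_one_register(void) {
    uint32_t eax = a; /* (a) -> register */
    b += eax;         /* register + (b) -> (b) */
}

/* Two registers: both operands are loaded first, the sum is formed
   in a register, and the result is stored back to b. */
void add_with_two_registers(void) {
    uint32_t eax = a; /* (a) -> first register */
    uint32_t edx = b; /* (b) -> second register */
    eax += edx;       /* sum is formed in the register */
    b = eax;          /* register -> (b) */
}
```

In both variants at most one operand of the addition lives in memory, which is the constraint the text describes.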
The figure below depicts the memory layout after executing the ADD and MOV instructions.

Incrementing and Decrementing Numbers
In pseudocode, incrementing or decrementing the number stored at address a is straightforward:

Increment and Decrement in C and C++
C and C++ offer three ways to increment and decrement:
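One plausible reading of the three forms, sketched under the assumption that a is a static int: explicit arithmetic, the prefix operator, and the postfix operator. The function names are hypothetical.

```c
/* Assumed declaration, matching the static variable described earlier. */
static int a;

/* Three equivalent ways to add one to a. */
void increment_three_ways(void) {
    a = a + 1; /* explicit arithmetic */
    ++a;       /* prefix increment */
    a++;       /* postfix increment */
}

/* Three equivalent ways to subtract one from a. */
void decrement_three_ways(void) {
    a = a - 1; /* explicit arithmetic */
    --a;       /* prefix decrement */
    a--;       /* postfix decrement */
}
```

Used as standalone statements like this, the prefix and postfix forms have the same effect; they differ only when the value of the expression itself is used.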
Increment and Decrement in Assembly Language
Assembly language uses the INC and DEC instructions:

We use incl to specify a 32-bit memory cell, as a alone is ambiguous between 32-bit and 64-bit. However, %eax implies 32-bit values, so inc suffices.

Increment and Decrement in GDB Disassembly
GDB disassembly shows:
Or alternatively:
Here's the assembly language translation of increment:
After executing the INC or ADD instruction, the memory layout changes as shown in the figure below.

Multiplying Numbers
To multiply two numbers in pseudocode, we write:
This instruction multiplies the number at address b by the number at address a, storing the result back at address b.

Multiplication in C and C++
In C and C++, we can express multiplication in two ways:
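A minimal sketch of the two spellings, assuming a and b are the static int variables described earlier and that the product is stored back into b, as the pseudocode description states. The function names are hypothetical.

```c
/* Assumed declarations, matching the static variables described earlier. */
static int a, b;

/* (b) * (a) -> (b), written as a plain assignment. */
void multiply_long_form(void) {
    b = b * a;
}

/* The same operation using the compound assignment operator. */
void multiply_short_form(void) {
    b *= a;
}
```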
Multiplication in Assembly Language
Assembly language uses the IMUL (Integer MULtiply) instruction. Here's a basic implementation:

This sequence loads a into %EAX, multiplies it by b, and stores the result back in b. Alternatively, we can use registers for all operands:

Multiplication in GDB Disassembly
The GDB disassembly output shows:
These instructions demonstrate how the compiler translates our high-level multiplication into low-level assembly operations.
Following the execution of the IMUL and MOV instructions, the resulting memory layout is depicted in the figure below.