DATA ORGANIZATION: THE ENDIANNESS DEBATE 🥚
When you have a number bigger than one byte—like a 32-bit integer—it takes up multiple memory slots. This immediately raises a question:
Which byte goes first in memory?
Do we store the number left-to-right (like how we normally write numbers) or right-to-left? The answer depends on the system, and this is called Byte Ordering, or more formally, Endianness.
Little Endian (The Intel Way)
Most x86 processors (Intel, AMD) use Little-Endian. Here’s what that means:
1. The Rule
The “Little End,” meaning the Least Significant Byte (LSB), goes into the lowest memory address.
The “Most Significant Byte (MSB)” is stored last, at a higher address.
2. Why Little-Endian?
Little-Endian is a design convention, not a judgment about which byte is "least important": the low-order byte simply lives at the lowest address.
This ordering can make some multi-byte math simpler for the hardware, since addition and other arithmetic start from the LSB and carry toward the MSB.
Think of it like writing numbers on paper backward so you can start calculating immediately with the ones place.
3. Example
Hex number: 12345678h
78 → LSB (“Little End”)
12 → MSB (“Big End”)
Memory layout in Little-Endian:
Offset 0000: 78
Offset 0001: 56
Offset 0002: 34
Offset 0003: 12
Notice how it looks reversed in memory—but inside a CPU register, it still looks normal (12345678h). Only the memory order is “scrambled.”
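To see the byte order from code, here is a minimal sketch in 32-bit MASM. The variable name val is made up for illustration; the value is the one used above.

```asm
.data
val DWORD 12345678h            ; bytes in memory: 78 56 34 12

.code
mov eax, val                   ; EAX = 12345678h (register looks "normal")
mov al, BYTE PTR val           ; AL = 78h, the byte at the lowest address (LSB)
mov al, BYTE PTR [val + 3]     ; AL = 12h, the byte at the highest address (MSB)
```

Reading the variable one byte at a time exposes the memory order, while loading the whole DWORD hides it.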
Big Endian (The “Human” Way)
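Other systems, such as some MIPS, PowerPC, and SPARC machines, use Big-Endian. (ARM and MIPS can usually be configured either way, and most ARM systems today run Little-Endian.)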
1. The Rule
The “Big End,” meaning the Most Significant Byte (MSB), goes into the lowest memory address.
The LSB comes last.
2. Why Big-Endian?
This looks natural to humans, because the number is stored left-to-right, just like we read or write it on paper.
3. Example
Same number: 12345678h
Memory layout in Big-Endian:
Offset 0000: 12
Offset 0001: 34
Offset 0002: 56
Offset 0003: 78
Now memory mirrors the way we normally write numbers, but certain CPU operations may be slightly less convenient.
Why Endianness Matters
If you save a file on a Little-Endian PC and try to read it on a Big-Endian machine without converting it, your numbers can become completely scrambled.
Example: The number 1 stored in 32 bits on a Little-Endian machine is the byte sequence 01 00 00 00.
A Big-Endian system reads those same bytes as 01000000h, which equals 16,777,216 instead of 1.
Bottom line: Understanding endianness is crucial for:
Porting software across platforms
Networking, where different systems communicate binary data
File formats, where byte order is explicitly defined
BIG IDEAS TO REMEMBER
Little Endian (x86):
LSB comes first (lowest memory address)
Useful for CPU arithmetic
Big Endian:
MSB comes first
Aligns with human reading/writing habits
Memory vs. Register:
Inside a register (like EAX), numbers always appear correctly (12345678h).
The “scrambling” only happens when stored in RAM.
Quick Memory Trick
Little-Endian: Think of it like writing a number “backwards” in memory so the CPU can start calculating from the ones place immediately.
Big-Endian: Think of it like writing numbers normally—humans like it, CPUs don’t mind, but arithmetic can be less convenient.
Declaring Uninitialized Data
I. The .DATA? Directive: Ghost Storage 👻
In earlier notes, we used .DATA to define variables with specific values, like:
This works when you know the exact value your variable should start with. But what happens if you need a massive buffer, like a temporary storage for an image or a large array, and you don’t know the values yet?
That’s where .DATA? comes in.
II. The Concept
.DATA? tells the assembler:
“I want to reserve memory for these variables, but I don’t want to initialize them yet.”
Variables declared here exist in memory at runtime, but the compiled executable doesn’t store any actual values for them.
In other words, your program asks the OS to allocate RAM when it runs, instead of bloating the .exe file with zeros or other initial values.
III. The Magic Behind the Scenes
Here’s what happens under the hood:
Disk vs RAM
.DATA variables with initial values are stored directly in the .exe file. That means the file grows by the size of the initialized data.
.DATA? variables occupy no extra space on disk. The program only tells the OS: “When you run me, please give me this much memory.”
Runtime Allocation
The OS reserves the memory in your process’s Data Segment (part of RAM) when your program starts.
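The bytes in .DATA? hold no value that you chose. On Windows the loader zero-fills this memory, but other environments (DOS or bare-metal code, for example) may leave whatever bits were previously in that RAM location, so initialize them before relying on their contents.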
Why this matters
Using .DATA? is a disk-space optimization. Large arrays can dramatically increase your executable’s size if initialized with .DATA.
It’s also a performance-friendly habit, because loading huge initialized blocks from disk is slower than just letting the OS give you “empty” RAM.
Fat File vs Skinny File: An Example
Let’s compare:
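A minimal sketch of the comparison, with hypothetical array names and a size picked only for illustration:

```asm
; Fat file: the initial values are written into the .exe
.data
bigArrayInit DWORD 5000 DUP(0)    ; 5000 * 4 = 20,000 bytes stored on disk

; Skinny file: only the required size is recorded
.data?
bigArray DWORD 5000 DUP(?)        ; 20,000 bytes reserved at load time, none on disk
```

Both arrays occupy 20,000 bytes of RAM at runtime. Only the first one adds roughly that much to the executable.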
Key insight: .DATA? is like putting a sticky note in memory that says: “Set aside this much space, but I don’t care what’s in it yet.”
Garbage Alert ⚠️
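The ? symbol only tells the assembler not to supply an initial value. Some loaders (Windows included) hand you zero-filled memory, while other environments leave whatever bits were already in RAM.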
Always initialize variables from .DATA? before reading them, otherwise your program may behave unpredictably.
MIXING CODE AND DATA (THE “MESSY ROOM” METHOD)
MASM allows you to switch between code and data anywhere:
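A small sketch of what such switching can look like, using the variable temp discussed next:

```asm
.code
mov eax, ebx

.data
temp DWORD ?        ; declared in the middle of the code, but placed in the data segment

.code
mov temp, eax       ; execution continues as if the declaration were not there
```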
What really happens:
The assembler automatically moves temp to the Data Segment, even if you declared it in the middle of your code.
CPU execution remains linear, and registers store temporary values as expected.
Best Practice
While possible, don’t scatter data declarations in code.
Keep .data and .code separate for clarity and maintainability.
DECLARING TYPES: THE CHEAT SHEET
When using .DATA?, you still need to pick the right size, even if the value is unknown
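As a quick reference, the sketch below reserves one uninitialized item of each common size. All variable names are hypothetical.

```asm
.data?
flag    BYTE  ?             ; 1 byte
counter WORD  ?             ; 2 bytes
total   DWORD ?             ; 4 bytes
bigNum  QWORD ?             ; 8 bytes
buffer  BYTE  256 DUP(?)    ; 256 bytes
```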
Remember: .DATA? = “Reserve this space at runtime.”
? = “I don’t care what’s inside yet—initialize it before use!”
Readability Pro-Tips
Writing assembly is hard enough without messy formatting. Follow these rules:
Capitalize directives (.DATA?, .CODE) for clarity.
Indent consistently—makes loops and logic easier to scan.
Use comments liberally—Future-You will thank Present-You.
Clear labels—especially for jumps or memory offsets.
Big Idea to Remember
✅ .DATA? = Runtime Allocation: space is reserved in RAM, not on disk.
✅ ? = Uninitialized: the memory holds no value you assigned, so do not assume its contents.
✅ Always initialize before reading.
Analogy: .DATA is like packing boxes with items before shipping—takes space in the truck (.exe). .DATA? is like telling the warehouse: “Hold this space for me when I arrive”—the truck stays empty until runtime.
SYMBOLIC CONSTANTS (MAKING ASSEMBLY HUMAN)
Assembly is already low-level and unforgiving. One of the few tools you get to make it readable and maintainable is symbolic constants.
At their core:
A symbolic constant is a name that represents a fixed value that never changes.
Instead of scattering raw numbers and strings all over your code, you give them names. That way:
Your code reads like intent, not math homework
You change a value once, not everywhere
You avoid “magic numbers” that nobody remembers later
I. THE BIG IDEA
Instead of this:
You do this:
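A minimal before-and-after sketch. The value 100 is taken from the later example in these notes; everything else is illustrative.

```asm
; Before: a raw "magic number"
mov eax, 100

; After: the same instruction with a named constant
MAX_VALUE = 100
mov eax, MAX_VALUE      ; assembles to exactly the same bytes as mov eax, 100
```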
Same machine code.
Much better for humans.
If you ever want MAX_VALUE to be 200 instead?
You change one line.
II. WHAT SYMBOLIC CONSTANTS ARE (AND ARE NOT)
✅ What they are
Compile-time substitutions
The assembler literally replaces the name with the value
They do not exist in memory
They take no storage of their own in RAM or the .exe (the value only appears where an instruction or data definition uses it)
❌ What they are not
They are not variables
You cannot change them at runtime
You cannot take their address
Think of them as smart text replacement, not storage.
HOW MASM CREATES SYMBOLIC CONSTANTS
MASM gives you three main ways to define symbolic constants:
I. = (INTEGER EXPRESSIONS ONLY)
Used for numbers
Can include expressions
Anywhere you use PIXELS, MASM replaces it with 480000.
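One definition that would produce this value is sketched below. The factors 800 and 600 are an assumption; the notes only give the result.

```asm
PIXELS = 800 * 600      ; evaluated by the assembler: 480000

.code
mov eax, PIXELS         ; assembled as mov eax, 480000
```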
II. EQU (NUMBERS OR TEXT)
EQU is more flexible.
Numeric example:
Text example:
What EQU really means:
“Whenever you see this name, replace it with exactly this text.”
So, these two lines are identical to the assembler:
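A sketch of both EQU forms and of two data lines that assemble identically. The names MAX_USERS, pressKey, prompt1, and prompt2 are illustrative.

```asm
MAX_USERS EQU 100                                   ; numeric form
pressKey  EQU <"Press any key to continue...", 0>   ; text form

.data
prompt1 BYTE pressKey                               ; the text is pasted in here...
prompt2 BYTE "Press any key to continue...", 0      ; ...so this line is equivalent
```

The memory is created by the BYTE directive, not by EQU.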
⚠️ Important:
This does not create a string in memory.
It only replaces text at assembly time.
III. TEXTEQU (TEXT ONLY, LONGER TEXT)
Mostly used for macros or long text blocks
Same idea as EQU, just specialized for text
Rare in beginner code, but good to know it exists
IV. WHERE YOU CAN USE SYMBOLIC CONSTANTS
Symbolic constants can be used anywhere MASM expects:
A number
A memory address
A text string (depending on the directive)
Example: This moves 100 into EAX.
V. STRINGS: WHERE CONFUSION USUALLY STARTS 🔥
Let’s clear this up once and for all, because this is where the original notes went sideways.
❌ WRONG IDEA (COMMON MYTH)
“You can store a string in a DWORD for performance.”
❌ This is false.
A DWORD is 4 bytes.
A string like "Hello, world!" is 13 bytes + null terminator.
You physically cannot store a string inside a single DWORD.
VI. THE CORRECT WAY TO STORE STRINGS
✅ Strings are stored as byte arrays
DB = Define Byte
Each character = 1 byte
0 = null terminator (required by Windows APIs)
This creates actual memory holding the characters.
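For example, a minimal definition of the string mentioned above might look like this (the label myMessage matches the one used later in these notes):

```asm
.data
myMessage DB "Hello, world!", 0    ; 13 characters + 1 null byte = 14 bytes
```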
VII. WHY THIS FAILED CODE IS WRONG
Why both are wrong:
MY_TEXT_CONSTANT is text substitution, not data
DWORD expects a 32-bit number, not characters
DB MY_TEXT_CONSTANT expands to:
❌ MASM does not automatically add a null terminator
❌ Still not a real string unless written correctly
VIII. THE RIGHT WAY TO COMBINE CONSTANTS + STRINGS
Option 1: Direct string storage (most common)
Option 2: Use constants for readability
Now you get:
Readable code
Correct memory layout
Proper null-terminated string
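A sketch of the two options, assuming hypothetical names NULL_TERM, HELLO_TEXT, greeting1, and greeting2:

```asm
; Option 1: direct string storage
.data
greeting1 DB "Hello, world!", 0

; Option 2: constants for readability
NULL_TERM  = 0
HELLO_TEXT EQU <"Hello, world!">

.data
greeting2 DB HELLO_TEXT, NULL_TERM   ; same bytes as greeting1
```

In both cases it is the DB line that reserves memory. The constants only shape what that line says.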
IX. THE BIG CONFUSION: DWORD + STRINGS
Here’s the truth bomb that clears everything up:
✅ You cannot store a string in a DWORD, but you CAN store a POINTER to a string in a DWORD.
✅ This is valid and common:
myMessage → actual string bytes
myMessagePtr → address of the string (32-bit pointer)
Windows API calls expect pointers, not strings themselves.
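A minimal sketch of a string and a pointer to it, assuming a 32-bit flat-model program:

```asm
.data
myMessage    DB "Hello, world!", 0     ; the actual characters (14 bytes)
myMessagePtr DWORD OFFSET myMessage    ; 4 bytes holding the address of myMessage

.code
mov esi, myMessagePtr                  ; ESI now points at the 'H'
mov al, [esi]                          ; AL = 'H'
```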
X. WHY MessageBoxA WORKS
You are not passing the string.
You are passing:
The address of the string
That address fits in a DWORD (on 32-bit systems)
This is why the confusion happens.
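To make that concrete, here is a hedged sketch of a complete 32-bit program. It assumes you link against user32.lib and kernel32.lib; the prototypes are written by hand instead of coming from an include file, and myTitle is a made-up label.

```asm
.386
.model flat, stdcall
option casemap:none

MessageBoxA PROTO :DWORD, :DWORD, :DWORD, :DWORD
ExitProcess PROTO :DWORD

.data
myMessage BYTE "Hello, world!", 0
myTitle   BYTE "Demo", 0

.code
main PROC
    ; ADDR passes the 32-bit address of each string, not the characters
    invoke MessageBoxA, 0, ADDR myMessage, ADDR myTitle, 0
    invoke ExitProcess, 0
main ENDP
END main
```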
XI. FINAL SUMMARY
✅ Facts to remember
Symbolic constants:
Exist only at assembly time
Do not allocate memory
Improve readability and maintainability
Strings:
Are stored as byte arrays (DB or BYTE)
Must be null-terminated when passed to APIs that expect C-style strings
Cannot fit in a DWORD once they exceed 4 bytes
DWORD + strings:
❌ You cannot store characters in a DWORD
✅ You can store a pointer to a string in a DWORD
XII. BIG IDEA TO REMEMBER 🧠
Constants replace text.
Variables reserve memory.
DWORDs hold numbers or addresses, not characters.
THE EQUAL-SIGN (=) DIRECTIVE: SMART REPLACEMENT 🧠
The equal-sign directive (=) is one of the simplest—and most powerful—tools in MASM.
At a high level:
The = directive associates a symbol name with an integer expression.
That’s it.
No memory.
No runtime behavior.
Just compile-time substitution.
I. What the = Directive Actually Does
When you write COUNT = 500
You are telling the assembler:
“Whenever you see the word COUNT, replace it with the number 500.”
That replacement happens during the assembler’s preprocessing step, before machine code is generated.
This means:
COUNT is not a variable
It does not exist in memory
The CPU will never know COUNT existed
Only the number 500 survives into the final machine code.
II. Step-by-Step: What MASM Really Sees
Let’s say your source file starts like this: COUNT = 500
Ten lines later, you write: MOV EAX, COUNT
What you wrote: mov eax, COUNT
What MASM turns it into internally: mov eax, 500
By the time the assembler is done, the symbol COUNT is completely gone.
🔥 Key insight:
The assembler substitutes the value at assembly time; nothing is evaluated at runtime.
III. What Can the Expression Be?
Although COUNT = 500 is the most common case, the expression can be more complex:
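One expression that yields the value discussed next is sketched here. The factors are an assumption; only the result, 400, comes from the notes.

```asm
ELEMENT_COUNT = 100                 ; hypothetical helper constant
BUFFER_SIZE   = ELEMENT_COUNT * 4   ; evaluated at assembly time: 400
```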
MASM computes the math at assembly time, and replaces BUFFER_SIZE with 400.
This is incredibly useful for:
Array sizes
Offsets
Limits
Configuration values
IV. Why Use Symbols Instead of Literal Numbers?
You could write this: mov eax, 500
But now imagine this number appears:
12 times
In loops
In comparisons
In array bounds
Six months later, you realize it should be 600.
Now you’re hunting through code, hoping you don’t miss one.
V. The Real Power: One Change, Everywhere
Using a symbol: COUNT = 500
Later:
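A sketch of how the same symbol might show up in several places (the label table is hypothetical):

```asm
COUNT = 500

.data
table DWORD COUNT DUP(0)    ; array bound

.code
mov ecx, COUNT              ; loop limit
cmp eax, COUNT              ; comparison
```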
Now, when requirements change: COUNT = 600
Reassemble the program.
✅ Every instance of COUNT is automatically replaced with 600
✅ No logic changes
✅ No missed updates
✅ No bugs from inconsistent values
This is maintainability, not convenience.
VI. Why This Matters in Assembly (More Than High-Level Languages)
In high-level languages, constants are common and expected.
In assembly:
Numbers have no meaning on their own
500 could be:
A loop limit
A buffer size
A timeout
A magic value with special meaning
Using symbols gives semantic meaning to raw numbers.
Compare:
Same machine code.
Very different readability.
VII. Important Limitations of =
The equal-sign directive has rules:
✅ Allowed
Integer values
Integer expressions
Arithmetic with other symbols
❌ Not allowed
Strings
Memory definitions
Runtime changes
This will not work:
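The sketch below contrasts valid and invalid uses. The invalid lines are left as comments so the fragment still assembles; LIMIT is a made-up name.

```asm
COUNT = 500                     ; fine: integer constant
LIMIT = COUNT * 2               ; fine: arithmetic with another symbol (1000)

; COUNT = eax                   ; error: a register has no value at assembly time
; GREETING = "Hello, world!"    ; error: = takes integer expressions, not strings
```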
Why?
Because = is resolved before registers or memory even exist.
VIII. = vs Variables (Critical Distinction)
Let’s compare a symbolic constant:
COUNT = 500
Exists only at assembly time
No memory
Cannot change at runtime
To a variable:
Stored in memory
Can be modified by the program
Takes space in RAM and the executable
🔥 Rule of thumb:
If the value never changes → use =
If the value changes → use .data
IX. BIG IDEA TO REMEMBER 🧠
The = directive is a promise to the assembler, not the CPU.
It says: “Replace this name with this number everywhere, before the program even exists.”
Or even shorter: = makes your code readable.
The assembler does the boring work.
THE CURRENT LOCATION COUNTER $ (WHERE AM I RIGHT NOW?)
Assembly programs don’t magically know where things live in memory.
Someone has to keep track of addresses as code and data are laid out.
That “someone” is the assembler, and the tool it uses is called the:
Current Location Counter (LC)
Also known as the Assembly Pointer (AP)
Represented by the symbol $
I. What the Current Location Counter Really Is
The current location counter ($) is a special symbol that always represents:
“The address in memory where the assembler is currently writing.”
Not where the CPU is executing.
Not where the program is running.
But where the assembler is placing bytes while building your program.
🔥 This is a compile-time concept, not a runtime one.
II. How the Assembler Uses $
When the assembler starts reading your source file:
The location counter starts at 0 at the beginning of each segment
As instructions and data are processed:
$ increases
By exactly the number of bytes generated
Example:
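A small sketch, assuming 32-bit code and the encodings an assembler typically picks for these instructions:

```asm
.code
mov eax, 5      ; offset 0, 5 bytes (B8 05 00 00 00)  -> $ becomes 5
add eax, 3      ; offset 5, 3 bytes (83 C0 03)        -> $ becomes 8
ret             ; offset 8, 1 byte  (C3)              -> $ becomes 9
```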
As each instruction is assembled:
$ moves forward
Just like a cursor writing bytes into memory
III. $ Is How Labels Get Their Addresses
Every label you write is secretly tied to $.
What the assembler really does is:
“At this moment, $ equals some address.
Associate start with that address.”
So, you can think of a label as:
…but automatically managed for you.
IV. Using $ Directly (The Self-Pointer Trick)
Now let’s look at this line:
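The line in question, shown in a data segment:

```asm
.data
selfPtr DWORD $     ; the DWORD is initialized with its own offset
```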
This is subtle, clever, and 100% legal.
What’s happening step by step:
The assembler reaches selfPtr
$ currently points to the address where selfPtr will be stored
The value of $ is written into that memory location
Result:
selfPtr contains its own address
In plain English:
“Create a variable, and store the address of that variable inside itself.”
Why this works
DWORD → 4 bytes of storage
$ → the address of the first byte of those 4 bytes
So selfPtr ends up holding a pointer to itself.
This is useful in:
Low-level memory structures
Tables of pointers
Self-describing data layouts
V. Very Important Clarification ⚠️
$ is not a CPU register.
The CPU never sees $
$ does not exist at runtime
$ is resolved during assembly
By the time the program runs:
$ is gone
Only raw numbers (addresses) remain
VI. The Book Analogy (Why This Helps)
Imagine you’re writing a book.
The location counter is the current page number
Every time you write more text, the page number increases
When you say:
“See page 42”
You’re doing the same thing assembly does with $.
In assembly:
Memory = book
Addresses = page numbers
$ = “the page I’m currently writing on”
VII. Symbolic Constants + $ (Power Combo)
You can combine $ with the = directive:
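A minimal sketch with hypothetical names. If your assembler version rejects an address on the right-hand side of =, EQU is the usual fallback.

```asm
.data
table BYTE 1, 2, 3
here = $                ; here = the offset just past table

tableLen = here - table ; a plain integer: 3
```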
Now, here becomes a symbolic constant equal to the current address.
This is useful for:
Computing offsets
Measuring sizes
Building jump tables
Aligning structures
VIII. Keyboard Definitions (Related but Different)
This example: Esc_key = 27
Is not related to $, but it uses the same idea of symbolic clarity.
Instead of writing:
You write:
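Both versions side by side:

```asm
Esc_key = 27

.code
mov al, 27          ; what does 27 mean?
mov al, Esc_key     ; same bytes, obvious intent
```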
Same machine code.
Much clearer intent.
IX. $ Vs Variables (Critical Distinction)
Compare these two:
Symbolic address:
Exists only at assembly time
No memory
No storage
Variable:
Allocates memory
Has a runtime address
Can be modified
🔥 Rule of thumb:
$ tells you where things are
Variables hold what things are
X. DUP OPERATOR CONNECTION
The DUP operator uses symbolic constants, not $, but they often appear together.
Here:
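A sketch of a declaration that fits this description (the value 5 is an assumption):

```asm
COUNT = 5

.data
array DWORD COUNT DUP(0)    ; 5 DWORDs = 20 bytes; $ advances by 20
```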
COUNT is resolved first
DUP allocates memory
$ advances as each DWORD is written
Everything still flows through the location counter.
XI. REDEFINING SYMBOLS (ASSEMBLER-TIME ONLY)
This part is crucial and often misunderstood:
What gets assembled is:
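A sketch of the idea, with the assembled result shown in the comments (the values are illustrative):

```asm
COUNT = 5
mov al, COUNT       ; assembled as mov al, 5
COUNT = 10
mov al, COUNT       ; assembled as mov al, 10
COUNT = 100
mov al, COUNT       ; assembled as mov al, 100
```

Each instruction picks up whatever value COUNT has at that point in the source file.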
Why?
Because:
The assembler reads the file top to bottom
Symbols change value as the assembler processes lines
Runtime execution order does not matter at all
🔥 Assembler time ≠ Runtime
XII. FINAL CLEAN SUMMARY (STICK THIS IN YOUR BRAIN)
The Current Location Counter ($)
The symbol $ represents the current memory address during assembly. It is managed entirely by the assembler and automatically updates as code or data is generated. It can also be used to initialize pointers.
What $ Is
$ exists only at compile time. It is focused on memory addresses and does not appear or exist during program execution.
What $ Is NOT
$ is not a CPU register, not a variable, and not something you can directly modify.
BIG IDEA TO REMEMBER
$ answers one key question:
“Where is the assembler writing right now?”
Once you understand this, concepts like labels, pointers, offsets, and memory layout become much clearer.
Whatever the hell we read in that section😂😂😂
That's what I call real assembly language as a rite of passage, meet the $
ARRAY SIZE CALCULATION WITH THE $ OPERATOR
What does “size of an array” mean in assembly?
When we talk about the size of an array in assembly, we’re usually asking one of two things:
How many bytes does this array occupy in memory?
How many elements does this array contain?
Assembly doesn’t track “arrays” the way high-level languages do. To the assembler, an array is just a block of consecutive bytes in memory. It’s your job to keep track of how big that block is and how you intend to use it.
There are two common ways to do this:
Explicitly specify the size
Let the assembler calculate it using the $ operator
Declaring an array by explicitly stating its size
Example: array BYTE 16 DUP(?)
This line means:
array is a label (a name for a memory location)
BYTE means each element is 1 byte
16 DUP(?) means reserve 16 bytes of memory without initial values
That’s it. Nothing more, nothing less.
Important clarification (this trips people up a lot)
array BYTE 16 (without DUP) does NOT mean:
“Reserve 16 bytes”
“The array has 16 elements”
It means:
“Allocate one byte at the label array and initialize it to 16.”
With DUP(?), the contents of those 16 bytes are uninitialized unless your assembler or environment zero-fills memory (do not rely on it).
Think of it like this:
With 16 DUP(?), you just told the assembler:
“Please set aside 16 empty lockers and call the first one array.”
Accessing elements in the array
Once declared, the label array acts as the base address of that block of memory.
The assembler calculates the correct address automatically:
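A sketch of element access, assuming the array was reserved with 16 DUP(?):

```asm
.data
array BYTE 16 DUP(?)

.code
mov al, array           ; element 0 (offset 0)
mov al, [array + 3]     ; element 3 (offset 3)
mov al, array[3]        ; same element, alternative syntax
```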
This works because:
Each element is 1 byte (BYTE)
Offsets are counted in bytes
Initializing an array with values
Now compare that with this: array BYTE 10,20,30,40
This declaration means something completely different:
Allocate 4 bytes
Initialize them with these values:
array[0] = 10
array[1] = 20
array[2] = 30
array[3] = 40
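So:
array BYTE 16 DUP(?) reserves 16 bytes with no initial values, while array BYTE 10,20,30,40 stores four specific values.
This is one of the most important distinctions in assembly:
a number followed by DUP is a count, while a number on its own is data.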
Letting the assembler calculate array size using $
What is $?
The $ symbol represents the current location counter (LC).
The LC is the assembler’s internal pointer that always says:
“This is the address where the next byte will be placed.”
As the assembler processes data declarations, $ increases automatically.
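A minimal sketch of the pattern:

```asm
.data
array      BYTE 10, 20, 30, 40
array_size = ($ - array)        ; must come right after the array: 4
```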
What happens here:
The assembler lays down 4 bytes for array
$ now points just past the last byte
Subtracting the starting address (array) gives: 4
So array_size equals the number of bytes occupied by the array.
⚠️ Critical rule:
array_size must be defined immediately after the array declaration.
If anything else is declared in between, $ will no longer reflect the end of the array.
Calculating the size of a string
Strings are just byte arrays.
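A sketch with a hypothetical label and text:

```asm
.data
myString    BYTE "Hello", 0
string_size = ($ - myString)    ; 5 characters + 1 null byte = 6
```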
This gives you:
The total number of bytes in the string
Including the null terminator (0)
That’s often exactly what you want when working with string routines.
If your string spans multiple lines or declarations, the same rule applies — as long as string_size comes immediately after the final byte.
Arrays of WORDs and DWORDs
So far, everything has been in bytes. But what if your array elements are larger?
WORD arrays (2 bytes per element)
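A sketch with four illustrative values:

```asm
.data
list      WORD 1000h, 2000h, 3000h, 4000h
list_size = ($ - list) / 2      ; 8 bytes / 2 bytes per element = 4
```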
Why divide by 2?
$ - list gives total bytes
Each element is 2 bytes
Dividing converts bytes → elements
Result: list_size = 4
DWORD arrays (4 bytes per element)
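The same pattern for DWORDs, again with illustrative values:

```asm
.data
list      DWORD 10000000h, 20000000h, 30000000h, 40000000h
list_size = ($ - list) / 4      ; 16 bytes / 4 bytes per element = 4
```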
ARRAY SIZE & MEMORY UNDERSTANDING
Calculating Number of Elements
$ - list gives the total number of bytes used by the array.
Dividing that value by 4 (for DWORD arrays) gives the number of elements.
KEY IDEAS TO REMEMBER
Assembly’s Perspective
Assembly language does not understand “arrays” as high-level structures. It only works with memory and labels.
Role of BYTE, WORD, DWORD
These directives tell the assembler how much space each element occupies:
BYTE → 1 byte
WORD → 2 bytes
DWORD → 4 bytes
Role of $
$ tells you the current location in memory where the assembler is writing.
Meaning of ($ - label)
($ - label) calculates the total amount of memory used since that label, measured in bytes.
FINAL SUMMARY
Reserving Space Without Initialization
array BYTE 16 DUP(?)
This reserves 16 bytes of memory without assigning values. (Without DUP, array BYTE 16 would define a single byte holding the value 16.)
Initializing Values
array BYTE 10,20,30,40
This reserves 4 bytes and stores the given values.
Measuring Total Size
$ - array
This calculates the total number of bytes used by the array.
Converting Bytes to Element Count
Dividing the total bytes by the size of each element gives the number of elements.
BIG IDEA TO REMEMBER
$ is simply the assembler saying: “Here’s where I am in memory right now.”
Once you internalize this, working with memory layout, arrays, and size calculations becomes much more intuitive.
THE EQU DIRECTIVE: GIVING NAMES TO VALUES (AND IDEAS)
The EQU directive is how you give a name to something in assembly—whether that “something” is a number, a calculation, or even a piece of text.
Think of EQU as a label-maker for constants. Once you define a name with EQU, the assembler treats that name as a stand-in for whatever you assigned to it.
When the assembler later sees that name anywhere in your program, it literally substitutes the value or text you defined. There’s no memory involved, no runtime work—this all happens at assembly time.
THE THREE FORMS OF EQU
There are three valid ways to use EQU, depending on what you want the symbol to represent.
1. name EQU expression
This is the most common form. The expression must evaluate to an integer at assembly time.
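A sketch using the name matrixSize from the explanation that follows; the uses shown are hypothetical.

```asm
matrixSize EQU 10 * 10          ; evaluated at assembly time: 100

.data
matrix WORD matrixSize DUP(0)   ; 100 words

.code
mov ecx, matrixSize             ; treated as mov ecx, 100
```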
Here’s what’s happening:
The assembler evaluates 10 * 10
It gets 100
Everywhere it sees matrixSize, it substitutes 100
So, later code like this:
is treated as if you had written:
This makes your code clearer and easier to change later.
2. name EQU symbol
In this form, you’re defining one symbol in terms of another.
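For instance, with two made-up names:

```asm
BUFFER_LEN EQU 256
MAX_INPUT  EQU BUFFER_LEN   ; a second name for the same value

.code
mov ecx, MAX_INPUT          ; treated as mov ecx, 256
```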
This is useful when:
You want multiple names for the same value
You’re improving readability
You’re abstracting meaning (e.g., “what this address represents”)
Again, substitution happens during assembly—not at runtime.
3. name EQU <text>
This form lets you associate a symbol with arbitrary text, not just numbers.
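A typical use is a real-number constant, sketched here with hypothetical names:

```asm
PI EQU <3.1416>         ; stored as text, not evaluated as an integer

.data
piValue REAL8 PI        ; seen by the assembler as REAL8 3.1416
```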
Important detail:
The assembler does not evaluate this as a number
It simply copies the text wherever the symbol appears
This is especially useful for:
Real-number constants
Strings
Data definitions that don’t evaluate to integers
WHAT EQU REALLY DOES (BEHIND THE SCENES)
A key idea to understand:
EQU does not allocate memory.
It doesn’t reserve space, and it doesn’t create a variable.
It only tells the assembler:
“Whenever you see this name, replace it with this value or text.”
So EQU is purely a compile-time substitution tool.
USING EQU FOR COMPLEX OR TRICKY CALCULATIONS
One of the biggest strengths of EQU is that it lets you define expressions that would be annoying—or error-prone—to calculate by hand.
I. Example: Defining a Stack Address
What’s going on here?
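_end is assumed here to be a symbol marking the end of your program’s code/data
MASM does not define it for you; some toolchains (GNU ld, for example) provide a similar symbol, otherwise you place the label yourself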
You add 1024 bytes to that address
The result becomes the symbolic name stackStart
Now, instead of scattering magic numbers throughout your code, you have a clear, meaningful name that explains why that address exists.
II. One Very Important Rule: EQU Is Immutable
Once you define a numeric symbol with EQU, you cannot redefine it in the same source file.
This is intentional.
Why?
It prevents accidental redefinitions
It guarantees that a symbol always means the same thing
It makes large programs safer and easier to maintain
When you see a name defined with EQU, you can trust that its value never changes.
III. EQU vs = (or ==) — A Crucial Difference
Assemblers often support another directive: = (or ==, depending on the assembler).
EQU: Constant, fixed, immutable
Defined once
Cannot change
Best for true constants
=: Redefinable
The symbol’s value can change
Useful for assembly-time calculations or conditional assembly
More flexible, but also easier to misuse
IV. The Big Picture Difference
If you want stability and clarity, use EQU.
If you need flexibility during assembly, use =.
V. Why EQU Matters in Real Programs
Using EQU properly:
Makes code easier to read
Eliminates “magic numbers”
Reduces bugs caused by inconsistent values
Makes large assembly programs manageable
In short:
EQU lets you name ideas, not just numbers.
And that’s one of the most powerful things you can do in assembly.
TEXTEQU DIRECTIVE
I. Big picture: what problem does TEXTEQU solve?
When you write assembly, you often repeat the same pieces of text:
instruction names (mov, add)
operands (al, eax)
constants
short instruction sequences
TEXTEQU exists so you don’t have to keep retyping those pieces. Instead, you give them a name, and the assembler swaps that name with its associated text before assembly actually happens.
So, the key idea is this:
TEXTEQU creates text substitutions, not variables and not memory.
Nothing is stored at runtime. This is all about helping you and the assembler.
II. What exactly is a text macro?
A text macro is a named chunk of text that the assembler copies and pastes wherever the name appears.
Think of it like:
a global “find and replace”
or a nickname for a piece of assembly code
When the assembler sees the macro name, it replaces it with its text verbatim (after expression evaluation, if any).
THE THREE FORMS OF TEXTEQU
There are three common ways to define a text macro, depending on what you want the macro to represent.
I. Assigning literal text
Here, you’re directly attaching a chunk of text to a name. The angle brackets < > tell the assembler:
“Treat everything inside here as literal text.”
Example:
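A sketch of such a definition. The message text is an assumption; only the name continueMsg comes from the notes.

```asm
continueMsg TEXTEQU <"Do you wish to continue (Y/N)?">

.data
prompt1 BYTE continueMsg    ; seen as BYTE "Do you wish to continue (Y/N)?"
```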
What this means:
continueMsg is now a text macro
Wherever continueMsg appears later, the assembler replaces it with:
This is especially useful for:
prompt messages
repeated strings
long instruction fragments
It makes your code easier to read and easier to change later.
II. Assigning one text macro to another
Now move is simply another name for mov.
This might look pointless at first—but it becomes powerful when you start combining macros to build instructions dynamically.
III. Assigning a constant expression
The % is important. It tells the assembler:
“Evaluate this expression now, then convert the result into text.”
So, this is still a text macro, but its value comes from a calculation.
Building text macros step by step (important example)
Let’s walk slowly through this example and see what the assembler is actually doing.
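The first two definitions, matching the description that follows:

```asm
rowSize = 5
count   TEXTEQU %(rowSize * 2)    ; evaluated now, stored as the text 10
```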
rowSize is a numeric constant with value 5
count becomes a text macro
The expression (rowSize * 2) is evaluated immediately
The result is 10
So count is replaced with the text 10
At this point: count → 10
Now we add another layer:
Now we combine everything:
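Putting the whole chain in one place, as a sketch:

```asm
rowSize = 5
count   TEXTEQU %(rowSize * 2)      ; text: 10
move    TEXTEQU <mov>               ; text: mov
setupAL TEXTEQU <move al, count>    ; text built from the two macros above

.code
setupAL                             ; assembled as mov al, 10
```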
The assembler expands this in stages:
move → mov
count → 10
Final expansion:
NB: I am using onecompiler.com for these images, together with ShareX gradient mode.
So, whenever you write setupAL, the assembler literally sees:
This is a huge deal for readability and maintainability.
Why this matters in real code
Text macros let you:
Avoid magic numbers scattered everywhere
Centralize instruction patterns
Build readable “mini-commands” out of raw assembly
Change behavior in one place instead of dozens
They’re especially useful in:
loops
setup code
repeated register initialization
macro-heavy assembly projects
Redefining text macros (key difference from EQU)
One very important rule:
Text macros defined with TEXTEQU can be redefined.
This is not true for numeric constants defined with EQU.
That means you can do something like:
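A sketch with hypothetical labels. The text is quoted here so each use produces a valid string definition.

```asm
mode TEXTEQU <"DEBUG">

.data
buildTag1 BYTE mode, 0      ; assembled as BYTE "DEBUG", 0

mode TEXTEQU <"RELEASE">    ; redefinition is allowed
buildTag2 BYTE mode, 0      ; assembled as BYTE "RELEASE", 0
```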
From that point forward, mode expands to RELEASE.
This makes TEXTEQU flexible, but it also means:
You must be careful about scope and order
Earlier code uses the old definition
Later code uses the new one
What TEXTEQU is not
To avoid common confusion:
❌ It does not reserve memory
❌ It does not create a runtime variable
❌ It does not generate instructions by itself
It only tells the assembler how to rewrite your source code text before assembling it.
Mental model
If you remember one sentence, make it this:
TEXTEQU is a compile-time text substitution tool that helps you write clearer, reusable assembly code.
No runtime cost. No memory impact. Just smarter source code.
QUESTIONS FOR DATA DEFINITION
This section is all about symbolic constants and text macros in assembly. These tools don’t generate machine code by themselves — instead, they make your programs clearer, safer, and easier to change later.
Think of them as labels for values or text that the assembler substitutes before your program is turned into machine instructions.
1. Declaring a symbolic constant for the Backspace key (ASCII 08h)
In ASCII, each key or character corresponds to a numeric code.
The Backspace character has the hexadecimal value 08h.
Instead of sprinkling 08h throughout your code (which would be hard to read and easy to forget), we give it a meaningful name.
Using the EQU directive
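A minimal sketch, with one hypothetical use:

```asm
BackspaceKey EQU 08h

.code
cmp al, BackspaceKey    ; assembled as cmp al, 08h
```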
What this means
EQU tells the assembler:
“Whenever you see BackspaceKey, replace it with 08h.”
No memory is allocated.
The value can’t change later.
Why this matters
Your code becomes self-documenting.
If the value ever changes, you only update it in one place.
2. Declaring the number of seconds in a day
A 24-hour day contains:
24 hours
60 minutes per hour
60 seconds per minute
Instead of calculating this manually and hard-coding the result, we let the assembler do the math.
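A sketch of the definition:

```asm
SecondsInDay EQU 24 * 60 * 60   ; evaluated at assembly time: 86400

.code
mov eax, SecondsInDay           ; assembled as mov eax, 86400
```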
What’s happening
The assembler evaluates the expression at assembly time.
SecondsInDay becomes a constant equal to 86400.
Why this matters
Clear intent: anyone reading the code instantly understands where the number comes from.
No magic numbers.
3. Calculating the size of an array using SIZEOF
Suppose we define an array of words:
This creates:
20 elements
Each element is a WORD (2 bytes)
Instead of manually calculating 20 * 2, we ask the assembler for the size.
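A short sketch of both the array and the size constant, assuming the names myArray and ArraySize used in the explanation below:

```asm
.data
myArray   WORD 20 DUP(?)          ; 20 elements, 2 bytes each
ArraySize EQU  SIZEOF myArray     ; 20 * 2 = 40 bytes, computed by the assembler
```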
What this does
SIZEOF returns the total number of bytes occupied by myArray.
In this case: 40 bytes.
Why this matters
If you change the array size or type later, ArraySize updates automatically.
Helps prevent buffer overruns and size-related logic errors, because the constant can never drift out of step with the array definition.
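4. Aliasing a keyword using TEXTEQU
EQU is typically used for numeric values; TEXTEQU defines a text macro, a name that stands for a piece of source text.
Here, we define the name procedure so that it expands into the keyword PROC. (PROC itself is a reserved word, so by default it cannot be given a new meaning this way.)
What’s happening
Whenever the assembler sees procedure, it substitutes the text PROC.
This happens before the line is assembled, like a macro expansion.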
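A minimal sketch, assuming the intent is an alias named procedure that expands to the PROC keyword (reserved words such as PROC cannot themselves be redefined with TEXTEQU by default):

```asm
procedure TEXTEQU <PROC>      ; "procedure" now stands for the text PROC

; Hypothetical usage: the line
;     main procedure
; would be seen by the assembler as
;     main PROC
```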
Why this matters
Lets you customize syntax.
Useful for readability, teaching, or matching naming conventions.
5. Creating a string constant with TEXTEQU
We can also use TEXTEQU to define reusable text, such as strings.
Step 1: Define the text macro
Step 2: Use it to define a string in memory
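Both steps might look like this. The names Sample and MyString come from the explanation below; the string content is a placeholder chosen for illustration:

```asm
Sample   TEXTEQU <"This is a string">   ; Step 1: text macro, no memory reserved

.data
MyString BYTE Sample                     ; Step 2: expands to  MyString BYTE "This is a string"
```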
What’s happening
Sample is replaced with the quoted string text.
MyString becomes a byte array initialized with that string.
Why this matters
You can reuse the same string in multiple places.
Changing the text in one spot updates it everywhere.
6. Assigning a full instruction using TEXTEQU
TEXTEQU can even represent entire lines of code, not just single words.
Example: Loading the address of myArray into ESI
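A sketch of such a definition, assuming myArray is already defined in the data segment and using the macro name SetupESI from the explanation below:

```asm
SetupESI TEXTEQU <mov esi, OFFSET myArray>   ; the whole instruction is stored as text
```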
What this means
Writing SetupESI anywhere in your code expands to:
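Assuming SetupESI was defined as the text <mov esi, OFFSET myArray>, the assembler would see this line in its place:

```asm
mov esi, OFFSET myArray      ; ESI now holds the address of myArray
```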
Why this matters
Great for setup sequences used repeatedly.
Keeps your code DRY (Don’t Repeat Yourself).
Makes intent clearer: “this sets up ESI.”
7. Big Picture Takeaway
EQU → symbolic constants (numbers, expressions, sizes)
SIZEOF → assembler-calculated memory size
TEXTEQU → text substitution (keywords, strings, instructions)
These tools don’t change how the CPU runs your program; they change how humans understand and maintain it.
MOVING FROM 32-BIT TO 64-BIT ASSEMBLY PROGRAMMING
WHAT THIS MODULE IS ABOUT
This module explains how and why assembly language changes when moving from 32-bit (x86) to 64-bit (x64) programming. While the core ideas of assembly remain the same, the environment, rules, and conventions change in important ways.
64-bit programming isn’t “harder,” but it is different — and understanding those differences early prevents confusion and errors later.
High-Level Differences Between 32-Bit and 64-Bit Programs
Here’s what fundamentally changes when we move to 64-bit assembly:
Registers are wider
32-bit registers hold 32 bits of data.
64-bit registers hold 64 bits, allowing much larger values.
More registers are available
32-bit x86 code has 8 general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP).
64-bit systems provide 16 general-purpose registers, giving programmers more flexibility and performance.
More memory can be addressed
32-bit programs are limited to a 4 GB address space, because a 32-bit address can name at most 2^32 bytes.
64-bit programs can access far more memory, which is critical for modern applications.
Calling conventions change
How functions receive parameters and return values is different in 64-bit mode.
These differences affect how code is written, even when the program logic stays the same.
Understanding the AddTwoSum Example (32-Bit vs 64-Bit)
I. Why Use AddTwoSum?
The AddTwoSum program is a small, simple example that makes it easy to see the differences between 32-bit and 64-bit assembly without distractions.
The program:
1. Loads two numbers
2. Adds them
3. Stores the result
4. Exits cleanly
Same idea — different rules.
II. The 32-Bit Version (Overview)
In 32-bit assembly, the program includes several directives and conventions that are required for that environment:
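A minimal sketch of what such a program might look like. The operand values 5 and 6 and the variable name sum are illustrative assumptions; the directives match the ones described below:

```asm
; AddTwoSum, 32-bit sketch (MASM, Windows)
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
sum DWORD 0                 ; receives the result

.code
main PROC
    mov eax, 5              ; load the first number
    add eax, 6              ; add the second: EAX = 11
    mov sum, eax            ; store the result
    INVOKE ExitProcess, 0   ; exit with code 0
main ENDP
END main                    ; main is the entry point
```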
What’s Going On Here?
.386 sets the minimum processor type and enables 80386 (32-bit) instructions
.model flat, stdcall defines the memory model and calling convention
.stack 4096 sets up stack space
INVOKE automatically handles parameter passing
The entry point (main) is explicitly specified in the END directive
This structure is normal and necessary in 32-bit MASM.
III. The 64-Bit Version (Key Differences)
Now compare it with the 64-bit version:
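A sketch of the same program for 64-bit MASM, again with 5, 6 and sum as illustrative assumptions. The stack adjustment at the top of main is not discussed in this module; it is included here on the assumption that the Microsoft x64 calling convention applies, which expects 32 bytes of shadow space and a 16-byte aligned stack at each call:

```asm
; AddTwoSum, 64-bit sketch (ml64, Windows), still using 32-bit data
ExitProcess PROTO

.data
sum DWORD 0

.code
main PROC
    sub rsp, 28h            ; assumption: shadow space (32 bytes) + 8 for alignment
    mov eax, 5
    add eax, 6              ; EAX = 11
    mov sum, eax

    mov ecx, 0              ; exit code goes in ECX (low half of RCX)
    call ExitProcess        ; plain call instead of INVOKE
main ENDP
END                         ; no entry point named here
```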
What Changed — and Why?
No .386, .model, or .stack directives
These are not used in 64-bit MASM.
No INVOKE directive
64-bit MASM does not support INVOKE.
Instead, parameters are passed manually using registers.
ExitProcess parameters go in registers
The exit code is placed in ECX (the low half of RCX) before calling ExitProcess.
No entry point specified in END
The entry point is named at link time instead, typically through a linker option such as /ENTRY.
Even though the program does the same thing, the rules underneath are different.
Why PROTO Is Still Used (Even in Small Programs)
In 32-bit assembly, you often see function prototypes declared early:
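For example, a prototype for the Windows ExitProcess function is commonly written like this in 32-bit MASM (the parameter name dwExitCode is a conventional label, not something the assembler requires):

```asm
ExitProcess PROTO, dwExitCode:DWORD   ; one 32-bit parameter: the exit code
```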
This might seem unnecessary in small programs — but it serves important purposes.
WHY DECLARE FUNCTION PROTOTYPES?
1. Linker and Library Compatibility
The PROTO statement tells the assembler and linker:
The function exists
What it’s called
How it should be used
This ensures the program links correctly with system libraries.
2. Readability and Documentation
Seeing prototypes at the top of a file tells the reader:
What external functions the program depends on
What kind of parameters they expect
This makes code easier to understand and maintain.
3. Error Checking
Prototypes allow the assembler to catch:
Misspelled function names
Incorrect parameter usage (argument count and types are checked when the call is made with INVOKE)
Catching errors early saves debugging time.
4. Modularity
In larger projects:
Functions may be defined in other files
Libraries may be reused
Prototypes make this organization possible.
5. Tool and Assembler Compatibility
PROTO is MASM’s standard way to describe function interfaces, so code that uses it is understood by MASM-compatible tools.
Even if it feels redundant in tiny examples, it becomes essential in real-world programs.
A TRUE 64-BIT ADDTWOSUM PROGRAM
Now let’s look at a fully 64-bit version that actually uses 64-bit registers and data types.
I. AddTwoSum 64-Bit Version
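A sketch of how the fully 64-bit version might look, with the same illustrative values (5, 6) and variable name (sum) assumed. As before, the stack adjustment is an assumption based on the Microsoft x64 calling convention rather than something this module covers:

```asm
; AddTwoSum, 64-bit sketch using 64-bit registers and data
ExitProcess PROTO

.data
sum QWORD 0                 ; 64-bit variable

.code
main PROC
    sub rsp, 28h            ; assumption: shadow space (32 bytes) + 8 for alignment
    mov rax, 5
    add rax, 6              ; RAX = 11
    mov sum, rax

    mov ecx, 0              ; exit code
    call ExitProcess
main ENDP
END
```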
II. What Changed from the 32-Bit Version?
Only a few things — but they matter:
DWORD → QWORD
The variable is now 64 bits wide.
EAX → RAX
The program uses full 64-bit registers.
Everything else — logic, flow, purpose — remains the same.
III. Why This Matters
Using 64-bit registers and variables allows:
Larger numbers
Better performance in some workloads, such as arithmetic on 64-bit values
Access to memory beyond 4 GB
This is why modern operating systems and applications are almost entirely 64-bit.
UNDERSTANDING 64-BIT REGISTERS
I. What Are Registers?
Registers are tiny, ultra-fast storage locations inside the CPU.
They are much faster than memory and are used constantly during execution.
64-bit processors have 16 general-purpose registers: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and R8 through R15.
II. Why the “R”?
The letter “R” stands for “register” and indicates a 64-bit version of a register.
Examples:
EAX (32-bit) → RAX (64-bit)
EBX → RBX
ECX → RCX
Adding the R means:
Twice the size
Larger values
More memory access
III. Common Register Roles
RAX — often holds function return values
RBX — commonly used as a base register
RSP — stack pointer
RCX — frequently used for function parameters
Registers are flexible, but conventions help keep code readable and correct.
IV. Register Usage Examples
After a function call, the result is typically found in RAX.
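A small sketch of these roles in action. AddTwo is a hypothetical helper written for this example, and the code assumes the Microsoft x64 calling convention, where the first two integer arguments travel in RCX and RDX and the result comes back in RAX:

```asm
ExitProcess PROTO

.code
; AddTwo (hypothetical): returns RCX + RDX in RAX
AddTwo PROC
    mov rax, rcx
    add rax, rdx
    ret
AddTwo ENDP

main PROC
    sub rsp, 28h            ; shadow space (32 bytes) + 8 to keep RSP 16-byte aligned
    mov rcx, 5              ; first argument
    mov rdx, 6              ; second argument
    call AddTwo             ; on return, RAX = 11

    mov ecx, 0              ; exit code
    call ExitProcess
main ENDP
END
```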
Key Concepts Learned So Far (Chapter Summary)
This chapter builds on earlier assembly fundamentals. Key ideas include:
Constants and expressions — fixed values and calculated values
Identifiers and directives — names and assembler commands
Instructions and operands — executable statements and their data
Program segments — code, data, and stack
Assemblers and linkers — tools that turn source code into executables
Data types — BYTE, WORD, DWORD, QWORD, and floating-point types
Data definitions — reserving and initializing memory
Strings and DUP — efficient memory allocation
Little-endian format — how x86 stores data in memory
Symbolic constants — readable, maintainable values
DATA TYPES AND DEFINITIONS IN ASSEMBLY: THE STUFF YOU TOUCH CONSTANTLY
Assembly is a language where every byte matters. Before you can manipulate a variable, the assembler needs to know how much memory to reserve and how you intend to interpret it. That’s exactly what data definitions do: they tell the assembler, “reserve this much space, and this is what it represents.”
I. The Basic Data Types
Here’s a cheat sheet you’ll reference constantly:
Pro tip: Most of the time, you’ll only touch BYTE, WORD, DWORD, QWORD. Floating-point types (REAL4, REAL8, REAL10) are for real-number math, and FWORD/TBYTE are niche.
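As a compact reference, the sketch below declares one variable of each MASM intrinsic data type with its size in the comment. The variable names and initial values are placeholders chosen for illustration:

```asm
.data
valByte    BYTE    0FFh                 ; 1 byte,   unsigned 8-bit integer
valSByte   SBYTE   -128                 ; 1 byte,   signed 8-bit integer
valWord    WORD    0FFFFh               ; 2 bytes,  unsigned 16-bit integer
valSWord   SWORD   -32768               ; 2 bytes,  signed 16-bit integer
valDword   DWORD   12345678h            ; 4 bytes,  unsigned 32-bit integer
valSDword  SDWORD  -2147483648          ; 4 bytes,  signed 32-bit integer
valFword   FWORD   0                    ; 6 bytes,  48-bit value (far pointer)
valQword   QWORD   1234567812345678h    ; 8 bytes,  64-bit integer
valTbyte   TBYTE   0                    ; 10 bytes, 80-bit value
valReal4   REAL4   1.5                  ; 4 bytes,  single-precision float
valReal8   REAL8   3.14159              ; 8 bytes,  double-precision float
valReal10  REAL10  2.5                  ; 10 bytes, extended-precision float
```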
II. Data Definition Syntax
In MASM, the general format is:
[label] directive value [, value] ...
label → optional name for the variable (what you use to access it)
directive → data type (BYTE, WORD, DWORD, etc.)
value → initial value stored in memory
Examples:
C equivalent:
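A few sample definitions, with a rough C equivalent in the comment beside each one. The names and values are illustrative, and the C types assume a typical platform where int is 32 bits:

```asm
.data
letter     BYTE   'A'       ; C: char letter = 'A';
itemCount  WORD   1000      ; C: unsigned short itemCount = 1000;
total      DWORD  0         ; C: unsigned int total = 0;
bigValue   QWORD  0         ; C: unsigned long long bigValue = 0;
delta      SDWORD ?         ; C: int delta;   (left uninitialized)
```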
Notice how assembly explicitly specifies the size in every declaration. No guessing, no magic behind the scenes.
III. Why Size Matters
Memory alignment is crucial in assembly:
CPUs access a variable most efficiently when it starts at an address divisible by its size.
Example: a DWORD (4 bytes) should ideally start at an address divisible by 4.
Misalignment can slow down your code or even cause crashes on strict CPUs.
Example memory layout:
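One way to picture this is the sketch below, which uses MASM's ALIGN directive to pad the data segment. The offsets in the comments assume the segment itself starts at an aligned address; the names and values are illustrative:

```asm
.data
flag    BYTE  1             ; offset 0000: 01
ALIGN 4                     ; assembler pads offsets 0001-0003
total   DWORD 12345678h     ; offset 0004: 78 56 34 12 (little-endian order)
items   WORD  2             ; offset 0008: 02 00
```

Without the ALIGN line, total would start at offset 0001, which is not divisible by 4.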
Rule of thumb: pick the smallest type that safely holds every value the variable can take. It saves memory, and an array of a single element type stays naturally aligned as long as the array itself starts on an aligned address.
IV. The DUP Operator – Magic for Repetition
When creating arrays or buffers, you don’t want to write out hundreds of zeros manually. That’s where DUP comes in. Its general form is:
COUNT DUP (value)
COUNT → how many times to duplicate
value → what to write
Examples:
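A few sketches of DUP in use (the names and sizes are placeholders):

```asm
.data
buffer   BYTE  256 DUP(0)       ; 256 bytes, all initialized to zero
scratch  DWORD 100 DUP(?)       ; 100 doublewords (400 bytes), uninitialized
pattern  BYTE  4 DUP("AB")      ; 8 bytes: "ABABABAB"
```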
Tip:
DUP(0) → memory initialized to 0
DUP(?) → uninitialized memory (do not rely on its contents)
V. Strings in Assembly
Strings are just arrays of bytes:
Each character = 1 byte
Null terminator (0) is optional but needed for APIs expecting C-style strings
Never try to store a string in a DWORD directly. Use BYTE arrays.
You can store a pointer to a string in a DWORD if needed (in 32-bit code; a 64-bit pointer needs a QWORD).
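The points above can be sketched as follows, assuming 32-bit MASM. All names and string contents are illustrative:

```asm
.data
greeting    BYTE  "Hello, world", 0     ; null-terminated, usable as a C-style string
prompt      BYTE  "Enter a number: "    ; no terminator; the length is tracked separately
promptLen   =     ($ - prompt)          ; bytes between prompt and the current location
greetingPtr DWORD OFFSET greeting       ; 32-bit pointer to the string, not the string itself
```

The promptLen line must come directly after prompt, because $ stands for the current location in the segment.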
VI. Quick Reference
🤔 Takeaway:
In assembly, you’re constantly defining how many bytes something occupies and what those bytes mean. Once you internalize this, the rest — arrays, strings, buffers, numeric operations — becomes predictable.
🤔 Closing thought:
Assembly is all about control and precision. When you define a variable, you’re telling the assembler exactly how big it is, how it is initialized, and how you intend to interpret it. Mastering data definitions now makes everything else (loops, arithmetic, arrays) much easier later.