ASM009: DATA REPRESENTATION CONTINUED

DATA ORGANIZATION: THE ENDIANNESS DEBATE 🥚

When you have a number bigger than one byte—like a 32-bit integer—it takes up multiple memory slots. This immediately raises a question:

Which byte goes first in memory?

Do we store the number left-to-right (like how we normally write numbers) or right-to-left? The answer depends on the system, and this is called Byte Ordering, or more formally, Endianness.

Little Endian (The Intel Way)

Most x86 processors (Intel, AMD) use Little-Endian. Here’s what that means:

1. The Rule

The “Little End,” meaning the Least Significant Byte (LSB), goes into the lowest memory address.
The “Most Significant Byte (MSB)” is stored last, at a higher address.

2. Why Little-Endian?

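Little-Endian is a design convention, not a judgment about which byte matters more.
Its practical benefit is that multi-byte arithmetic starts at the LSB, which sits at the lowest address, so the CPU can begin there and carry upward through memory. A value's address also stays the same whether you read it as 1, 2, or 4 bytes.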

Think of it like writing numbers on paper backward so you can start calculating immediately with the ones place.

3. Example

Hex number: 12345678h

78 → LSB (“Little End”)
12 → MSB (“Big End”)

Memory layout in Little-Endian (lowest address first):

Offset 0000: 78
Offset 0001: 56
Offset 0002: 34
Offset 0003: 12

Notice how it looks reversed in memory—but inside a CPU register, it still looks normal (12345678h). Only the memory order is “scrambled.”
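A small sketch of the memory-versus-register point. The variable name val is made up for illustration, and the fragment is meant to sit inside an ordinary 32-bit MASM program:

```asm
.data
val DWORD 12345678h            ; bytes in memory: 78 56 34 12

.code
    mov eax, val               ; EAX = 12345678h (looks normal in the register)
    mov bl, BYTE PTR val       ; BL = 78h, the byte at the lowest address
    mov bh, BYTE PTR [val+3]   ; BH = 12h, the byte at the highest address
```

Reading single bytes with BYTE PTR is the easiest way to see the stored order for yourself in a debugger.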

Big Endian (The “Human” Way)

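Other systems use Big-Endian: older Motorola 68k machines, many SPARC, PowerPC, and MIPS systems, and network protocols such as TCP/IP. ARM and MIPS cores can often be configured either way, and most modern ARM systems actually run Little-Endian.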

1. The Rule

The “Big End,” meaning the Most Significant Byte (MSB), goes into the lowest memory address.

The LSB comes last.

2. Why Big-Endian?

This looks natural to humans, because the number is stored left-to-right, just like we read or write it on paper.

3. Example

Same number: 12345678h

Memory layout in Big-Endian (lowest address first):

Offset 0000: 12
Offset 0001: 34
Offset 0002: 56
Offset 0003: 78

Now memory mirrors the way we normally write numbers, but certain CPU operations may be slightly less convenient.

Why Endianness Matters

If you save a file on a Little-Endian PC and try to read it on a Big-Endian machine without converting it, your numbers can become completely scrambled.

Example: The number 1, stored as a 32-bit Little-Endian value, is the byte sequence 01 00 00 00.

A Big-Endian system reading those same four bytes interprets them as 01000000h, which equals 16,777,216 instead of 1.
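As a hedged sketch, here is how the same mix-up (and its fix) looks on x86. BSWAP reverses the four bytes of a 32-bit register; it needs a .486 or later processor directive. The name fileValue is hypothetical:

```asm
.data
fileValue DWORD 00000001h      ; bytes in memory: 01 00 00 00

.code
    mov eax, fileValue         ; EAX = 00000001h
    bswap eax                  ; EAX = 01000000h = 16,777,216
    bswap eax                  ; swapping again restores 00000001h
```

One swap shows what the other machine would "see"; the same instruction is also how you convert data that arrives in the opposite byte order.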

Bottom line: Understanding endianness is crucial for:

Porting software across platforms

Networking, where different systems communicate binary data

File formats, where byte order is explicitly defined

BIG IDEAS TO REMEMBER

Little Endian (x86):

LSB comes first (lowest memory address)

Useful for CPU arithmetic

Big Endian:

MSB comes first

Aligns with human reading/writing habits

Memory vs. Register:

Inside a register (like EAX), numbers always appear correctly (12345678h).

The “scrambling” only happens when stored in RAM.

Quick Memory Trick

Little-Endian: Think of it like writing a number “backwards” in memory so the CPU can start calculating from the ones place immediately.

Big-Endian: Think of it like writing numbers normally—humans like it, CPUs don’t mind, but arithmetic can be less convenient.

Declaring Uninitialized Data

I. The .DATA? Directive: Ghost Storage 👻

In earlier notes, we used .DATA to define variables with specific values, like:
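The original listing is not reproduced here, so as a rough sketch (names and values are made up for illustration), initialized declarations look something like this:

```asm
.data
score    DWORD 100             ; 4 bytes, starts out as 100
letter   BYTE  'A'             ; 1 byte, starts out as 41h
greeting BYTE  "Hi", 0         ; 3 bytes: 'H', 'i', and a null terminator
```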

This works when you know the exact value your variable should start with. But what happens if you need a massive buffer, like a temporary storage for an image or a large array, and you don’t know the values yet?

That’s where .DATA? comes in.

II. The Concept

.DATA? tells the assembler:

“I want to reserve memory for these variables, but I don’t want to initialize them yet.”

Variables declared here exist in memory at runtime, but the compiled executable doesn’t store any actual values for them.

In other words, your program asks the OS to allocate RAM when it runs, instead of bloating the .exe file with zeros or other initial values.

III. The Magic Behind the Scenes

Here’s what happens under the hood:

Disk vs RAM

.DATA variables with initial values are stored directly in the .exe file. That means the file grows by the size of the initialized data.

.DATA? variables occupy no extra space on disk. The program only tells the OS: “When you run me, please give me this much memory.”

Runtime Allocation

The OS reserves the memory in your process’s Data Segment (part of RAM) when your program starts.

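As far as your source code is concerned, the bytes in .DATA? have no defined value. On Windows the loader typically hands your process zero-filled pages, but MASM promises nothing about the contents, so initialize these variables before you read them.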

Why this matters

Using .DATA? is a disk-space optimization. Large arrays can dramatically increase your executable’s size if initialized with .DATA.

It’s also a performance-friendly habit, because loading huge initialized blocks from disk is slower than just letting the OS give you “empty” RAM.

Fat File vs Skinny File: An Example

Let’s compare:
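A minimal sketch of the comparison, assuming two buffers of 5000 DWORDs each (the names fatBuffer and slimBuffer are hypothetical):

```asm
.data
fatBuffer  DWORD 5000 DUP(0)   ; initialized: roughly 20,000 bytes stored in the .exe

.data?
slimBuffer DWORD 5000 DUP(?)   ; uninitialized: only the size is recorded in the file
```

Both give you 20,000 bytes of RAM at runtime. Only the first one makes the file on disk bigger.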

Key insight: .DATA? is like putting a sticky note in memory that says: “Set aside this much space, but I don’t care what’s in it yet.”

Garbage Alert ⚠️

The ? initializer makes no promise about the contents. You may happen to get zeros from the OS, but treat the bytes as unknown.
Always initialize variables from .DATA? before reading them, otherwise your program may behave unpredictably.

MIXING CODE AND DATA (THE “MESSY ROOM” METHOD)

MASM allows you to switch between code and data anywhere:
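A short sketch of what that switching looks like, using the variable temp that the text refers to next (the surrounding instructions are illustrative):

```asm
.code
    mov eax, ebx               ; some code...

.data
temp DWORD ?                   ; declared in the middle of the code

.code
    mov temp, eax              ; ...and back to code, using the new variable
```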

What really happens:

The assembler automatically moves temp to the Data Segment, even if you declared it in the middle of your code.

The instructions before and after the .data block end up back to back in the code segment, so execution flows straight from one to the next.

Best Practice

While possible, don’t scatter data declarations in code.

Keep .data and .code separate for clarity and maintainability.

DECLARING TYPES: THE CHEAT SHEET

When using .DATA?, you still need to pick the right size, even if the value is unknown.
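The cheat sheet itself is missing from the text, so here is a stand-in sketch using MASM's standard data types (all variable names are hypothetical):

```asm
.data?
flag     BYTE  ?               ; 1 byte
counter  WORD  ?               ; 2 bytes
total    DWORD ?               ; 4 bytes
bigTotal QWORD ?               ; 8 bytes
buffer   BYTE  256 DUP(?)      ; 256 bytes
```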

Remember: .DATA? = “Reserve this space at runtime.”

? = “I don’t care what’s inside yet—initialize it before use!”

Readability Pro-Tips

Writing assembly is hard enough without messy formatting. Follow these rules:

Capitalize directives (.DATA?, .CODE) for clarity.
Indent consistently—makes loops and logic easier to scan.
Use comments liberally—Future-You will thank Present-You.
Clear labels—especially for jumps or memory offsets.

Big Idea to Remember

✅ .DATA? = Runtime Allocation: space is reserved in RAM, not on disk.

✅ ? = Uninitialized: the contents are undefined, so don’t count on zeros.

✅ Always initialize before reading.

Analogy: .DATA is like packing boxes with items before shipping—takes space in the truck (.exe). .DATA? is like telling the warehouse: “Hold this space for me when I arrive”—the truck stays empty until runtime.

SYMBOLIC CONSTANTS (MAKING ASSEMBLY HUMAN)

Assembly is already low-level and unforgiving. One of the few tools you get to make it readable and maintainable is symbolic constants.

At their core:

A symbolic constant is a name that represents a fixed value that never changes.

Instead of scattering raw numbers and strings all over your code, you give them names. That way:

Your code reads like intent, not math homework
You change a value once, not everywhere
You avoid “magic numbers” that nobody remembers later

I. THE BIG IDEA

Instead of this:

You do this:
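Both versions side by side, as a sketch. The value 100 is an assumption, chosen to match the "moves 100 into EAX" example later in these notes:

```asm
; Instead of this:
    mov eax, 100               ; what does 100 mean?

; You do this:
MAX_VALUE = 100
    mov eax, MAX_VALUE         ; assembles to the same bytes as mov eax, 100
```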

Same machine code.

Much better for humans.

If you ever want MAX_VALUE to be 200 instead?

You change one line.

II. WHAT SYMBOLIC CONSTANTS ARE (AND ARE NOT)

✅ What they are

Compile-time substitutions

The assembler literally replaces the name with the value

They do not exist in memory

They do not take space in RAM or the .exe

❌ What they are not

They are not variables

You cannot change them at runtime

You cannot take their address

Think of them as smart text replacement, not storage.

HOW MASM CREATES SYMBOLIC CONSTANTS

MASM gives you three main ways to define symbolic constants:

I) = (INTEGER EXPRESSIONS ONLY)

Used for numbers

Can include expressions

Anywhere you use PIXELS, MASM replaces it with 480000.
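The definition of PIXELS is missing from the text. One expression that produces 480000 is shown below; the factors 800 and 600 are an assumption, not the original listing:

```asm
PIXELS = 800 * 600             ; evaluated by the assembler: 480000

.code
    mov eax, PIXELS            ; assembled as mov eax, 480000
```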

II) EQU (NUMBERS OR TEXT)

EQU is more flexible.

Numeric example:

Text example:

What EQU really means:

“Whenever you see this name, replace it with exactly this text.”

So, these two lines are identical to the assembler:

⚠️ Important:

This does not create a string in memory.

It only replaces text at assembly time.
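A hedged sketch of both EQU flavors. The names MAX_USERS, pressKey, prompt1, and prompt2 are made up for illustration:

```asm
MAX_USERS EQU 50                                   ; numeric
pressKey  EQU <"Press any key to continue...",0>   ; text

.data
prompt1 BYTE pressKey                              ; these two lines produce
prompt2 BYTE "Press any key to continue...",0      ; identical bytes
```

The EQU line itself creates nothing in memory. The bytes only appear because the BYTE declarations use the text.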

III. TEXTEQU (TEXT ONLY, LONGER TEXT)

Mostly used for macros or long text blocks

Same idea as EQU, just specialized for text

Rare in beginner code, but good to know it exists

IV. WHERE YOU CAN USE SYMBOLIC CONSTANTS

Symbolic constants can be used anywhere MASM expects:

A number
A memory address
A text string (depending on the directive)

Example: This moves 100 into EAX.

V. STRINGS: WHERE CONFUSION USUALLY STARTS 🔥

Let’s clear this up once and for all, because this is where the original notes went sideways.

❌ WRONG IDEA (COMMON MYTH)

“You can store a string in a DWORD for performance.”

❌ This is false.

A DWORD is 4 bytes.

A string like "Hello, world!" is 13 bytes + null terminator.

You physically cannot store a string inside a single DWORD.

VI. THE CORRECT WAY TO STORE STRINGS

✅ Strings are stored as byte arrays

DB = Define Byte
Each character = 1 byte
0 = null terminator (required by Windows APIs)

This creates actual memory holding the characters.
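For example, the string from the previous section could be declared like this (the label myMessage is a placeholder):

```asm
.data
myMessage DB "Hello, world!", 0    ; 13 characters + 1 null byte = 14 bytes
```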

VII. WHY THIS FAILED CODE IS WRONG

Why both are wrong:

MY_TEXT_CONSTANT is text substitution, not data
DWORD expects a 32-bit number, not characters
DB MY_TEXT_CONSTANT expands to:

❌ MASM does not automatically add a null terminator

❌ Still not a real string unless written correctly

VIII. THE RIGHT WAY TO COMBINE CONSTANTS + STRINGS

Option 1: Direct string storage (most common)

Option 2: Use constants for readability

Now you get:

Readable code
Correct memory layout
Proper null-terminated string
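A sketch of the two options. HELLO_TEXT and the variable names are hypothetical; note that the null terminator is written out explicitly in both cases:

```asm
; Option 1: direct string storage
.data
myMessage  DB "Hello, world!", 0

; Option 2: name the text once, then use it in a real declaration
HELLO_TEXT EQU <"Hello, world!">
myMessage2 DB HELLO_TEXT, 0        ; expands to DB "Hello, world!", 0
```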

IX. THE BIG CONFUSION: DWORD + STRINGS

Here’s the truth bomb that clears everything up:

✅ You cannot store a string in a DWORD, but you CAN store a POINTER to a string in a DWORD.

✅ This is valid and common:

myMessage → actual string bytes
myMessagePtr → address of the string (32-bit pointer)

Windows API calls expect pointers, not strings themselves.
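A minimal sketch of the two declarations described above, for a 32-bit program:

```asm
.data
myMessage    DB "Hello, world!", 0     ; the actual characters (14 bytes)
myMessagePtr DWORD OFFSET myMessage    ; 4 bytes holding the address of the first character
```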

X. WHY MessageBoxA WORKS

You are not passing the string.

You are passing:

The address of the string

That address fits in a DWORD (on 32-bit systems)

This is why the confusion happens.
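Here is a small complete program as a sketch, assuming a 32-bit build linked against user32.lib and kernel32.lib. The caption text and variable names are made up; ADDR is what turns each string into the address that gets passed:

```asm
.386
.model flat, stdcall
option casemap:none

MessageBoxA PROTO, hWnd:DWORD, lpText:DWORD, lpCaption:DWORD, uType:DWORD
ExitProcess PROTO, dwExitCode:DWORD

.data
myMessage DB "Hello, world!", 0
myCaption DB "Demo", 0

.code
main PROC
    ; each ADDR operand is a 4-byte pointer, not the string itself
    INVOKE MessageBoxA, 0, ADDR myMessage, ADDR myCaption, 0
    INVOKE ExitProcess, 0
main ENDP
END main
```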

XI. FINAL SUMMARY

✅ Facts to remember

Symbolic constants:

Exist only at assembly time
Do not allocate memory
Improve readability and maintainability

Strings:

Must be stored using DB

Must be null-terminated

Cannot fit in a DWORD

DWORD + strings:

❌ You cannot store characters in a DWORD

✅ You can store a pointer to a string in a DWORD

XII. BIG IDEA TO REMEMBER 🧠

Constants replace text.

Variables reserve memory.

DWORDs hold numbers or addresses, not characters.

THE EQUAL-SIGN (=) DIRECTIVE: SMART REPLACEMENT 🧠

The equal-sign directive (=) is one of the simplest—and most powerful—tools in MASM.

At a high level:

The = directive associates a symbol name with an integer expression.

That’s it.

No memory.

No runtime behavior.

Just compile-time substitution.

I. What the = Directive Actually Does

When you write COUNT = 500

You are telling the assembler:

“Whenever you see the word COUNT, replace it with the number 500.”

That substitution happens while the assembler processes your source, before machine code is generated.

This means:

COUNT is not a variable

It does not exist in memory

The CPU will never know COUNT existed

Only the number 500 survives into the final machine code.

II. Step-by-Step: What MASM Really Sees

Let’s say your source file starts like this: COUNT = 500

Ten lines later, you write: MOV EAX, COUNT

What you wrote: mov eax, COUNT

What MASM turns it into internally: mov eax, 500

By the time the assembler is done, the symbol COUNT is completely gone.

🔥 Key insight:

The assembler substitutes the value at assembly time; nothing is looked up or evaluated at runtime.

III. What Can the Expression Be?

Although COUNT = 500 is the most common case, the expression can be more complex:

MASM computes the math at assembly time, and replaces BUFFER_SIZE with 400.

This is incredibly useful for:

Array sizes
Offsets
Limits
Configuration values
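The BUFFER_SIZE listing is missing from the text. One way to arrive at 400 is sketched below; ELEMENT_COUNT and the factor 4 are assumptions for illustration:

```asm
ELEMENT_COUNT = 100
BUFFER_SIZE   = ELEMENT_COUNT * 4      ; computed by the assembler: 400

.data
buffer BYTE BUFFER_SIZE DUP(0)         ; 400 bytes
```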

IV. Why Use Symbols Instead of Literal Numbers?

You could write this: mov eax, 500

But now imagine this number appears:

- 12 times
- In loops
- In comparisons
- In array bounds

Six months later, you realize it should be 600.

Now you’re hunting through code, hoping you don’t miss one.

V. The Real Power: One Change, Everywhere

Using a symbol: COUNT = 500

Later:

Now, when requirements change: COUNT = 600

Reassemble the program.

✅ Every instance of COUNT is automatically replaced with 600

✅ No logic changes

✅ No missed updates

✅ No bugs from inconsistent values

This is maintainability, not convenience.
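A sketch of one symbol feeding several places at once (the array name scores is hypothetical). Change the first line and every use follows on the next build:

```asm
COUNT = 500

.data
scores DWORD COUNT DUP(0)      ; array sized by COUNT

.code
    mov ecx, COUNT             ; loop counter
    cmp eax, COUNT             ; bounds check against the same limit
```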

VI. Why This Matters in Assembly (More Than High-Level Languages)

In high-level languages, constants are common and expected.

In assembly:

Numbers have no meaning on their own

500 could be:

A loop limit
A buffer size
A timeout
A magic value with special meaning

Using symbols gives semantic meaning to raw numbers.

Compare:

Same machine code.

Very different readability.

VII. Important Limitations of =

The equal-sign directive has rules:

✅ Allowed

Integer values

Integer expressions

Arithmetic with other symbols

❌ Not allowed

Strings

Memory definitions

Runtime changes

This will not work:

Why?

Because = is resolved before registers or memory even exist.
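The failing example is not shown in the text. Two mistakes of this kind are sketched below, commented out so the fragment still assembles:

```asm
COUNT = 500

.code
    mov eax, COUNT             ; fine: COUNT is just the immediate value 500
    ; mov COUNT, eax           ; rejected: COUNT is a number, not a memory location
    ; COUNT = eax              ; rejected: a register has no value at assembly time
```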

VIII. = vs Variables (Critical Distinction)

Let’s compare a symbolic constant:

COUNT = 500

Exists only at assembly time

No memory

Cannot change at runtime

To a variable:

Stored in memory

Can be modified by the program

Takes space in RAM and the executable

🔥 Rule of thumb:

If the value never changes → use =

If the value changes → use .data
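Side by side, as a sketch (the variable name counter is made up):

```asm
COUNT = 500                    ; symbolic constant: no storage anywhere

.data
counter DWORD 500              ; variable: 4 bytes in the data segment

.code
    mov eax, COUNT             ; loads the immediate value 500
    mov counter, eax           ; writes to memory
    inc counter                ; variables can change while the program runs
```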

IX. BIG IDEA TO REMEMBER 🧠

The = directive is a promise to the assembler, not the CPU.

It says: “Replace this name with this number everywhere, before the program even exists.”

Or even shorter: = makes your code readable.

The assembler does the boring work.

THE CURRENT LOCATION COUNTER $ (WHERE AM I RIGHT NOW?)

Assembly programs don’t magically know where things live in memory.

Someone has to keep track of addresses as code and data are laid out.

That “someone” is the assembler, and the tool it uses is called the:

Current Location Counter (LC)

Also known as the Assembly Pointer (AP)

Represented by the symbol $

I. What the Current Location Counter Really Is

The current location counter ($) is a special symbol that always represents:

“The address in memory where the assembler is currently writing.”

Not where the CPU is executing.

Not where the program is running.

But where the assembler is placing bytes while building your program.

🔥 This is a compile-time concept, not a runtime one.

II. How the Assembler Uses $

When the assembler starts reading your source file:

The location counter starts at 0 at the beginning of each segment (it is an offset within the segment, not a final runtime address)

As instructions and data are processed:

$ increases
By exactly the number of bytes generated

Example:

As each instruction is assembled:

$ moves forward
Just like a cursor writing bytes into memory
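A sketch of the counter moving forward, assuming 32-bit code that begins at offset 0 of its segment. The instruction sizes are the standard encodings for these forms:

```asm
.code
start:
    mov eax, 1                 ; offset 0, 5 bytes -> $ becomes 5
    add eax, ebx               ; offset 5, 2 bytes -> $ becomes 7
    ret                        ; offset 7, 1 byte  -> $ becomes 8
```

A listing file (the /Fl option of ml) shows these offsets in its left-hand column if you want to check them yourself.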

III. $ Is How Labels Get Their Addresses

Every label you write is secretly tied to $.

What the assembler really does is:

“At this moment, $ equals some address.

Associate start with that address.”

So, you can think of a label as:

…but automatically managed for you.

IV. Using $ Directly (The Self-Pointer Trick)

Now let’s look at this line:

This is subtle, clever, and 100% legal.

What’s happening step by step:

The assembler reaches selfPtr
$ currently points to the address where selfPtr will be stored
The value of $ is written into that memory location

Result:

selfPtr contains its own address

In plain English:

“Create a variable, and store the address of that variable inside itself.”

Why this works

DWORD → 4 bytes of storage

$ → the address of the first byte of those 4 bytes

So selfPtr ends up holding a pointer to itself.

This is useful in:

Low-level memory structures
Tables of pointers
Self-describing data layouts
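The line under discussion, plus a quick way to see the result in a debugger (a sketch for a 32-bit program):

```asm
.data
selfPtr DWORD $                    ; initial value = the address of selfPtr itself

.code
    mov eax, selfPtr               ; EAX = the value stored in selfPtr
    mov ebx, OFFSET selfPtr        ; EBX = the address of selfPtr
    ; EAX and EBX now hold the same number
```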

V. Very Important Clarification ⚠️

$ is not a CPU register.
The CPU never sees $
$ does not exist at runtime
$ is resolved during assembly
By the time the program runs:
$ is gone
Only raw numbers (addresses) remain

VI. The Book Analogy (Why This Helps)

Imagine you’re writing a book.

The location counter is the current page number

Every time you write more text, the page number increases

When you say:

“See page 42”

You’re doing the same thing assembly does with $.

In assembly:

Memory = book

Addresses = page numbers

$ = “the page I’m currently writing on”

VII. Symbolic Constants + $ (Power Combo)

You can combine $ with the = directive:

Now, here becomes a symbolic constant equal to the current address.

This is useful for:

Computing offsets
Measuring sizes
Building jump tables
Aligning structures
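The most common practical use is subtracting a label from $ to get a size. A hedged sketch (table, tableBytes, and tableCount are hypothetical names):

```asm
.data
table DWORD 1, 2, 3
tableBytes = ($ - table)           ; 12: the distance from table to "right here"
tableCount = tableBytes / 4        ; 3 elements
```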

VIII. Keyboard Definitions (Related but Different)

This example: Esc_key = 27

Is not related to $, but it uses the same idea of symbolic clarity.

Instead of writing:

You write:

Same machine code.

Much clearer intent.
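The two versions, sketched with an assumed comparison against AL:

```asm
Esc_key = 27

.code
    cmp al, 27                 ; works, but what is 27?
    cmp al, Esc_key            ; same bytes, obvious meaning
```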

IX. $ Vs Variables (Critical Distinction)

Compare these two:

Symbolic address:

Exists only at assembly time
No memory
No storage

Variable:

Allocates memory
Has a runtime address
Can be modified

🔥 Rule of thumb:

$ tells you where things are

Variables hold what things are

X. DUP OPERATOR CONNECTION

The DUP operator uses symbolic constants, not $, but they often appear together.

Here:

COUNT is resolved first
DUP allocates memory
$ advances as each DWORD is written

Everything still flows through the location counter.
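A sketch of the pattern being described. The value 5 and the names array and arrayBytes are assumptions:

```asm
COUNT = 5

.data
array DWORD COUNT DUP(0)       ; 5 DWORDs = 20 bytes, so $ advances by 20
arrayBytes = ($ - array)       ; 20
```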

XI. REDEFINING SYMBOLS (ASSEMBLER-TIME ONLY)

This part is crucial and often misunderstood:

What gets assembled is:

Why?

Because:

The assembler reads the file top to bottom
Symbols change value as the assembler processes lines
Runtime execution order does not matter at all

🔥 Assembler time ≠ Runtime
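The listing is missing here. The classic demonstration of redefinition goes roughly like this (the values are illustrative):

```asm
COUNT = 5
    mov al, COUNT              ; assembled as mov al, 5
COUNT = 10
    mov al, COUNT              ; assembled as mov al, 10
COUNT = 100
    mov al, COUNT              ; assembled as mov al, 100
```

Each mov picks up whatever COUNT meant at that line of the source file, regardless of the order in which the instructions later run.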

XII. FINAL CLEAN SUMMARY (STICK THIS IN YOUR BRAIN)

The Current Location Counter ($)

The symbol $ represents the current memory address during assembly. It is managed entirely by the assembler and automatically updates as code or data is generated. It can also be used to initialize pointers.

What $ Is

$ exists only at compile time. It is focused on memory addresses and does not appear or exist during program execution.

What $ Is NOT

$ is not a CPU register, not a variable, and not something you can directly modify.

BIG IDEA TO REMEMBER

$ answers one key question:

“Where is the assembler writing right now?”

Once you understand this, concepts like labels, pointers, offsets, and memory layout become much clearer.

Whatever the hell we read in that section😂😂😂

That's what I call real assembly language as a rite of passage, meet the $

ARRAY SIZE CALCULATION WITH THE $ OPERATOR

What does “size of an array” mean in assembly?
When we talk about the size of an array in assembly, we’re usually asking one of two things:
How many bytes does this array occupy in memory?
How many elements does this array contain?

Assembly doesn’t track “arrays” the way high-level languages do. To the assembler, an array is just a block of consecutive bytes in memory. It’s your job to keep track of how big that block is and how you intend to use it.

There are two common ways to do this:

Explicitly specify the size
Let the assembler calculate it using the $ operator

Declaring an array by explicitly stating its size

Example: array BYTE 16 DUP(?)

This line means:

array is a label (a name for a memory location)
BYTE means each element is 1 byte
16 DUP(?) means repeat an uninitialized byte 16 times, reserving 16 bytes of memory

That’s it. Nothing more, nothing less.

Important clarification (this trips people up a lot)

The DUP operator is what turns 16 into a count. Without it, array BYTE 16 means:

“Allocate 1 byte at the label array and store the value 16 in it.”

With it, array BYTE 16 DUP(?) means:

“Allocate 16 bytes of storage starting at the label array.”

The ? says you are not supplying initial values. In practice you will usually find zeros there, but don’t write code that depends on it.

Think of it like this:

You just told the assembler:

“Please set aside 16 empty lockers and call the first one array.”

Accessing elements in the array

Once declared, the label array acts as the base address of that block of memory.

The assembler calculates the correct address automatically:

This works because:

Each element is 1 byte (BYTE)
Offsets are counted in bytes
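A sketch of element access on a 16-byte array (the specific offsets are arbitrary examples):

```asm
.data
array BYTE 16 DUP(?)

.code
    mov al, array              ; first byte (offset 0)
    mov al, [array+3]          ; fourth byte (offset 3)
    mov array[5], al           ; same address as [array+5]
```

MASM does no bounds checking here, so [array+16] would assemble without complaint and touch whatever comes next in memory.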

Initializing an array with values

Now compare that with this: array BYTE 10,20,30,40

This declaration means something completely different:

Allocate 4 bytes

Initialize them with these values:

array[0] = 10

array[1] = 20

array[2] = 30

array[3] = 40

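So:

array BYTE 16 DUP(?) → 16 bytes, no values supplied
array BYTE 10,20,30,40 → 4 bytes, values supplied

This is one of the most important distinctions in assembly:

the same number can be a count (in front of DUP) or a piece of data (listed on its own), depending on how you write the declaration.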

Letting the assembler calculate array size using $

What is $?

The $ symbol represents the current location counter (LC).

The LC is the assembler’s internal pointer that always says:

“This is the address where the next byte will be placed.”

As the assembler processes data declarations, $ increases automatically.
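The declaration this section walks through looks like this (a sketch using the names array and array_size from the text):

```asm
.data
array      BYTE 10, 20, 30, 40
array_size = ($ - array)       ; 4
```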

What happens here:

The assembler lays down 4 bytes for array
$ now points just past the last byte
Subtracting the starting address (array) gives 4

So array_size equals the number of bytes occupied by the array.

⚠️ Critical rule:

array_size must be defined immediately after the array declaration.

If anything else is declared in between, $ will no longer reflect the end of the array.

Calculating the size of a string

Strings are just byte arrays.
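A sketch with a short string so the count is easy to check by hand. The names myString, longString, and the two size constants are placeholders:

```asm
.data
myString    BYTE "Hello", 0
string_size = ($ - myString)       ; 5 characters + 1 null byte = 6

longString  BYTE "This string is split "
            BYTE "across two lines", 0
long_size   = ($ - longString)     ; counts every byte of both lines
```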

This gives you:

The total number of bytes in the string
Including the null terminator (0)

That’s often exactly what you want when working with string routines.

If your string spans multiple lines or declarations, the same rule applies — as long as string_size comes immediately after the final byte.

Arrays of WORDs and DWORDs

So far, everything has been in bytes. But what if your array elements are larger?

WORD arrays (2 bytes per element)
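A sketch that matches the result discussed below (the element values are arbitrary):

```asm
.data
list      WORD 1000h, 2000h, 3000h, 4000h
list_size = ($ - list) / 2         ; 8 bytes / 2 = 4 elements
```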

Why divide by 2?

$ - list gives total bytes
Each element is 2 bytes
Dividing converts bytes → elements

Result: list_size = 4

DWORD arrays (4 bytes per element)

ARRAY SIZE & MEMORY UNDERSTANDING

Calculating Number of Elements

$ - list gives the total number of bytes used by the array.

Dividing that value by 4 (for DWORD arrays) gives the number of elements.
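The DWORD version, sketched with made-up values. MASM's TYPE operator returns the element size, so it can stand in for the hard-coded 4 if you prefer:

```asm
.data
list       DWORD 10000000h, 20000000h, 30000000h, 40000000h
list_size  = ($ - list) / 4            ; 16 bytes / 4 = 4 elements

list2      DWORD 1, 2, 3
list2_size = ($ - list2) / TYPE list2  ; 12 / 4 = 3 elements
```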

KEY IDEAS TO REMEMBER

Assembly’s Perspective

Assembly language does not understand “arrays” as high-level structures. It only works with memory and labels.

Role of BYTE, WORD, DWORD

These directives tell the assembler how much space each element occupies:

BYTE → 1 byte
WORD → 2 bytes
DWORD → 4 bytes

Role of $

$ tells you the current location in memory where the assembler is writing.

Meaning of ($ - label)

($ - label) calculates the total amount of memory used since that label, measured in bytes.

FINAL SUMMARY

Reserving Space Without Initialization

array BYTE 16 DUP(?)

This reserves 16 bytes of memory without assigning values. (Without DUP, array BYTE 16 would be a single byte holding the value 16.)

Initializing Values

array BYTE 10,20,30,40

This reserves 4 bytes and stores the given values.

Measuring Total Size

$ - array

This calculates the total number of bytes used by the array.

Converting Bytes to Element Count

Dividing the total bytes by the size of each element gives the number of elements.

BIG IDEA TO REMEMBER

$ is simply the assembler saying: “Here’s where I am in memory right now.”

Once you internalize this, working with memory layout, arrays, and size calculations becomes much more intuitive.

THE EQU DIRECTIVE: GIVING NAMES TO VALUES (AND IDEAS)

The EQU directive is how you give a name to something in assembly—whether that “something” is a number, a calculation, or even a piece of text.

Think of EQU as a label-maker for constants. Once you define a name with EQU, the assembler treats that name as a stand-in for whatever you assigned to it.

When the assembler later sees that name anywhere in your program, it literally substitutes the value or text you defined. There’s no memory involved, no runtime work—this all happens at assembly time.

THE THREE FORMS OF EQU

There are three valid ways to use EQU, depending on what you want the symbol to represent.

1. name EQU expression

This is the most common form. The expression must evaluate to an integer at assembly time.

Here’s what’s happening:

The assembler evaluates 10 * 10

It gets 100

Everywhere it sees matrixSize, it substitutes 100

So, later code like this:

is treated as if you had written:

This makes your code clearer and easier to change later.
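The two listings are missing from the text. A plausible reconstruction, with the mov instruction as an assumed example of "later code":

```asm
matrixSize EQU 10 * 10

.code
    mov eax, matrixSize        ; what you write
    ; mov eax, 100             ; what the assembler treats it as
```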

2. name EQU symbol

In this form, you’re defining one symbol in terms of another.

This is useful when:

You want multiple names for the same value
You’re improving readability
You’re abstracting meaning (e.g., “what this address represents”)

Again, substitution happens during assembly—not at runtime.

3. name EQU <text>

This form lets you associate a symbol with arbitrary text, not just numbers.

Important detail:

The assembler does not evaluate this as a number

It simply copies the text wherever the symbol appears

This is especially useful for:

Real-number constants

Strings

Data definitions that don’t evaluate to integers
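The difference between the integer form and the text form is easiest to see side by side. A sketch with hypothetical names:

```asm
matrix1 EQU 10 * 10            ; integer expression: evaluated once, value 100
matrix2 EQU <10 * 10>          ; text: copied as-is wherever matrix2 appears
PI      EQU <3.1416>           ; text form is how you name a real-number constant

.data
M1 WORD matrix1                ; becomes: M1 WORD 100
M2 WORD matrix2                ; becomes: M2 WORD 10 * 10
```

Both M1 and M2 end up holding 100, but they get there differently: the first by substitution of a value, the second by substitution of text that is evaluated afterward.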

WHAT EQU REALLY DOES (BEHIND THE SCENES)

A key idea to understand:

EQU does not allocate memory.

It doesn’t reserve space, and it doesn’t create a variable.

It only tells the assembler:

“Whenever you see this name, replace it with this value or text.”

So EQU is purely a compile-time substitution tool.

USING EQU FOR COMPLEX OR TRICKY CALCULATIONS

One of the biggest strengths of EQU is that it lets you define expressions that would be annoying—or error-prone—to calculate by hand.

I. Example: Defining a Stack Address

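What’s going on in a definition such as stackStart EQU _end + 1024?

_end is assumed here to be a label marking the end of your program’s code/data. MASM does not define it for you (some linkers provide a similar symbol), so it has to exist somewhere in your build.
You add 1024 bytes to that address
The result becomes the symbolic name stackStart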

Now, instead of scattering magic numbers throughout your code, you have a clear, meaningful name that explains why that address exists.

II. One Very Important Rule: EQU Is Immutable

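Once you define a symbol with a numeric EQU, you cannot redefine it in the same source file. (Microsoft’s MASM reference notes one exception: a name assigned with EQU <text> can later be given different text.)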

This is intentional.

Why?

It prevents accidental redefinitions
It guarantees that a symbol always means the same thing
It makes large programs safer and easier to maintain

When you see a name defined with EQU, you can trust that its value never changes.

III. EQU vs = (or ==) — A Crucial Difference

Assemblers often support another directive: = (or ==, depending on the assembler).

EQU: Constant, fixed, immutable

Defined once
Cannot change
Best for true constants

=: Redefinable

The symbol’s value can change

Useful for assembly-time calculations or conditional assembly
More flexible, but also easier to misuse

IV. The Big Picture Difference

If you want stability and clarity, use EQU.

If you need flexibility during assembly, use =.
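A sketch of the difference in practice. LIMIT and STEP are made-up names, and the rejected line is commented out so the fragment assembles:

```asm
LIMIT EQU 100
; LIMIT EQU 200                ; rejected: a numeric EQU symbol cannot be redefined

STEP = 1
STEP = STEP + 1                ; fine: STEP is now 2 for every line below this one
```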

V. Why EQU Matters in Real Programs

Using EQU properly:

Makes code easier to read
Eliminates “magic numbers”
Reduces bugs caused by inconsistent values
Makes large assembly programs manageable

In short:

EQU lets you name ideas, not just numbers.

And that’s one of the most powerful things you can do in assembly.

TEXTEQU DIRECTIVE

I. Big picture: what problem does TEXTEQU solve?

When you write assembly, you often repeat the same pieces of text:

instruction names (mov, add)
operands (al, eax)
constants
short instruction sequences

TEXTEQU exists so you don’t have to keep retyping those pieces. Instead, you give them a name, and the assembler swaps that name with its associated text before assembly actually happens.

So, the key idea is this:

TEXTEQU creates text substitutions, not variables and not memory.

Nothing is stored at runtime. This is all about helping you and the assembler.

II. What exactly is a text macro?

A text macro is a named chunk of text that the assembler copies and pastes wherever the name appears.

Think of it like:

a global “find and replace”
or a nickname for a piece of assembly code

When the assembler sees the macro name, it replaces it with its text verbatim (after expression evaluation, if any).

THE THREE FORMS OF TEXTEQU

There are three common ways to define a text macro, depending on what you want the macro to represent.

I. Assigning literal text

Here, you’re directly attaching a chunk of text to a name. The angle brackets < > tell the assembler:

“Treat everything inside here as literal text.”

Example:

What this means:

- continueMsg is now a text macro
- Wherever continueMsg appears later, the assembler replaces it with:

This is especially useful for:

prompt messages
repeated strings
long instruction fragments

It makes your code easier to read and easier to change later.
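The example listing is missing here. A sketch using the continueMsg name from the text (the prompt wording and the label prompt1 are assumptions):

```asm
continueMsg TEXTEQU <"Do you wish to continue (Y/N)?">

.data
prompt1 BYTE continueMsg       ; becomes: prompt1 BYTE "Do you wish to continue (Y/N)?"
```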

II. Assigning one text macro to another

Now move is simply another name for mov.

This might look pointless at first—but it becomes powerful when you start combining macros to build instructions dynamically.

III. Assigning a constant expression

The % is important. It tells the assembler:

“Evaluate this expression now, then convert the result into text.”

So, this is still a text macro, but its value comes from a calculation.

Building text macros step by step (important example)

Let’s walk slowly through this example and see what the assembler is actually doing.

rowSize is a numeric constant with value 5
count becomes a text macro
The expression (rowSize * 2) is evaluated immediately
The result is 10
So count is replaced with the text 10

At this point: count → 10

Now we add another layer:

Now we combine everything:

The assembler expands this in stages:

move → mov
count → 10

Final expansion:

NB: I’m using onecompiler.com for these images, together with ShareX gradient mode.

So, whenever you write setupAL, the assembler literally sees:

This is a huge deal for readability and maintainability.
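Since the images are not reproduced in this text, here is the whole walkthrough as one sketch, using the names from the steps above:

```asm
rowSize = 5
count   TEXTEQU %(rowSize * 2)     ; evaluated now: count is the text 10
move    TEXTEQU <mov>
setupAL TEXTEQU <move al,count>

.code
    setupAL                        ; expands to: mov al,10
```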

Why this matters in real code

Text macros let you:

Avoid magic numbers scattered everywhere
Centralize instruction patterns
Build readable “mini-commands” out of raw assembly
Change behavior in one place instead of dozens

They’re especially useful in:

loops
setup code
repeated register initialization
macro-heavy assembly projects

Redefining text macros (key difference from EQU)

One very important rule:

Text macros defined with TEXTEQU can be redefined.

This is not true for numeric constants defined with EQU.

That means you can do something like:

From that point forward, mode expands to RELEASE.

This makes TEXTEQU flexible, but it also means:

You must be careful about scope and order

Earlier code uses the old definition

Later code uses the new one
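A hedged sketch of redefinition in action. To make the effect visible in data, this version puts quotes inside the macro text, which differs slightly from the unquoted RELEASE mentioned above; tag1 and tag2 are hypothetical names:

```asm
mode TEXTEQU <"DEBUG">

.data
tag1 BYTE mode, 0              ; becomes: tag1 BYTE "DEBUG", 0

mode TEXTEQU <"RELEASE">
tag2 BYTE mode, 0              ; becomes: tag2 BYTE "RELEASE", 0
```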

What TEXTEQU is not

To avoid common confusion:

❌ It does not reserve memory

❌ It does not create a runtime variable

❌ It does not generate instructions by itself

It only tells the assembler how to rewrite your source code text before assembling it.

Mental model

If you remember one sentence, make it this:

TEXTEQU is a compile-time text substitution tool that helps you write clearer, reusable assembly code.

No runtime cost. No memory impact. Just smarter source code.

QUESTIONS FOR DATA DEFINITION

This section is all about symbolic constants and text macros in assembly. These tools don’t generate machine code by themselves — instead, they make your programs clearer, safer, and easier to change later.

Think of them as labels for values or text that the assembler substitutes before your program is turned into machine instructions.

1. Declaring a symbolic constant for the Backspace key (ASCII 08h)

In ASCII, each key or character corresponds to a numeric code.

The Backspace character has the hexadecimal value 08h.

Instead of sprinkling 08h throughout your code (which would be hard to read and easy to forget), we give it a meaningful name.

Using the EQU directive
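A sketch of the declaration and one assumed use of it:

```asm
BackspaceKey EQU 08h

.code
    cmp al, BackspaceKey       ; same as cmp al, 08h
```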

What this means

EQU tells the assembler:

“Whenever you see BackspaceKey, replace it with 08h.”

No memory is allocated.

The value can’t change later.

Why this matters

Your code becomes self-documenting.

If the value ever changes, you only update it in one place.

2. Declaring the number of seconds in a day

A 24-hour day contains:

24 hours
60 minutes per hour
60 seconds per minute

Instead of calculating this manually and hard-coding the result, we let the assembler do the math.
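One way to write this, assuming the SecondsInDay name used in the explanation below (the mov line is an illustrative use, not part of the original question):

```masm
SecondsInDay EQU 24 * 60 * 60     ; evaluated by the assembler: 86400

.code
    mov eax, SecondsInDay         ; assembles exactly like: mov eax, 86400
```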

What’s happening

The assembler evaluates the expression at assembly time.
SecondsInDay becomes a constant equal to 86400.

Why this matters

Clear intent: anyone reading the code instantly understands where the number comes from.
No magic numbers.

3. Calculating the size of an array using SIZEOF

Suppose we define an array of 20 words, for example with myArray WORD 20 DUP(?).

This creates:

20 elements
Each element is a WORD (2 bytes)

Instead of manually calculating 20 * 2, we ask the assembler for the size.
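A short sketch of the idea, using the myArray and ArraySize names that appear in the explanation. The array is assumed to be uninitialized, since the text only gives its length and element type.

```masm
.data
myArray   WORD 20 DUP(?)          ; 20 elements x 2 bytes each
ArraySize = SIZEOF myArray        ; assembler computes 40 (bytes)

; For comparison: LENGTHOF myArray is 20 (elements) and TYPE myArray is 2 (bytes per element).
```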

What this does

SIZEOF returns the total number of bytes occupied by myArray.
In this case: 40 bytes.

Why this matters

- If you change the array size or type later, ArraySize updates automatically.
- Helps prevent buffer overruns and logic errors caused by a stale, hand-computed size.

4. Redefining keywords using TEXTEQU

Unlike EQU, which replaces values, TEXTEQU replaces text.

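Here, we define the word procedure as a text macro that expands to the keyword proc, so procedure can be written wherever proc is expected.

What’s happening

Whenever the assembler sees procedure, it substitutes the text proc. (The reverse direction would not help: procedure is not a word the assembler understands on its own.)
This happens before the line is assembled, like a macro expansion.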
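A minimal sketch, assuming the intent is to let the word procedure stand in for the PROC keyword. The procedure name demo is hypothetical.

```masm
procedure TEXTEQU <PROC>      ; text macro: procedure expands to PROC

.code
demo procedure                ; after substitution the assembler sees: demo PROC
    ret
demo ENDP
```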

Why this matters

Lets you customize syntax.
Useful for readability, teaching, or matching naming conventions.

5. Creating a string constant with TEXTEQU

We can also use TEXTEQU to define reusable text, such as strings.

Step 1: Define the text macro

Step 2: Use it to define a string in memory
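The two steps might look like this. The names Sample and MyString come from the explanation below; the string contents are an assumption, since this copy of the text does not show them.

```masm
Sample TEXTEQU <"This is a string">   ; Step 1: text macro holding a quoted string

.data
MyString BYTE Sample                  ; Step 2: expands to MyString BYTE "This is a string"
```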

What’s happening

Sample is replaced with the quoted string text.
MyString becomes a byte array initialized with that string.

Why this matters

You can reuse the same string in multiple places.
Changing the text in one spot updates it everywhere.

6. Assigning a full instruction using TEXTEQU

TEXTEQU can even represent entire lines of code, not just single words.

Example: Loading the address of myArray into ESI
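A sketch of how this could be written. The SetupESI name comes from the text; using mov with OFFSET (rather than lea) and the array definition shown are assumptions.

```masm
SetupESI TEXTEQU <mov esi, OFFSET myArray>   ; a whole instruction stored as text

.data
myArray WORD 20 DUP(?)                       ; assumed definition of the array

.code
    SetupESI                                 ; expands to: mov esi, OFFSET myArray
```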

What this means

Writing SetupESI anywhere in your code expands to the stored instruction, for example mov esi, OFFSET myArray.

Why this matters

Great for setup sequences used repeatedly.
Keeps your code DRY (Don’t Repeat Yourself).
Makes intent clearer: “this sets up ESI.”

7. Big Picture Takeaway

- EQU → symbolic constants (numbers, expressions, sizes)
- SIZEOF → assembler-calculated memory size
- TEXTEQU → text substitution (keywords, strings, instructions)

These tools don’t change how the CPU runs your program; they change how humans understand and maintain it.

MOVING FROM 32-BIT TO 64-BIT ASSEMBLY PROGRAMMING

WHAT THIS MODULE IS ABOUT

This module explains how and why assembly language changes when moving from 32-bit (x86) to 64-bit (x64) programming. While the core ideas of assembly remain the same, the environment, rules, and conventions change in important ways.

64-bit programming isn’t “harder,” but it is different — and understanding those differences early prevents confusion and errors later.

High-Level Differences Between 32-Bit and 64-Bit Programs

Here’s what fundamentally changes when we move to 64-bit assembly:

Registers are wider

32-bit registers hold 32 bits of data.

64-bit registers hold 64 bits, allowing much larger values.

More registers are available

32-bit x86 has 8 general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP).

64-bit systems provide 16 general-purpose registers, giving programmers more flexibility and performance.

More memory can be addressed

32-bit programs are limited to 4 GB of address space (2^32 bytes).

64-bit programs can access far more memory, which is critical for modern applications.

Calling conventions change

How functions receive parameters and return values is different in 64-bit mode.

These differences affect how code is written, even when the program logic stays the same.

Understanding the AddTwoSum Example (32-Bit vs 64-Bit)

I. Why Use AddTwoSum?

The AddTwoSum program is a small, simple example that makes it easy to see the differences between 32-bit and 64-bit assembly without distractions.

The program:

1. Loads two numbers

2. Adds them

3. Stores the result

4. Exits cleanly

Same idea — different rules.

II. The 32-Bit Version (Overview)

In 32-bit assembly, the program includes several directives and conventions that are required for that environment:
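The listing itself is missing from this copy. Here is a minimal sketch that matches the directives described below, assuming the operand values 5 and 6 and the variable name sum (none of which are given in the text).

```masm
; AddTwoSum, 32-bit version (sketch)
.386                                   ; minimum processor type
.model flat, stdcall                   ; flat memory model, stdcall calling convention
.stack 4096                            ; reserve 4096 bytes of stack
ExitProcess PROTO, dwExitCode:DWORD    ; Windows API prototype

.data
sum DWORD 0                            ; result variable (name assumed)

.code
main PROC
    mov eax, 5                         ; load the first number (value assumed)
    add eax, 6                         ; add the second number
    mov sum, eax                       ; store the result
    INVOKE ExitProcess, 0              ; exit with code 0
main ENDP
END main                               ; entry point named in END
```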

What’s Going On Here?

.386 specifies the processor type

.model flat, stdcall defines the memory model and calling convention

.stack 4096 sets up stack space

INVOKE automatically handles parameter passing

The entry point (main) is explicitly specified

This structure is normal and necessary in 32-bit MASM.

III. The 64-Bit Version (Key Differences)

Now compare it with the 64-bit version:
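Again, the listing is missing here, so this is a sketch under the same assumptions (values 5 and 6, variable sum). The sub rsp line is not mentioned in the text; it is added because the Windows x64 calling convention expects the caller to reserve shadow space and keep the stack 16-byte aligned.

```masm
; AddTwoSum, 64-bit MASM (ml64) version, still using 32-bit data (sketch)
ExitProcess PROTO                      ; no parameter list

.data
sum DWORD 0

.code
main PROC
    sub rsp, 28h                       ; addition to the text: shadow space + alignment
    mov eax, 5
    add eax, 6
    mov sum, eax
    mov ecx, 0                         ; exit code goes in ECX (low half of RCX)
    call ExitProcess                   ; plain CALL replaces INVOKE
main ENDP
END                                    ; no entry point named here
```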

What Changed — and Why?

No .386, .model, or .stack directives

These are not used in 64-bit MASM.

No INVOKE directive

64-bit MASM does not support INVOKE.

Instead, parameters are passed manually using registers.

ExitProcess parameters go in registers

The exit code is placed in ECX before calling ExitProcess.

No entry point specified in END

The entry point is given to the linker instead, typically with its /ENTRY option.

Even though the program does the same thing, the rules underneath are different.

Why PROTO Is Still Used (Even in Small Programs)

In 32-bit assembly, you often see function prototypes declared early:
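For example, a prototype for the Windows ExitProcess function, which takes one DWORD exit code, might be declared like this (a sketch; the parameter name dwExitCode is only documentation and could be anything):

```masm
.386
.model flat, stdcall
ExitProcess PROTO, dwExitCode:DWORD    ; function name plus one DWORD parameter

; With the prototype in place, INVOKE can check the call:
;     INVOKE ExitProcess, 0
```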

This might seem unnecessary in small programs — but it serves important purposes.

WHY DECLARE FUNCTION PROTOTYPES?

1. Linker and Library Compatibility

The PROTO directive tells the assembler (and, through the external reference it generates, the linker):

The function exists

What it’s called

How it should be used

This ensures the program links correctly with system libraries.

2. Readability and Documentation

Seeing prototypes at the top of a file tells the reader:

What external functions the program depends on

What kind of parameters they expect

This makes code easier to understand and maintain.

3. Error Checking

Prototypes allow the assembler to catch:

Misspelled function names

Incorrect parameter usage

Catching errors early saves debugging time.

4. Modularity

In larger projects:

Functions may be defined in other files

Libraries may be reused

Prototypes make this organization possible.

5. Tool and Assembler Compatibility

PROTO is a standard, portable way to describe function interfaces in MASM-based assembly.

Even if it feels redundant in tiny examples, it becomes essential in real-world programs.

A TRUE 64-BIT ADDTWOSUM PROGRAM

Now let’s look at a fully 64-bit version that actually uses 64-bit registers and data types.

I. AddTwoSum 64-Bit Version
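The listing is missing from this copy. Here is a sketch consistent with the changes described below (QWORD data, RAX arithmetic), assuming the same values 5 and 6 and the variable name sum. The sub rsp line is an addition that the text does not mention; it follows the Windows x64 calling convention.

```masm
; AddTwoSum, fully 64-bit version (sketch)
ExitProcess PROTO

.data
sum QWORD 0                            ; 64-bit variable

.code
main PROC
    sub rsp, 28h                       ; addition to the text: shadow space + alignment
    mov rax, 5                         ; 64-bit register
    add rax, 6
    mov sum, rax                       ; store all 64 bits
    mov ecx, 0                         ; exit code
    call ExitProcess
main ENDP
END
```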

II. What Changed from the 32-Bit Version?

Only a few things — but they matter:

DWORD → QWORD

The variable is now 64 bits wide.

EAX → RAX

The program uses full 64-bit registers.

Everything else — logic, flow, purpose — remains the same.

III. Why This Matters

Using 64-bit registers and variables allows:

Larger numbers

Better performance for 64-bit arithmetic, which no longer has to be split across two 32-bit registers

Access to memory beyond 4 GB

This is why modern operating systems and applications are almost entirely 64-bit.

UNDERSTANDING 64-BIT REGISTERS

I. What Are Registers?

Registers are tiny, ultra-fast storage locations inside the CPU.

They are much faster than memory and are used constantly during execution.

64-bit processors have 16 general-purpose registers: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and R8 through R15.

II. Why the “R”?

The letter “R” stands for “register” and indicates a 64-bit version of a register.

Examples:

EAX (32-bit) → RAX (64-bit)
EBX → RBX
ECX → RCX

Adding the R means:

Twice the size
Larger values
More memory access

III. Common Register Roles

RAX — often holds function return values
RBX — commonly used as a base register
RSP — stack pointer
RCX — frequently used for function parameters

Registers are flexible, but conventions help keep code readable and correct.

IV. Register Usage Examples

After a function call, the result is typically found in RAX.
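A small sketch of these roles in the Windows x64 calling convention. The AddTwo procedure and the result variable are hypothetical names invented for this illustration: arguments arrive in RCX and RDX, and the return value comes back in RAX.

```masm
ExitProcess PROTO

.data
result QWORD 0

.code
AddTwo PROC                  ; hypothetical helper: returns RCX + RDX in RAX
    mov rax, rcx             ; first argument
    add rax, rdx             ; plus second argument
    ret                      ; result is left in RAX
AddTwo ENDP

main PROC
    sub rsp, 28h             ; shadow space + stack alignment
    mov rcx, 5               ; first argument
    mov rdx, 6               ; second argument
    call AddTwo
    mov result, rax          ; pick up the return value from RAX
    mov ecx, 0               ; exit code
    call ExitProcess
main ENDP
END
```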

Key Concepts Learned So Far (Chapter Summary)

This chapter builds on earlier assembly fundamentals. Key ideas include:

Constants and expressions — fixed values and calculated values

Identifiers and directives — names and assembler commands

Instructions and operands — executable statements and their data

Program segments — code, data, and stack

Assemblers and linkers — tools that turn source code into executables

Data types — BYTE, WORD, DWORD, QWORD, and floating-point types

Data definitions — reserving and initializing memory

Strings and DUP — efficient memory allocation

Little-endian format — how x86 stores data in memory

Symbolic constants — readable, maintainable values

DATA TYPES AND DEFINITIONS IN ASSEMBLY: THE STUFF YOU TOUCH CONSTANTLY

Assembly is a language where every byte matters. Before you can manipulate a variable, the assembler needs to know how much memory to reserve and how to interpret it. That’s exactly what data definitions do: they tell the assembler, “reserve this much space, and this is what it represents.”

I. The Basic Data Types

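Here’s a cheat sheet you’ll reference constantly:

BYTE → 1 byte, 8-bit unsigned integer
SBYTE → 1 byte, 8-bit signed integer
WORD → 2 bytes, 16-bit unsigned integer
SWORD → 2 bytes, 16-bit signed integer
DWORD → 4 bytes, 32-bit unsigned integer
SDWORD → 4 bytes, 32-bit signed integer
FWORD → 6 bytes, 48-bit value (far pointer)
QWORD → 8 bytes, 64-bit integer
TBYTE → 10 bytes, 80-bit value
REAL4 → 4 bytes, single-precision floating point
REAL8 → 8 bytes, double-precision floating point
REAL10 → 10 bytes, extended-precision floating point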

Pro tip: Most of the time, you’ll only touch BYTE, WORD, DWORD, and QWORD. Floating-point types (REAL4, REAL8, REAL10) are for real-number math, and FWORD/TBYTE are niche.

II. Data Definition Syntax

In MASM, the general format is:

[label] directive value

label → optional name for the variable (what you use to access it)

directive → data type (BYTE, WORD, DWORD, etc.)

value → initial value stored in memory

Examples:

C equivalent:
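The original examples are missing from this copy. As a stand-in, here are a few illustrative declarations with a rough C equivalent in each comment. All variable names and values are assumptions, and the C types use the fixed-width names from stdint.h.

```masm
.data
count        BYTE   10           ; C: uint8_t  count = 10;
total        WORD   1000         ; C: uint16_t total = 1000;
value32      DWORD  12345678h    ; C: uint32_t value32 = 0x12345678;
value64      QWORD  0            ; C: uint64_t value64 = 0;
temperature  SDWORD -40          ; C: int32_t  temperature = -40;
```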

Notice how assembly explicitly specifies the size in every declaration. No guessing, no magic behind the scenes.

III. Why Size Matters

Memory alignment is crucial in assembly:

CPUs access a variable most efficiently when it starts at an address divisible by its size.
Example: a DWORD (4 bytes) should ideally start at an address divisible by 4.
Misalignment can slow down your code or even cause crashes on strict CPUs.

Example memory layout:
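The layout figure is missing from this copy. Here is a sketch of the idea using MASM's ALIGN directive. The variable names are hypothetical, and the offsets assume these are the first items in the data segment.

```masm
.data
flag   BYTE  1          ; offset 0000h, occupies 1 byte
ALIGN 4                 ; assembler inserts 3 padding bytes (0001h-0003h)
total  DWORD 100        ; offset 0004h, divisible by 4
step   WORD  7          ; offset 0008h, divisible by 2
```

Without the ALIGN line, total would start at offset 0001h and straddle a 4-byte boundary.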

Rule of thumb: pick the smallest type that safely holds your value, then order or pad your variables so the larger ones stay aligned. Smaller types save memory, but they do not align anything by themselves.

IV. The DUP Operator – Magic for Repetition

When creating arrays or buffers, you don’t want to write out hundreds of zeros manually. That’s where DUP comes in.

The general form is: COUNT DUP (value)

COUNT → how many times to duplicate

value → what to write

Examples:
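The original examples are missing here. A few illustrative declarations (all names and counts are assumptions):

```masm
.data
zeros    BYTE  100 DUP(0)          ; 100 bytes, all zero
buffer   BYTE  256 DUP(?)          ; 256 bytes, uninitialized
table    DWORD 10 DUP(0FFFFFFFFh)  ; 10 doublewords (40 bytes), all bits set
pattern  BYTE  4 DUP("AB")         ; 8 bytes: ABABABAB
```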

Tip:

DUP(0) → memory initialized to 0

DUP(?) → uninitialized memory (do not rely on its contents)

V. Strings in Assembly

Strings are just arrays of bytes:

Each character = 1 byte
Null terminator (0) is optional but needed for APIs expecting C-style strings
Never try to store a string in a DWORD directly. Use BYTE arrays.

You can store a pointer to a string in a DWORD if needed.
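A short sketch of these points for 32-bit MASM. The names and string contents are hypothetical.

```masm
.data
greeting   BYTE "Hello, world", 0     ; 12 characters + null terminator = 13 bytes
noNull     BYTE "ABC"                 ; 3 bytes, no terminator
pGreeting  DWORD OFFSET greeting      ; 32-bit pointer to the first character
```

In a 64-bit program the pointer would need a QWORD instead, since addresses are 64 bits wide.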

VI. Quick Reference

🤔 Takeaway:

In assembly, you’re constantly defining how many bytes something occupies and what those bytes mean. Once you internalize this, the rest — arrays, strings, buffers, numeric operations — becomes predictable.

🤔 Closing thought:

Assembly is all about control and precision. When you define a variable, you’re telling the CPU exactly where it lives, how big it is, and how to interpret it. Mastering data definitions now makes everything else — loops, arithmetic, arrays — much easier later.
