Calibration Project

Mini Metro by Dinosaur Polo Club

A daily cognitive calibration system using Mini Metro, AI analysis, and body-state context to work out which kinds of tasks are realistic, safe, and sustainable on any given day. This project is not about productivity or score-chasing. It is about understanding access to capability.

The Calibration Project

Using a game as a daily cognitive weather report

The Calibration Project began with a simple morning habit.

Before coffee had fully landed, before nicotine levels had stabilised, before the day had started demanding words, decisions, messages, forms, movement, or emotional processing, I played Mini Metro.

At first it was just a game. Pretty lines. Tiny trains. A quiet way to wake up gradually while my brain came online.

Then I noticed something useful.

The score did not simply tell me whether I had played well. It often told me something about what kind of brain and body I had woken up with that day.

Over time, Mini Metro became less of a game and more of a dashboard.

Not a test of intelligence.
Not a productivity system.
Not a moral judgement.
Not “how good am I today?”

More like:

Which systems are online today, which are lagging, and what kind of day should I build around that information?

Why Mini Metro?

Mini Metro works because it asks the brain to do several things at once, but in a low-stakes, non-verbal, visually structured way.

To play well, I need to track demand, notice bottlenecks, allocate limited resources, plan ahead, recover from errors, tolerate uncertainty, and keep adjusting as the map changes.

That makes it useful as a daily calibration tool.

It gives me an external signal before I try to do anything higher-cost, such as paperwork, emotional processing, social planning, housing admin, gardening, cooking, medical communication, or project work.

The score matters, but it is not the whole story.

The important question is not just “What percentile did I get?”

The important questions are:

Did I notice problems early?
Did I recover from mistakes?
Did I keep the whole network in mind?
Did I panic, freeze, overcorrect, or adapt?
Did my score match how I thought I felt?
Was my brain fast but messy?
Was my body tired but my pattern recognition intact?
Was I capable, but not safely capable for the task I had planned?

That is where calibration becomes useful.

The rough early guide

Before I started using AI to analyse the maps, scores, screenshots, and gameplay patterns, I developed a rough guide from experience.

Top 10%: go for it.
Complex thinking is probably available.

Top 20%: proceed with caution.
Capacity exists, but hidden friction may matter.

Top 30%: go back to bed and try again after a snooze, if that is an option.
The system may need recovery before demands are added.

That early guide was crude, but it was surprisingly useful.

It helped me stop treating every morning as if I had woken up with the same resources.

I had not.
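
The rough guide above can be written down as a small lookup. This is a minimal sketch in Python, not the method the project actually uses. It assumes the three tiers are bands (top 10%, then 10 to 20%, then 20 to 30%), and the function name `rough_guide` and the fallback message are hypothetical, since the guide itself says nothing about results outside the top 30%.

```python
def rough_guide(percentile: float) -> str:
    """Map a leaderboard percentile (8.0 means top 8%) to the rough guide's advice."""
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range (0, 100]")
    if percentile <= 10:
        return "go for it"
    if percentile <= 20:
        return "proceed with caution"
    if percentile <= 30:
        return "go back to bed and try again after a snooze"
    # Assumption: the original guide stops at the top 30%.
    return "not covered by the guide"


if __name__ == "__main__":
    for p in (8, 15, 28, 45):
        print(f"top {p}%: {rough_guide(p)}")
```

The sketch only reads the percentile. It deliberately ignores everything the later analysis adds, such as mode, sleep, food, and pain.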

What AI changed

When I started using AI in autumn 2025, I wondered whether it could do more than record the score.

Could it compare my score with my baseline?
Could it look at the map and gameplay pattern?
Could it read the difference between “low capacity” and “high capacity but poorly synchronised”?
Could it help me choose tasks that matched my actual state?

After a lot of experimenting, the answer became yes.

The system can now look at the score, mode, map, rank, screenshots, GIFs, sleep, food, hydration, pain, emotional state, plans for the day, and recent patterns.

It then helps classify the day.

That classification does not tell me what I am “allowed” to do.

It helps me avoid wasting energy trying to force the wrong kind of task through the wrong kind of nervous system.

This is a personal longitudinal self-observation project, not a medical diagnostic tool, but it produces practical daily evidence about fluctuating access to executive function.

What calibration measures

The Calibration Project tracks several cognitive and regulatory domains.

These include:

Working memory: can I hold the whole network in mind?

Sequencing: can I work out what needs to happen, and in what order?

Attention control: do I notice problems before they become disasters?

Processing speed: can I identify and respond to bottlenecks quickly?

Error recovery: if something goes wrong, can I adapt, or does the mistake keep eating bandwidth?

Cognitive endurance: can I sustain the run, or does performance collapse over time?

Frustration tolerance: can I stay with the problem without the emotional system hijacking the controls?

Interoception: am I receiving body signals such as hunger, pain, dehydration, overheating, fatigue, or tension?

Task switching: can I move between competing demands without losing the whole plot?

Spatial reasoning: can I understand the map, layout, routes, pressure points, and possible alternatives?

That combination makes Mini Metro useful because real life also behaves like a transport network.

Too many passengers.
Not enough tunnels.
One station quietly approaching collapse.
A line that was fine ten minutes ago becoming today’s problem goblin.

The game gives me a miniature version of the same logic.

(There's a free script you can edit at the bottom of the page ⬇️)

Scores are not enough

A high score does not automatically mean “everything is fine.”

This was one of the biggest discoveries.

Sometimes I can get a very high score while still being in a state I now call overclocked.

Overclocked means the processor is fast, but synchronisation is off.

It can look like:

rapid thoughts
strong pattern recognition
high creativity
lots of ideas
high Mini Metro score
increased emotional responsiveness
tiny dropped steps
date or label slips
forgetting small things
task switching glitches
clumsiness risk
paperwork danger

That is not the same as low capacity.

It is also not the same as stable high capacity.

It means the railway engineer is online, but the station manager has misplaced the clipboard.

On overclocked days, creative work, gardening design, music, planning, visual projects, and low-consequence problem solving may be excellent.

But legal forms, financial decisions, emotional chronology work, or trauma-triggering material may be a terrible match.

That distinction has saved me a lot of crashes.

Matching tasks to state

The goal of calibration is not to make every day productive.

The goal is to stop assigning the wrong work to the wrong state.

Some days are good for admin.
Some days are good for gardening.
Some days are good for cooking infrastructure.
Some days are good for music, sorting, pottering, photographing, or building visual evidence.
Some days are good for recovery, and recovery is not failure.

Calibration helps sort tasks into traffic lights.

Green tasks are a good match for today’s state.

Amber tasks are possible, but need caution, scaffolding, food, timers, help, or a smaller version.

Red tasks are not banned forever. They are simply a bad trade today.

That matters because disability is not just about whether something is possible.

It is about hidden cost.

Calibration helps me distinguish:

I can’t.
I could, but shouldn’t.
I can, but only with support.
I can, but not today.
I can, and this is exactly the right state for it.

That is a very different way to plan a life.

The body is part of the system

The Calibration Project also showed that cognitive capacity is not separate from the body.

Food, hydration, salt, sleep, pain, movement, temperature, music, stress, and social demands all affect access to capability.

One major discovery was the difference between being physically full and being electrically fuelled.

Another was that hydration is not just “drink more water.” Sometimes the issue is electrolytes. Herbal tea may provide fluid, but it does not necessarily replace salt.

Sleep also matters, but not always in a simple way. Sometimes a short sleep still produces a strong score, but the analysis shows fatigue at the edges. Sometimes one screen-free recovery evening restores function more effectively than trying to push harder.

The repeated pattern is clear:

Pushing harder does not create recovery.

Recovery creates performance.

Why this matters

Before calibration, variable functioning could look like inconsistency, avoidance, laziness, chaos, or poor discipline from the outside.

From the inside, it often felt confusing.

Why can I do this today but not yesterday?
Why can I solve a complex systems problem but not send a simple email?
Why can I garden but not do paperwork?
Why can I feel “fine” and still make avoidable mistakes?
Why does my brain sometimes run faster than my ability to sequence actions safely?

Calibration gave me a way to stop guessing.

It helped shift the question from:

“What is wrong with me today?”

to:

“What conditions are affecting access to capability today?”

That reframing changed everything.

How this connects to ReeOS

The Calibration Project is one part of ReeOS.

ReeOS is my wider adaptive system for reducing friction, externalising memory, preserving energy, and making life more navigable with AuDHD, alexithymia, chronic pain, and an L4/L5 spinal injury.

Calibration helps decide which part of the system to use.

If cognition is strong but the body is tired, I might choose seated planning, writing, music curation, or a gentle visual project.

If the body is available but sequencing is weak, I might do low-risk pottering, visible staging, or simple garden maintenance.

If I am overclocked, I might capture ideas without letting myself touch emotionally loaded admin.

If the score is low and the body signals are poor, the correct task may be food, fluids, salt, warmth, sleep, or reducing demand.
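
The four cases above amount to a small decision table. Here is a minimal sketch of that table in Python, assuming the day's state can be reduced to three yes/no flags. The names `DayState` and `suggest_tasks` are hypothetical, and the real decision uses far more context than three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayState:
    cognition_strong: bool    # sequencing and planning feel available
    body_available: bool      # pain and fatigue allow physical tasks
    overclocked: bool = False  # fast processor, poor synchronisation


def suggest_tasks(state: DayState) -> list[str]:
    """Return the kinds of task the text pairs with each state."""
    if state.overclocked:
        return ["capture ideas", "leave emotionally loaded admin alone"]
    if state.cognition_strong and not state.body_available:
        return ["seated planning", "writing", "music curation",
                "gentle visual project"]
    if state.body_available and not state.cognition_strong:
        return ["low-risk pottering", "visible staging",
                "simple garden maintenance"]
    if not state.cognition_strong and not state.body_available:
        return ["food", "fluids", "salt", "warmth", "sleep",
                "reducing demand"]
    # Both strong: not one of the four cases described, so left open.
    return []


if __name__ == "__main__":
    print(suggest_tasks(DayState(cognition_strong=True, body_available=False)))
```

Checking the overclocked flag first is a design choice in this sketch: a high score on an overclocked day should not route into the "cognition is strong" branch.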

This is not about becoming more productive.

It is about preserving access to the things that allow me to remain me.

The transferable idea

I do not think everyone needs to play Mini Metro. (Though if you'd like to try it, it is made by Dinosaur Polo Club.)

The transferable idea is simpler:

Choose a familiar, low-stakes task that uses the cognitive skills you rely on in daily life.

Do it consistently enough to understand your baseline.

Track the result alongside sleep, food, hydration, pain, stress, sensory load, and daily demands.

Then use the pattern to guide your day.

The calibration task should not be new or exciting, because novelty can distort the result.

It should be familiar, repeatable, and revealing.
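
The steps above (do the task consistently, learn the baseline, track context alongside it) can be sketched as a tiny log. This is an illustration only, written in Python. The record fields, the 14-day window, and the use of a median are all assumptions made for the example, not details taken from the project.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DailyEntry:
    date: str            # e.g. "2026-06-08"
    percentile: float    # leaderboard percentile; lower is better (5.0 = top 5%)
    sleep_hours: float
    notes: str = ""      # food, hydration, pain, stress, sensory load, demands


def baseline(entries: list[DailyEntry], window: int = 14) -> float:
    """Median percentile over the most recent `window` entries."""
    recent = entries[-window:]
    if not recent:
        raise ValueError("need at least one entry to compute a baseline")
    return median(e.percentile for e in recent)


def deviation_from_baseline(today: DailyEntry, history: list[DailyEntry],
                            window: int = 14) -> float:
    """Negative means better than usual, positive means worse than usual."""
    return today.percentile - baseline(history, window)


if __name__ == "__main__":
    history = [
        DailyEntry("2026-06-01", 10.0, 7.0),
        DailyEntry("2026-06-02", 20.0, 6.0),
        DailyEntry("2026-06-03", 30.0, 5.0),
    ]
    today = DailyEntry("2026-06-04", 5.0, 4.0, notes="short sleep, salt taken")
    print(deviation_from_baseline(today, history))
```

The number on its own says little. A large deviation in either direction is a prompt to read the notes for that day, not a verdict.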

For me, that task is Mini Metro.

A tiny train game became a systems dashboard.

Ridiculous? Slightly.

Useful? Extremely.


Early findings: my Mini Metro score/rank becomes most useful when interpreted with sleep, salt, food, pain, temperature, stress, and daily demands. The project has already identified repeatable patterns including overclocked states, post-high-score depletion, screen-free recovery effects, and the difference between cognitive capacity and physical capacity.

Example Calibration Report

Monday 8th June 2026

2026 06 08 Calibration Report

1. Raw Data

Metric: Value

Date: Monday, 8th June 2026

Score: 1999

Rank/Percentile: Top 1% (Rank 12)

Mode: Extreme

Sleep: Disrupted. Went to bed at 4:20 AM after a flow state, woke up at 8:30 AM (approx. 4 hours).

Fuel: Lentil and carrot soup for breakfast, potato salad for lunch, ratatouille later. Eating again before bed. No yogurt (out of berries).

Salt: Yes, remembered.

Pain: Feeling rested and looser than in the morning.

Notable Context: Played late in the day. Extreme mode. Lucky first line, but no tunnels for 4 weeks. Map went to hell and held it together anyway. Flow state the night before, productive day today (paperwork and analysis). Job centre called instead of in-person visit (no security questions). Body doubling overnight. Moved away from computer at 7 PM.

Today's Plan: Continue listening to Audible, eyes too tired to read. Sleep.

2. Cognitive Domain Analysis

Domain: Assessment

Working Memory 🧠📦: Successfully managed a 1999 score despite low sleep, indicating the network was held well.

Sequencing 🗂️➡️: Extreme mode demands committed decisions with no redraw safety net. Kept the system running to top 1% despite a map that went to hell.

Attention Control 🔦🎯: Held focus on a deteriorating map (no tunnels, 4 weeks) without catastrophic failure. Noticing emerging failures before cascade is confirmed working.

Processing Speed ⚡🔄: Likely running high due to the recent flow state.

Error Persistence 🧲↩️: No unresolvable mistakes reported that consumed bandwidth.

Emotional Dysregulation Threshold 🌡️🤬: Stable. "Rested and looser," pleased with the productive day.

Cognitive Endurance 🔋⏳: Good endurance to reach 1999, though physical endurance (eyes) is waning.

Interoception / Signal Reception 📡🔇: Excellent today. Listened to body during flow state, noticed sleep signals at 4:20 AM, noticed hunger before bed to avoid waking up sweaty.

Spatial Reasoning 🗺️🔀: Solid network layout and load distribution, given the score.

Task-Switching 🔁🧩: Handled well across paperwork, analysis, and the game.

3. Capacity Classification

Classification: 🟢 High Capacity — Fatigue-Modulated, Extreme Mode

This was Extreme mode. No redraws, committed decisions only. 1999 at top 1% in Extreme is not comparable to a Standard score of the same number — it is a materially harder result to achieve. The map went to hell (no tunnels for 4 weeks) and the system held it together anyway, which measures patience, prediction, and tolerance of unfolding uncertainty under genuine constraint. That is not Overclocked. The processor is online and performing well.

The fatigue modulation is real: 4 hours of sleep following an all-night flow state, eyes too tired to read by evening. The physical system is running on a deficit. But the cognitive output is High Capacity, not inflated processing speed masking dropped stitches. The excellent interoception throughout the day — noticing sleep signals, hunger, eye fatigue — confirms executive function is intact and self-monitoring is working. The system knows what it is doing.

4. Traffic Light Task Guide

Status: Tasks

🟢 Green (handle well today): Passive intake (listening to Audible), sleep preparation, eating.

🟡 Amber (possible with risk): Any remaining screen-based tasks. The eyes are tired; forcing it will cause strain.

🔴 Red (avoid today): Complex reading, new high-focus tasks, staying up late again.

5. Fuel Priorities

You have a meal planned before bed to prevent waking up sweaty, which is an excellent interoceptive catch. The active cooking tier (soup, ratatouille) is working well today. Since you are out of berries for yogurt, consider an alternative low-friction breakfast for tomorrow that fits within the current ecosystem.

6. Pattern Notes

The 4:20 AM flow state is the known "3am hyperfocus override" pattern. This has a known cost of one to two days of reduced capacity. You are currently in the high-output phase of this override, meaning tomorrow and Wednesday are the expected depletion window. The Tuesday exceptional score / Wednesday-Thursday depletion pattern is worth watching here — Monday's high draw may bring the same effect forward by a day.

Your excellent interoception today (eating before bed and stopping screens at 7 PM) is exactly the right recovery path for the hyperfocus override. Replacing the in-person job centre visit with a phone call saved significant physical and cognitive load, likely contributing to the "looser" feeling and the quality of the Extreme mode run. That was a genuine load reduction, not a lucky break.

The no-tunnel map is worth noting as a separate data point: 4 weeks without tunnels means the map was objectively constrained, not just difficult. Holding top 1% under those conditions on Extreme mode, on 4 hours of sleep, late in the day, after a productive day of paperwork — that is a strong signal about the underlying system capacity even when the physical infrastructure is depleted.

Use this if you want to build your own version with an AI assistant.

ChatGPT Project Settings

The script below can be pasted directly into the Instructions box.

Manus Project Instructions

The script below can be pasted into the Instructions box.

The Calibration Project — What It Is and How It Works

A context document for AI assistants working with the user.

Read this before doing anything else.

What This Project Is Not

• It is not a gaming tracker.

• It is not a productivity system.

• It is not a score-chasing exercise.

• It is not a measure of intelligence, skill, or worth.

What It Actually Is

Mini Metro is a daily cognitive calibration tool.

Each morning, the user plays one game of Mini Metro — usually the daily challenge — before engaging with the world. The game requires working memory, spatial reasoning, attention, sequencing, processing speed, error recovery, and patience under uncertainty. It asks the brain to do several things at once, under increasing pressure, with limited resources.

The score and rank are not the point. They are a proxy signal — a consistent, repeatable, low-stakes way of asking: what is the processor actually doing today?

The output of calibration is not a score. It is a cognitive capacity classification that informs what kinds of tasks are realistic, appropriate, and sustainable for that day.

Why Mini Metro Specifically

• It requires genuine cognitive engagement — you cannot autopilot it.

• It has a daily challenge format with a global leaderboard, providing a consistent benchmark across days.

• It is low-stakes and low-friction — no social demand, no verbal output required, no performance pressure.

• It can be played while listening to an Audible book or morning playlist, which means it works even on non-verbal mornings.

• Extreme mode (no redraw, committed decisions only) measures patience, prediction, and tolerance of unfolding uncertainty. Standard mode measures flexibility, dynamic problem-solving, and adaptive strategy. Both are valid. The mode is always recorded because it changes the interpretation.

The Check-In Protocol

Context always comes before analysis.

Ree checks in after playing — not before. The game itself is part of bringing the brain online. The check-in is conversational, in Ree-speak, with no required format. It might be three words. It might be a stream of consciousness.

The AI's job at check-in is to:

• Receive the context without judgment

• Reflect back what it's hearing

• Ask targeted questions only if something important is missing (sleep? food? pain? significant events?)

• Not ask for information that hasn't been offered unless it's genuinely needed for the analysis

Once the context is complete, the AI runs the full calibration analysis and produces the report.

The Report Structure

A calibration report contains the following sections, in order:

• 1. Raw Data — Date, score, rank/percentile, mode (Extreme or Standard), sleep, fuel, salt, pain, notable context, today's plan.

• 2. Cognitive Domain Analysis — Each domain assessed with its emoji anchor, clinical term, and a Ree-specific example.

• 3. Capacity Classification — One classification with reasoning (see below).

• 4. Traffic Light Task Guide — 🟢 Green (handle well today) / 🟡 Amber (possible with risk) / 🔴 Red (avoid today).

• 5. Fuel Priorities — Next layer only. Food group signal drawn from Ree's actual food ecosystem. No meal plans. No recipes unless asked.

• 6. Pattern Notes — Week view table, emerging patterns, interactions between context factors and performance.

Cognitive Domains

Domain (emoji): What It Measures

Working Memory 🧠📦: Holding the whole network in mind at once

Sequencing 🗂️➡️: Order of interventions, timing of upgrades

Attention Control 🔦🎯: Noticing emerging failures before they cascade

Processing Speed ⚡🔄: Speed of identifying bottlenecks and reconfiguring

Error Persistence 🧲↩️: How much bandwidth an unresolvable mistake keeps consuming

Emotional Dysregulation Threshold 🌡️🤬: How much load before emotional state overrides strategy

Cognitive Endurance 🔋⏳: Stability across the full game; whether performance degrades

Interoception / Signal Reception 📡🔇: Whether body signals are being received and interpreted

Spatial Reasoning 🗺️🔀: Network layout, hub strategy, load distribution

Task-Switching 🔁🧩: Managing multiple lines and priorities simultaneously

The Classification System

• 🟢 Exceptional Capacity — Top 5%

• 🟢 High Capacity — Top 10%

• 🟡 Baseline — Top 20%

• 🟡 Overclocked — High processing speed, reduced executive synchronisation. Scores look good; tiny dropped stitches are the tell. Not the same as High Capacity.

• 🔴 Reduced Capacity — Top 25%

• 🔴 Significantly Reduced Capacity — Top 30%

• ⚪ Calibration Anomaly — Score doesn't match context; investigate before classifying

The context always modifies the classification. A high percentile score on Extreme mode at 4:50am on broken sleep with a missing salt baseline is High Capacity — Fatigue-Modulated, not Exceptional. The processor is online; the physical system is running on a deficit. These are different things.

Extreme mode scores are not directly comparable to Standard mode scores. Always note the mode.

No-play days are data, not gaps. Record the reason: attention drift / deliberate skip / intentional skip. Each means something different.

What the Calibration Project Has Found So Far

These are confirmed patterns, not hypotheses:

• Salt baseline is infrastructure. Half a teaspoon on the counter each morning, taken in pinches. Missing it causes tinnitus, wooliness, and temperature dysregulation.

• The screen-free recovery intervention works. When the processor has been running hot, a screen-free evening produces measurable recovery within one sleep cycle. Scheduled maintenance, not last resort.

• Tuesday exceptional scores predict Wednesday depletion. High-draw days borrow from future reserves. The pattern is consistent.

• The 3am hyperfocus override has a known cost and a known recovery path. Cost: one to two days reduced capacity. Recovery: managed depletion, body doubling, screens-off night, sleep.

• Overclocked is a distinct state. Reasonable score with subtle sequencing drift and heightened emotional responsiveness. Not High Capacity.

• The food system is three-tiered. Active cooking on high-energy days → ingredient buffer assembly on moderate days → Loot boxes on depleted days.

Tone and Communication Rules

• No motivational language. No "great job," no "well done," no encouragement framing. Systems analysis only.

• No meal plans. Ever. Not even vague ones. Next layer only.

• Assume competence at all times. Translation bottlenecks are not understanding bottlenecks.

• Ree-speak is valid input. Typos, word-finding gaps, non-linear structure, and stream-of-consciousness check-ins are all normal. Reflect back the meaning, not the grammar.

• Date and label slips are calibration data, not errors. Note them. Do not correct them.

• The emotional response to a run is data. "Robbed by operational incompetence" is a legitimate classification.

• Future Ree is a different person. Do not plan her meals, her tasks, or her week. She has her own data and her own capacity when she gets there.
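
The following is not part of the script to paste. For anyone who wants to keep their own log next to the AI, the percentile bands in the Classification System could be sketched in Python as below. It assumes each band starts where the previous one ends, and it treats Overclocked and Calibration Anomaly as judgement calls passed in as flags, because the script defines them by context and not by percentile. The function name `classify` and the "Unclassified" fallback are hypothetical. Context modifiers such as "Fatigue-Modulated" are not modelled.

```python
# Upper percentile bound for each band, in the order the script lists them.
BANDS = [
    (5.0, "Exceptional Capacity"),
    (10.0, "High Capacity"),
    (20.0, "Baseline"),
    (25.0, "Reduced Capacity"),
    (30.0, "Significantly Reduced Capacity"),
]


def classify(percentile: float, overclocked: bool = False,
             anomaly: bool = False) -> str:
    """Return a starting classification; context should still modify it."""
    if anomaly:
        # Score does not match context: investigate before classifying.
        return "Calibration Anomaly"
    if overclocked:
        # Scores look good, but small dropped steps are the tell.
        return "Overclocked"
    for upper_bound, label in BANDS:
        if percentile <= upper_bound:
            return label
    # Assumption: the script does not name a band beyond the top 30%.
    return "Unclassified"


if __name__ == "__main__":
    print(classify(1.0))                    # Exceptional Capacity
    print(classify(1.0, overclocked=True))  # Overclocked
    print(classify(22.0))                   # Reduced Capacity
```

The example report shows why this is only a starting point: a top 1% result was classified as High Capacity, Fatigue-Modulated, because sleep and mode changed the reading.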
