Demystifying MySQL Statistics: The Secret Sauce for Query Performance

Have you ever wondered how the MySQL query optimizer almost magically knows the fastest way to retrieve your data? The answer lies in a critical, yet often overlooked, component: database statistics.

Think of MySQL statistics as a detailed "map" of your data. Without this map, the optimizer is driving blind, making poor decisions that lead to painfully slow queries. In this blog post, we'll pull back the curtain on MySQL statistics. We'll explore how they are collected automatically, how you can step in manually, and how to troubleshoot when this system goes awry.

What Are MySQL Statistics?

In the context of a database, statistics are metadata about the data stored in your tables. They don't contain the actual data, but they describe it. The key pieces of information include:

Cardinality: The number of unique values in an index. This is arguably the most important statistic. A column with high cardinality (like a user_id) is a great candidate for an index, while one with low cardinality (like a gender column with 'M'/'F') is less so.
Number of Rows (TABLE_ROWS): An approximate count of rows in the table.
Index Distribution: How the data is distributed across the index keys (histograms, which we'll discuss later).
Data Size: Information about the physical storage size of tables and indexes.

The Query Optimizer uses this "map" to estimate the cost of different query execution plans (e.g., full table scan vs. using an index) and chooses the one it deems cheapest.

How MySQL Collects Statistics Automatically

For the InnoDB storage engine (the default and most common engine), statistics are primarily automatic. This automation ensures the optimizer's "map" doesn't become too outdated.

1. Persistent Statistics (The Default)

Since MySQL 5.6, the default behavior is to store statistics persistently to disk so they survive server restarts. This is controlled by the innodb_stats_persistent setting (ON by default).

How it Works: The statistics are stored in the mysql.innodb_table_stats and mysql.innodb_index_stats tables.
When is it Updated?
- Auto-recalc: When ~10% of the rows in a table have been modified (INSERT, UPDATE, DELETE, TRUNCATE).
- Background: The update is triggered automatically in the background. You can control the recalc threshold with innodb_stats_auto_recalc, but it's generally best to leave it enabled.
- On Server Startup: Statistics are loaded from the persistent storage.

2. Non-Persistent Statistics

This is the older method, where statistics were stored only in memory and were lost on server restart. You can enable it by setting innodb_stats_persistent=OFF, but this is not recommended for production systems as it can lead to inconsistent performance after a restart.

3. Histograms (A Powerful Addition)

Introduced in MySQL 8.0, histograms provide the optimizer with even more detailed information about the data distribution within a column, even if that column isn't indexed.

What they do: A histogram "buckets" the data, showing how many rows fall into specific value ranges. This is incredibly useful for WHERE clauses on non-indexed columns or for estimating join selectivity more accurately.
Automatic? No, histograms are not created or updated automatically. They must be created and updated manually using ANALYZE TABLE (more on this below).

How to Collect Statistics Manually

Sometimes, the automatic process isn't enough. If you've just loaded a massive amount of data, or you notice a specific query suddenly performing poorly, manual intervention is your best friend.

1. The ANALYZE TABLE Command

This is the primary tool for manually updating table statistics.

ANALYZE TABLE your_table_name;

What it does:

Recalculates the index cardinality and other stats for the table.
If the table has a histogram defined, it will also update the histogram data.
It's a blocking operation that takes a read lock on the table (usually very fast, but be cautious on huge, busy tables).

Example Output:

+-----------------------+---------+----------+----------+

+-----------------------+---------+----------+----------+

+-----------------------+---------+----------+----------+

2. Creating and Managing Histograms

Since histograms are manual, here's how you work with them:

Create a histogram on a column:

ANALYZE TABLE your_table_name UPDATE HISTOGRAM ON column1, column2;

Drop a histogram:

ANALYZE TABLE your_table_name DROP HISTOGRAM ON column1;

Check existing histograms:
You can find information about existing histograms in the information_schema.COLUMN_STATISTICS table.

SELECT * FROM information_schema.COLUMN_STATISTICS WHERE TABLE_NAME = 'your_table_name';

How to Check the Last Statistics Gathered

You have a couple of reliable ways to check when statistics were last updated.

1. Check information_schema.TABLES

The UPDATE_TIME column shows the last time the table's data file was modified. A recent ANALYZE TABLE will update this time.

SELECT TABLE_NAME, UPDATE_TIME

FROM information_schema.TABLES

WHERE TABLE_SCHEMA = 'your_database_name'

AND TABLE_NAME = 'your_table_name';

2. Check Persistent Stats Tables (Most Accurate)

For the most direct information, query the persistent statistics tables.

-- For table-wide stats

SELECT * FROM mysql.innodb_table_stats

WHERE table_name = 'your_table_name';

-- For index-specific stats, including the last update time

SELECT index_name, stat_name, value, last_update

FROM mysql.innodb_index_stats

WHERE table_name = 'your_table_name';

The last_update column tells you exactly when the statistics for that index were last recalculated.

Troubleshooting Performance Degradation from Missing/Stale Statistics

You're seeing a query that used to be fast but is now slow. How do you confirm and fix a statistics issue?

Step 1: Identify the Suspect Query

Use MySQL's performance schema or the slow query log to pinpoint the problematic query.

Step 2: Compare the Execution Plan (EXPLAIN)

The EXPLAIN command is your most powerful tool. It shows you the execution plan the optimizer has chosen.

Run EXPLAIN on the slow query.

EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';

Pay close attention to:
- possible_keys: Which indexes could be used?
- key: Which index is actually being used? If it's not using the index you expect, statistics are a prime suspect.
- rows: The optimizer's estimate of how many rows it will examine. If this number is wildly different from reality (e.g., it estimates 100 rows but you know it will scan 1 million), your statistics are stale.

Step 3: The Classic "Fix and Compare" Method

Run ANALYZE TABLE: Update the statistics for the table(s) involved in the query.

ANALYZE TABLE orders;

Run EXPLAIN again: Immediately run the EXPLAIN command again. Has the key changed? Has the rows estimate become more accurate?
Test the query: Does the query now run fast?

If the performance is restored, you've successfully diagnosed a stale statistics issue.

Step 4: For Complex Cases, Consider a Histogram

If ANALYZE TABLE didn't help, and your WHERE clause uses a non-indexed column or has complex range conditions, a histogram might be the solution.

-- Create a histogram on the 'status' column

ANALYZE TABLE orders UPDATE HISTOGRAM ON status;

Run EXPLAIN and test the query again.

Step 5: Force an Index (Last Resort)

If all else fails and you are absolutely certain you know a better plan, you can force the optimizer to use a specific index. Use this with extreme caution, as it's a band-aid and can break if your data distribution changes later.

SELECT * FROM orders FORCE INDEX (idx_customer_id) WHERE customer_id = 123 AND status = 'shipped';

Pro Tip: Make it a Routine

Don't wait for performance to degrade. Consider adding ANALYZE TABLE operations to your regular maintenance windows, especially for large, frequently updated tables. For critical reports run on a data warehouse, updating statistics right after a major data load is a best practice.

Conclusion

MySQL statistics are the silent workhorses of database performance. By understanding how they are collected automatically and knowing how to intervene manually with ANALYZE TABLE and histograms, you empower yourself to keep your queries running at lightning speed.

Remember the troubleshooting mantra: EXPLAIN -> ANALYZE TABLE -> EXPLAIN again. This simple workflow will resolve a significant number of mysterious performance problems.

Page updated

Google Sites

Report abuse