If you work with data, even occasionally, understanding databases is non-negotiable. Not because you need to become a database administrator, but because knowing what's happening under the hood changes how you think about data entirely.
So, what even is a database?
Think of a database as a very organised container for data. Instead of hunting through a pile of Excel files every time you need a figure, you store everything in one place, structured into neat tables with rows and columns, and then you query it to get exactly what you need.
The reason databases are so widely used comes down to two things: they can handle enormous amounts of data, and they let you control who is allowed to read or change it. Once your data grows beyond what a spreadsheet can comfortably handle, a database is the obvious next step.
To communicate with a database, you use SQL (Structured Query Language). It's the language you write to ask the database questions, or to change what's inside it.
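To make "asking questions" and "changing what's inside" concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The sales table, its columns, and its rows are invented for illustration; a real database would have its own names.

```python
import sqlite3

# In-memory SQLite database; the "sales" table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 50.0)],
)

# Asking a question: what is the total amount per region?
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Changing what's inside: add 10 to every South sale.
conn.execute("UPDATE sales SET amount = amount + 10 WHERE region = 'South'")
south_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'South'"
).fetchone()[0]

print(totals)       # totals per region before the update
print(south_total)  # South total after the update
```

The SQL strings are the part that carries over to other databases; the Python around them is only there to send the statements and collect the results.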
The stack: database, software, and hardware
When people talk about a database setup, there are really three layers to understand:
The database itself stores all the data. A Database Management System (DBMS) sits on top of it and handles every incoming request, whether that's from a developer writing SQL, an app fetching data, or a BI tool like Tableau or Power BI running queries in the background. The DBMS also manages security, deciding which queries are allowed to run. And then there's the hardware, typically a dedicated server, which is essentially a much more powerful PC that runs around the clock. These days, that server is often in the cloud.
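The idea of a DBMS deciding which queries may run can be sketched on a small scale. Server-based systems usually do this with user accounts and permissions; the example below is only a stand-in that uses SQLite's authorizer hook through Python's sqlite3 module. The read-only policy, the function name read_only, and the sales table are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('North', 120.0)")
conn.commit()

# Hypothetical policy: from now on this connection may read, but not
# insert, update, delete, or drop tables.
BLOCKED = {
    sqlite3.SQLITE_INSERT,
    sqlite3.SQLITE_UPDATE,
    sqlite3.SQLITE_DELETE,
    sqlite3.SQLITE_DROP_TABLE,
}

def read_only(action, arg1, arg2, db_name, source):
    """Called by SQLite while it compiles each statement."""
    return sqlite3.SQLITE_DENY if action in BLOCKED else sqlite3.SQLITE_OK

conn.set_authorizer(read_only)

# A read is allowed through.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# A delete is refused before it ever touches the data.
try:
    conn.execute("DELETE FROM sales")
    delete_allowed = True
except sqlite3.DatabaseError:
    delete_allowed = False

print(rows)
print(delete_allowed)
```

The point is the division of labour: the caller only sends a request, and the layer in front of the data decides whether it runs.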
Not all databases look the same
When most people say "database," they picture a relational database: tables, rows, columns, and defined relationships between those tables. That's the most common type, and it's the one SQL is built for.
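A "defined relationship" usually means one table holds a column that points at a row in another table. The sketch below, again with Python's sqlite3 module, uses two invented tables, customers and orders, linked by a customer_id column. The names and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (id),
        total REAL NOT NULL
    )
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# Follow the relationship: total spend per customer.
spend = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# An order pointing at a customer that doesn't exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(spend)
print(orphan_accepted)
```

The relationship does two jobs here: it lets a query combine the tables, and it stops data that would break the link from getting in.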
But there are others worth knowing about:
Relational: Tables with rows & columns. Uses SQL. The standard.
Key–value: A key paired with its value.
Column-based: Data grouped by column rather than by row. Great for queries that scan a few columns across a very large number of rows.
Graph: Focuses on relationships between data points.
Document: Stores data as flexible, loosely structured documents.
Everything except the relational type is usually grouped under the umbrella of NoSQL databases.
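To get a feel for how the non-relational models differ, it can help to see the same kind of data in each shape. The sketch below uses plain Python data structures as stand-ins. It is not how any particular NoSQL product stores data, and every name and value in it is made up.

```python
# Key-value: one value looked up by one key, nothing more.
key_value = {"customer:1": "Ada"}

# Document: a nested, self-describing record; fields may differ between documents.
document = {
    "id": 1,
    "name": "Ada",
    "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 40.0}],
}

# Column-based: values stored per column, so reading one column skips the rest.
columns = {
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "total": [25.0, 40.0, 15.0],
}

# Graph: nodes plus the connections between them (here, who knows whom).
graph = {"Ada": ["Grace"], "Grace": ["Ada", "Linus"], "Linus": []}

name = key_value["customer:1"]
ada_spend = sum(order["total"] for order in document["orders"])
all_spend = sum(columns["total"])
friends_of_friends = {fof for friend in graph["Ada"] for fof in graph[friend]} - {"Ada"}

print(name, ada_spend, all_spend, friends_of_friends)
```

Each shape makes one kind of question cheap: a direct lookup, reading a whole record at once, scanning a single column, or walking connections.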
How data analysis actually works
Once you're working with a real database, the analytical process typically follows three stages:
Explore: Get familiar with your tables, understand what the columns mean, and get a feel for the data.
Prepare: Raw data is almost always messy, with missing values, mismatched types, and tables that need to be joined together. This stage is usually the most time-consuming.
Analyse: Once the data is clean and shaped correctly, you actually dig in: aggregations, segmentations, rankings, and so on.
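The three stages above can be sketched end to end on a toy dataset. The example uses Python's sqlite3 module; the customers and orders tables, the deliberately messy values, and the clean_orders view are all invented for illustration, and real preparation work is typically far more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "North"), (2, None), (3, "South")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "25.0"), (11, 1, "40.0"), (12, 2, "15.0"), (13, 3, None)],
)

# Explore: which columns exist, and how many totals are missing?
order_columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
missing_totals = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE total IS NULL"
).fetchone()[0]

# Prepare: fix the text-typed totals, label missing regions, join the tables,
# and leave out orders that have no total at all.
conn.execute(
    """
    CREATE VIEW clean_orders AS
    SELECT o.id,
           COALESCE(c.region, 'Unknown') AS region,
           CAST(o.total AS REAL) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.total IS NOT NULL
    """
)

# Analyse: aggregate and rank regions by revenue.
by_region = conn.execute(
    "SELECT region, SUM(total) FROM clean_orders "
    "GROUP BY region ORDER BY SUM(total) DESC"
).fetchall()

print(order_columns, missing_totals)
print(by_region)
```

Keeping the cleaned data in a view is one possible design choice: the raw tables stay untouched, and the analysis query only ever sees the prepared version.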
The three types of SQL commands
SQL commands don't all do the same kind of thing. The ones you'll use most fall into three categories based on what they're used for:
Data Definition Language (DDL): CREATE, ALTER, DROP. These commands define the structure of your database.
Data Manipulation Language (DML): INSERT, UPDATE, DELETE. These commands deal with the data inside the tables.
Data Query Language (DQL): SELECT. This is where the analysis happens. There's really only one command here, but it does a lot.
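To see the categories side by side, here is a short sketch that runs one or two commands from each against an in-memory SQLite database through Python's sqlite3 module. The products table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure.
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE products ADD COLUMN price REAL")

# DML: change the data inside the table.
conn.execute("INSERT INTO products (id, name, price) VALUES (1, 'Pen', 1.5)")
conn.execute("INSERT INTO products (id, name, price) VALUES (2, 'Notebook', 4.0)")
conn.execute("UPDATE products SET price = 2.0 WHERE id = 1")
conn.execute("DELETE FROM products WHERE id = 2")

# DQL: read data without changing it.
remaining = conn.execute("SELECT id, name, price FROM products").fetchall()

# DDL again: remove the structure itself, not just the rows.
conn.execute("DROP TABLE products")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(remaining)
print(tables_left)
```

Note the difference between DELETE and DROP: the first removes rows and leaves the table in place, the second removes the table altogether.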
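These three categories cover the core of everyday SQL. (Many references list two more: DCL for permissions, with commands like GRANT and REVOKE, and TCL for transactions, with commands like COMMIT and ROLLBACK.) Master what each one is for, and everything that comes after (joins, subqueries, aggregations) becomes much easier to place.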