Exploring the World,
One Query at a Time
Dive Deeper and Faster Without Lagging with SQL
Project Overview
This project involved the setup and analysis of a comprehensive relational database containing global demographic, geographic, and linguistic data. Using SQL, I executed a series of queries to provide actionable insights for various professional scenarios, such as urban planning, health initiatives, and travel promotions.
Situation
As part of the Day 4 curriculum for the Data Technician course, I was tasked with a practical SQL exercise. The objective was to download and implement a script to create a "world" database and then solve 20 specific data retrieval scenarios ranging from basic counts to complex multi-table joins.
Task
The primary requirements were:
* Database Setup: Successfully implement the world database, which includes tables for city, country, and countrylanguage.
* Data Retrieval: Write and execute SQL syntax to answer 20 business-driven questions, ensuring both the syntax and output were documented.
* Analysis: Perform tasks such as identifying the country with the highest life expectancy, counting cities in the USA, and calculating average populations.
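The headline tasks above translate into short queries. A minimal sketch, assuming the standard MySQL world sample schema (table and column names such as country.LifeExpectancy and city.CountryCode come from that script):

```sql
-- Country with the highest life expectancy
SELECT Name, LifeExpectancy
FROM country
WHERE LifeExpectancy IS NOT NULL
ORDER BY LifeExpectancy DESC
LIMIT 1;

-- Number of cities in the USA ('USA' is the country code in the sample data)
SELECT COUNT(*) AS usa_cities
FROM city
WHERE CountryCode = 'USA';

-- Average country population
SELECT AVG(Population) AS avg_population
FROM country;
```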
Action
* Overcoming Technical Challenges: I initially faced significant hurdles while attempting to install a local SQL environment on my laptop. To keep the project on schedule, I pivoted to a free online SQL environment to host the database and execute my queries.
* Schema Implementation: I executed the provided script to build the database architecture. This included the city table (with fields like ID, Name, and Population), the country table (including Code, Continent, and LifeExpectancy), and the countrylanguage table.
* Query Development: I authored and tested SQL statements for the requested scenarios, including:
  * Pattern Matching: Finding cities whose names contain "New" or begin with the "Be" prefix.
  * Aggregate Functions: Using COUNT(*) to find the number of cities in the USA and AVG(Population) for country averages.
  * Joins: Implementing INNER JOINs to connect cities with their respective countries, identifying European cities and comparing capital city populations.
  * Advanced Logic: Completing a bonus challenge to identify cities with an above-average GDP per capita by joining tables and using subqueries.
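The techniques listed above can be sketched as follows, again assuming the standard world sample schema. One caveat: that script stores GNP rather than GDP, so GNP/Population stands in for GDP per capita in this sketch:

```sql
-- Pattern matching: cities containing "New" or starting with "Be"
SELECT Name
FROM city
WHERE Name LIKE '%New%' OR Name LIKE 'Be%';

-- INNER JOIN on the city–country foreign key: European cities
SELECT ci.Name AS city, co.Name AS country
FROM city ci
INNER JOIN country co ON ci.CountryCode = co.Code
WHERE co.Continent = 'Europe';

-- Subquery bonus: cities in countries with above-average GNP per capita
SELECT ci.Name, co.GNP / co.Population AS gnp_per_capita
FROM city ci
INNER JOIN country co ON ci.CountryCode = co.Code
WHERE co.Population > 0
  AND co.GNP / co.Population > (
        SELECT AVG(GNP / Population)
        FROM country
        WHERE Population > 0
      );
```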
Result
I successfully generated a full demographic report that answered all the scenarios. Key outcomes included:
* Identified Andorra as the country with the highest life expectancy (83.5 years).
* Compiled a list of the top 10 most populous cities for market research.
* Provided urban planning committees with a filtered list of mid-sized cities (500k–1M population).
* Demonstrated proficiency in referential integrity (linking primary and foreign keys) and complex data manipulation within a relational model.
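The urban-planning deliverable mentioned above comes down to a simple range filter. A sketch, assuming the city table's Population column from the standard world sample:

```sql
-- Mid-sized cities for the urban-planning scenario (500k–1M inhabitants)
SELECT Name, CountryCode, Population
FROM city
WHERE Population BETWEEN 500000 AND 1000000
ORDER BY Population DESC;
```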
Database Structure Summary (Tables)
* city: Contains 4,079 records of global cities.
* country: Contains 239 records of nations with geographic and economic data.
* countrylanguage: Details the languages spoken and their official status across different nations.
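For reference, a heavily abridged sketch of how the three tables relate in the standard world script (column lists shortened and types approximate; the full script defines many more fields). The foreign keys on CountryCode are what make the joins above possible:

```sql
CREATE TABLE country (
  Code           CHAR(3) PRIMARY KEY,
  Name           VARCHAR(52),
  Continent      VARCHAR(13),
  LifeExpectancy FLOAT,
  GNP            FLOAT,
  Population     INT
);

CREATE TABLE city (
  ID          INT PRIMARY KEY,
  Name        VARCHAR(35),
  CountryCode CHAR(3),
  Population  INT,
  FOREIGN KEY (CountryCode) REFERENCES country(Code)
);

CREATE TABLE countrylanguage (
  CountryCode CHAR(3),
  Language    VARCHAR(30),
  IsOfficial  ENUM('T','F'),
  PRIMARY KEY (CountryCode, Language),
  FOREIGN KEY (CountryCode) REFERENCES country(Code)
);
```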
GitHub Repository
Original files are available here
We’ve all been there: waiting for a file to open, watching the loading bar crawl, or having Excel crash just as you’re about to save. While spreadsheets are the go-to for quick calculations, they simply aren’t built to handle the sheer volume of modern data. Once you move past basic tables into hundreds of thousands of rows, you need a tool that doesn't just "manage" data but masters it. This is where SQL takes over. By moving the heavy lifting to a database engine, we can dive deeper and faster without any lagging, turning a sluggish process into a streamlined extraction that responds in milliseconds.
The beauty of SQL isn't just speed; it’s the shift from manual "tweaking" to high-level logic. In a spreadsheet, one wrong click or a hidden filter can quietly corrupt your entire analysis. In contrast, SQL uses clean, repeatable code that ensures total data integrity. It allows us to connect disparate sources and perform complex joins that would make a workbook unreadable. By using SQL for extraction, I ensure that the data is not only pulled quickly but is also accurate, auditable, and ready for high-level decision-making—no "Not Responding" windows required.