This is my capstone project, as part of the Google Data Analytics Course on Coursera. I have used SQL , R and Tableau for different parts of the analysis, as they are covered by the Google course.
In this capstone project, I investigate the top Google search trends in Spain during 2024, leveraging the publicly available Google Trends dataset from Google Market.
This report offers valuable insights for companies seeking to refine their marketing strategies in the Spanish market. By analyzing top search trends, businesses can:
Optimize SEO: Identify relevant keywords to improve search engine visibility.
Innovate Products and Services: Discover emerging trends and consumer interests to develop innovative offerings.
Understand Consumer Behavior: Gain deeper insights into Spanish consumer preferences and behaviors.
This comprehensive analysis provides a bird's-eye view of the Spanish market, making it an ideal starting point for companies entering or expanding their operations in Spain.
The Google Trends dataset, located on BigQuery and named "google_trends," contains partitioned tables with the top 25 global Google search queries and rising queries over the past 30 days, spanning a five-year period.
This dataset, provided by Google, is reliable and secure. It's publicly accessible through Google Market and is anonymized to protect user privacy.
Google normalizes search interest values on a 0-100 scale to ensure fair comparisons across regions and time periods. This prevents bias from regions with higher overall search volume.
This dataset is sufficient to answer the research question without requiring additional data sources.
For data extraction, I used SQL in BigQuery, where the "google_trends" dataset is located. I explored the dataset and downloaded it in CSV format due to its large size, which exceeds the capacity of spreadsheet tools like Excel.
The SQL queries and their result tables are the following:
First query
The Google Trends dataset is a quite large dataset, with information about the top 25 searched terms worldwide. To narrow down the scope of the analysis, I used an SQL query to extract the month and year from the "week" column, trim the searched terms, and aggregate the search scores by term, country, and year (2024). The query also filtered out null values, focusing on scores greater than zero.
The results were grouped by term, year, and month, and ordered by year, month, and total score in descending order to identify the most popular searches. This query output served as the foundation for the initial data analysis in R.
SELECT
EXTRACT(MONTH FROM week) AS month,
EXTRACT(YEAR FROM week) AS year,
TRIM(term),
SUM(score) AS total_score,
FROM `bigquery-public-data.google_trends.international_top_terms`
WHERE
country_name = "Spain"
AND EXTRACT(YEAR FROM week) = 2024
AND score > 0
GROUP BY
term,
year,
month
ORDER BY
year,
month,
total_score DESC
Second query
To refine the analysis, I modified the initial SQL query. While the original query focused on year and month, the revised query included the "week" column to provide a more granular view of search trends.
By removing "year" from the grouping and ordering criteria, the query now groups and sorts the data by searched term and week, offering a more detailed time-based analysis. This adjusted dataset became the foundation for the subsequent R analysis.
SELECT
week,
EXTRACT(MONTH FROM week) AS month,
TRIM(term),
SUM(score) AS total_score,
FROM `bigquery-public-data.google_trends.international_top_terms`
WHERE
country_name = "Spain"
AND EXTRACT(YEAR FROM week) = 2024
AND score > 0
GROUP BY
term,
week
ORDER BY
week,
total_score DESC
To ensure data quality, the following steps were taken:
Duplicate Removal: SQL was used to identify and remove duplicate records.
Irrelevant Data Elimination: R was employed to filter out data that was not relevant to the analysis.
Null Value Handling: SQL was used to identify and address missing values.
Data Cleaning: SQL was used to remove extra spaces and blanks.
Data Validation: The dataset was sourced from a reliable source and was already free of misspellings, typos, and misfiled values.
Missing Value Check: SQL was used to verify the absence of missing values.
Formatting: No additional formatting was required.
I used R and the tidyverse library to preprocess, explore, and organize the data imported from BigQuery in CSV format. The complete R code is available in this link.
I used Tableau for visualising the cleaned data and extracting insights from the visualisations. The final dashboard can be found in this link.
The main trend in Google searches in Spain is "Movistar Plus," with a 30% search share, followed by "Madrid" at 21% and "RTVE" at 16%. Overall, the trend is fairly evenly distributed, with no category exceeding 30% or falling below 15%.
On the other hand, when analyzing the data month by month, it becomes clear that each month has its own trends. Terms like "Copa del Rey" in January or "Carlos Alcaraz" in June and July reflect the interests of the Spanish population at specific times of the year. With very few exceptions, "Movistar Plus" remains the most searched term throughout the year (leading for 8 months).
Among the most searched terms of the year, it is striking that "Antena 3" accounts for 17%, not being a popular term in any month except October, which positions the term as a regular search among the Spanish population. The same happens with "Sevilla." On the opposite side is the term "Horizonte" in November, which only appears in this month and it is the most searched within this timeframe so far.
This general analysis of the most searched terms on Google in Spain in 2024 reflects that "Movistar Plus" is the most popular term, perhaps reflecting inconsistencies in the service of this company and/or its popularity within the Spanish market. Further exploration in this topic's popularity could open a market opportunity for other companies to take over or support Movistar Plus in its endeavours.
The terms "Antena 3" and "RTVE", in third and fourth place respectively, are indicators that there is a portion of the population that consumes the content of these audiovisual media through the internet, highlighting the importance of national television for the Spanish population. The search peaks for Antena 3 are in October, when it premieres new programs (such as “Mask Singer”), and for RTVE in July and August, due to the broadcast of the European Championship, the Olympic Games, and the Tour de France.
Finally, there are two cities that appear as the most searched: Madrid and Sevilla. Whether due to population density, tourist offerings, controversies, or opportunities, it would be optimal to dive deep into the reason for this occurrence.
"Movistar Plus" Dominates Search Interest
"Movistar Plus" consistently ranks as the top search term throughout the year.
This suggests potential service issues or high consumer interest in the platform.
Seasonal Trends in Search Behavior
Certain events and seasons influence search trends. For example, "Copa del Rey" and "Carlos Alcaraz" were popular during specific months.
Summer months see increased interest in sports events like the European Championship, Olympic Games, and Tour de France.
Diverse Search Interests
While "Movistar Plus" dominates, there's a diverse range of other popular search terms, including:
Traditional TV channels like "Antena 3" and "RTVE"
Sports events
Specific individuals like "Carlos Alcaraz"
Geographic locations like "Madrid" and "Sevilla"
Potential for Market Analysis
The data suggests potential market opportunities for companies in the media and entertainment industry.
Analyzing the specific reasons behind the popularity of certain terms can provide valuable insights for businesses.
Regional Variations
The analysis highlights regional differences in search behavior, with cities like Madrid and Sevilla being popular search terms.
These insights bring a new perspective into the evolving preferences of the Spanish market.