I am thrilled to present my project, "Avocado Price Analysis," which I completed as part of the Google Data Analytics Professional Certificate program.
In this project, I utilized SQL, Spreadsheets, and Tableau to analyze and visualize the data efficiently. These tools enabled me to extract, manipulate, and derive meaningful insights from the dataset, uncovering valuable patterns and trends.
The main objective was to investigate the fluctuation of avocado prices over time and determine the influence of seasonality on these variations. Additionally, I explored the relationship between prices at the state level and prices within individual cities.
I discovered notable price spikes in specific years, followed by a subsequent downward trend towards price levels resembling previous years. Surprisingly, I found that seasonality had minimal impact on price fluctuations. Moreover, I observed significant deviations between prices in individual cities and the overall state prices, underscoring the significance of considering local market dynamics when analyzing avocado prices.
I reviewed the dataset to identify and remove any empty or incomplete rows that could impact the accuracy of the analysis.
I scanned the data for duplicate entries, ensuring that each record was unique. If duplicates were found, I eliminated them to avoid skewing the analysis.
I validated the data format to ensure consistency and accuracy. This involved verifying that dates, numbers, and other relevant information were correctly formatted.
To facilitate analysis and visualization, I extracted the month from the date column. Additionally, I converted the month number into text format for better readability.
I examined the "total_volume" column, which represented the sum of avocados and bags. To simplify the analysis, I created a new column specifically for the total number of avocados.
Using the VLOOKUP function, I matched the states in the dataset with their corresponding names in a separate table, ensuring consistency and accuracy in the state names.
As the Great Lakes region encompasses multiple states, I grouped them together under the name "Great Lakes" for better representation and analysis.
Similarly, I grouped states belonging to other regions like Midsouth, Northeast, Northern New England, South Central, Southeast, Total U.S., and West, streamlining the analysis and visualization process.
To examine the quantity differences between avocado types, I employed the COUNTIF function. This allowed me to count the occurrences of each avocado type and identify any variations in quantities.
By performing these data cleaning tasks, I ensured the dataset was accurate, consistent, and ready for further analysis.
I utilized Tableau for visually representing geographic information. With Tableau's powerful visualization capabilities, I created interactive map to visually display the geographical aspects of the data. This allowed for a more intuitive understanding of how avocado prices varied across different regions. By leveraging Tableau's features, I enhanced the overall presentation and communication of the project findings, making it easier for stakeholders to interpret and analyze the geographical patterns of avocado pricing.
Using pivot tables, I was able to gather the necessary visual information. Pivot tables allowed me to summarize and aggregate the data, providing valuable insights into various aspects of avocado pricing. By manipulating the data in pivot tables, I could analyze trends, compare different variables, and generate meaningful statistics.
Furthermore, I utilized the visualization capabilities within Excel to create compelling visual representations of the data. Using charts, graphs, and other visual elements, I presented the key findings in an easily understandable format. These visualizations enhanced the overall clarity and impact of the analysis, making it simpler for stakeholders to grasp the patterns and trends within the avocado pricing data.
In SQL, I identified the states with the highest and lowest prices, extracted the corresponding cities from those states, and retrieved the price data for each city by year. Subsequently, I exported this information into spreadsheets. The extracted information was then seamlessly transferred to spreadsheets, enabling me to conduct in-depth analysis, create visualizations, and draw insightful conclusions about the price dynamics across different cities and States.
This integration of SQL and spreadsheets allowed for a seamless workflow, enabling me to leverage the strengths of each tool for comprehensive data analysis and presentation.
SELECT
state, average_price
FROM `mythic-method-383709.Avocado_project_work.Avocado_price`
WHERE average_price = (SELECT MIN(average_price)
FROM `mythic-method-383709.Avocado_project_work.Avocado_price`
Order BY MIN(average_price)
LIMIT 1)
UNION DISTINCT
SELECT
state, average_price
FROM `mythic-method-383709.Avocado_project_work.Avocado_price`
WHERE average_price = (SELECT MAX(average_price)
FROM `mythic-method-383709.Avocado_project_work.Avocado_price`
Order BY MAX(average_price) DESC
LIMIT 1);
In my SQL query, I included one more state with lowest minimum price to ensure a more comprehensive and fair comparative analysis. This decision was made because only one city from Ohio was represented in the dataset, and including the minimum price from the state helped to provide a more representative and balanced perspective.
SELECT
state, geography, MIN(average_price) AS min_price, MAX(average_price) AS max_price, year
FROM `mythic-method-383709.Avocado_project_work.Avocado_price`
WHERE state IN ("Ohio","Michigan", "California")
GROUP BY geography, state, year
ORDER BY state;
Completing this project was an enriching experience, allowing me to apply my data analytics skills to a real-world scenario. Through the use of SQL, Spreadsheets, and Tableau, I successfully explored the changing trends in avocado prices, evaluated the impact of seasonality, and examined the relationship between state and city prices. This project strengthened my analytical capabilities, honed my problem-solving skills, and provided valuable insights into the dynamic nature of pricing within the avocado industry.
Data source: https://www.kaggle.com/datasets/timmate/avocado-prices-2020