This is the practical exam submission for my Data Scientist Professional certification. This stage of the certification is graded manually and stringently by DataCamp's data science experts. The practical exam is split into two parts.
Technical report
The technical report presents my work: how I approached the task, why I took certain actions, and how the work helps to solve the problem defined.
Non-technical presentation
The final stage was to adapt the information for a non-technical audience. Data scientists are commonly required to present their work to people who have no background in data science. These audiences are interested in why the work was done and what the outcome was, and typically not in how it was done.
Technologies used:
Interpreted the business problem as a data problem; wrangled data and ran statistical analysis with Pandas; visualised with Matplotlib; built and evaluated two classification models with Scikit-learn; created a non-technical presentation for stakeholders.
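The "built and evaluated two classification models" step can be sketched roughly as follows. This is a minimal illustration only: the exam dataset is not reproduced here, so the data is synthetic, and the two model types (logistic regression and random forest) and the accuracy metric are assumptions chosen for the example, not necessarily the ones used in the submission.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real exam dataset is not shown here.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out a stratified test set so both models are scored on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two candidate models (hypothetical choices for illustration).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.3f}")
```

Scoring both models on the same held-out split keeps the comparison fair; on a real business problem the metric would be chosen to match the cost of each kind of error.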
This is the practical exam submission for my Data Scientist Associate certification. This stage of the certification is graded manually and stringently by DataCamp's data experts. I completed a written report that addresses a business problem. In this report, I selected appropriate visualizations, fitted and evaluated a model, and defended my decisions.
Technologies used:
Interpreted the business problem as a data problem; wrangled data and ran statistical analysis with Pandas; visualised with Matplotlib; built and evaluated two predictive models with Scikit-learn.
According to the United Nations, climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas.
The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.
I cleaned, combined, and analyzed the dataset to create a report on the state of climate change in Africa. I also provide insights on the impact of climate change on African regions (with four countries, one from each African region, as case studies).
Technologies used:
Data collection:
The data source was the EDGARv7.0_GHG website provided by Crippa et al. (2022) and with DOI.
Data exploratory analysis and visualization:
Python's pandas
Matplotlib & Seaborn
Machine Learning
Python's Scikit-learn
Version Control
GitHub link here
The rewards the stock market brings to investors who know how to read it right are enormous, while its penalties can be grievous to those who are ignorant, careless, or unlucky. Investors therefore face the problem of knowing which stock to buy and when to buy it. This project uses technical analysis to try to address that problem.
The technical analysis method involves visualizing (mostly as graphs) a stock's trading history, such as price changes and volume of transactions, to determine the stock's trend, and then making predictions about its future based on insights drawn from the visualization. I also attempt to predict future prices with machine learning. Finally, I deployed the project with Python's Streamlit via Heroku.
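One common way to read a trend from a price history is to compare a short and a long simple moving average. The sketch below is an illustration under stated assumptions, not the project's actual code: the closing prices are made up, and the window sizes (3 and 5 days) and column names are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices; a real run would load history from a data API.
close = pd.Series(
    [10.0, 10.5, 11.0, 10.8, 11.5, 12.0, 11.7, 12.4, 13.0, 12.6],
    index=pd.date_range("2023-01-02", periods=10, freq="D"),
    name="close",
)

prices = close.to_frame()

# Short and long simple moving averages (window sizes are illustrative).
prices["sma_3"] = prices["close"].rolling(window=3).mean()
prices["sma_5"] = prices["close"].rolling(window=5).mean()

# Trend flag: True when the short average sits above the long one.
# Rows where either average is still NaN compare as False.
prices["uptrend"] = prices["sma_3"] > prices["sma_5"]

# Day-over-day fractional price change.
prices["daily_return"] = prices["close"].pct_change()

print(prices.round(3))
```

Plotting the close column together with the two averages gives the kind of trend chart that technical analysis relies on.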
The GitHub repo click here
The Streamlit deployment click here
Technologies used:
Data collection:
Yahoo Finance API via Python's Requests
Data exploratory analysis and visualization:
Python's pandas
Matplotlib, Plotly & Seaborn
Machine Learning
Python's Scikit-learn
Deployment
Python's Streamlit and Heroku
This is the capstone project to the Google Data Analytics Certificate.
Scenario
I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve our recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Technologies used
Data cleaning
Spreadsheets
R
Exploratory data analysis and visualization
R's Tidyverse and ggplot
SQL analysis on Google BigQuery
Data Visualization
Tableau
Notebook on Kaggle below.
Analysis of the top 20 coins on the CoinMarketCap website. This involves web scraping the site for updated data on the top 20 coins, analyzing the data with Pandas, and plotting with Matplotlib.
GitHub repo click here
Technologies used
Data collection
Web scraping with Python's Selenium and BeautifulSoup
Exploratory data analysis and visualization
Python's pandas
Data Visualization
Python's matplotlib
Google Play Store apps and reviews
Mobile apps are everywhere. They are easy to create and can be lucrative. Because of these two factors, more and more apps are being developed. In this notebook, we will do a comprehensive analysis of the Android app market by comparing over ten thousand apps in Google Play across different categories. We'll look for insights in the data to devise strategies to drive growth and retention.
Let's take a look at the data, which consists of two files:
apps.csv: contains all the details of the applications on Google Play. There are 13 features that describe a given app.
user_reviews.csv: contains 100 reviews for each app, most helpful first. The text in each review has been pre-processed and attributed with three new features: Sentiment (Positive, Negative or Neutral), Sentiment Polarity and Sentiment Subjectivity.
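A typical first step with two files like these is to deduplicate the app table and join it to the reviews. The sketch below uses tiny made-up frames in place of apps.csv and user_reviews.csv; the column names `App`, `Category`, and `Sentiment_Polarity` are assumptions about the file layout and should be checked against the real headers.

```python
import pandas as pd

# Tiny hypothetical stand-ins for apps.csv and user_reviews.csv.
apps = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma", "Alpha"],
    "Category": ["GAME", "TOOLS", "GAME", "GAME"],
})
reviews = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Gamma"],
    "Sentiment": ["Positive", "Negative", "Neutral", "Positive"],
    "Sentiment_Polarity": [0.8, -0.4, 0.0, 0.6],
})

# Drop repeated app rows so each app is counted once.
apps = apps.drop_duplicates(subset="App")

# Attach every review to its app's details (join key is assumed to be "App").
merged = apps.merge(reviews, on="App", how="inner")

# Average review polarity per category.
polarity_by_category = merged.groupby("Category")["Sentiment_Polarity"].mean()
print(polarity_by_category)
```

With the real files, the two frames would come from `pd.read_csv` rather than being built inline.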
GitHub repo click here
Technologies used
Data cleaning
Python's Pandas
Exploratory data analysis and visualization
Python's Pandas
Data Visualization
Python's Matplotlib
The Nobel Prize is perhaps the world's best-known scientific award. Besides the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?
Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016.
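One way to probe the gender question is to compute the share of female winners per decade. The sketch below runs on a handful of made-up rows, not real laureates, and the column names `year` and `sex` are assumptions about the dataset's layout.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real prize records.
nobel = pd.DataFrame({
    "year": [1901, 1903, 1911, 1947, 1963, 2009, 2009, 2016],
    "sex": ["Male", "Female", "Female", "Female",
            "Male", "Female", "Male", "Male"],
})

# Bucket each prize year into its decade, e.g. 1903 -> 1900.
nobel["decade"] = (nobel["year"] // 10) * 10

# Boolean flag whose mean per group is the proportion of female winners.
nobel["female_winner"] = nobel["sex"] == "Female"
female_share = nobel.groupby("decade")["female_winner"].mean()
print(female_share)
```

Taking the mean of a boolean column is a compact way to get a proportion, and the resulting series can be plotted directly as a line over decades.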
GitHub repo click here
Technologies used
Data cleaning
Python's Pandas
Exploratory data analysis and visualization
Python's Pandas
Data Visualization
Python's Matplotlib
With almost 30k commits and a history spanning over ten years, Scala is a mature programming language. It is a general-purpose programming language that has recently become another prominent language for data scientists. Scala is also an open source project. Open source projects have the advantage that their entire development histories (who made changes, what was changed, code reviews, etc.) are publicly available. I read in, clean up, and visualize the real-world project repository of Scala, which spans data from a version control system (Git) as well as a project hosting site (GitHub). The goal is to find out who has had the most influence on its development and who the experts are. The dataset I use, which was previously mined and extracted from GitHub, consists of three files:
pulls_2011-2013.csv contains the basic information about the pull requests, and spans from the end of 2011 up to (but not including) 2014.
pulls_2014-2018.csv contains identical information, and spans from 2014 up to 2018.
pull_files.csv contains the files that were modified by each pull request.
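Because the pull request data is split across two files with identical columns, a natural first step is to stack them and then join the result to the file-level table. The sketch below uses small made-up frames in place of the three CSVs; the column names `pid`, `user`, `date`, and `file`, and all values, are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for pulls_2011-2013.csv and pulls_2014-2018.csv.
pulls_one = pd.DataFrame({
    "pid": [1, 2],
    "user": ["ann", "bob"],
    "date": ["2012-05-01T10:00:00Z", "2013-11-20T08:30:00Z"],
})
pulls_two = pd.DataFrame({
    "pid": [3, 4],
    "user": ["ann", "cy"],
    "date": ["2015-02-14T12:00:00Z", "2017-07-04T09:15:00Z"],
})
# Made-up stand-in for pull_files.csv: one row per file touched by a pull request.
pull_files = pd.DataFrame({
    "pid": [1, 1, 2, 3, 4],
    "file": ["a.scala", "b.scala", "a.scala", "c.scala", "a.scala"],
})

# Stack the two pull request tables and parse the timestamps as UTC.
pulls = pd.concat([pulls_one, pulls_two], ignore_index=True)
pulls["date"] = pd.to_datetime(pulls["date"], utc=True)

# Join pull requests to the files they modified.
data = pulls.merge(pull_files, on="pid")

# Rough influence measures: pull requests per user, distinct files per user.
pulls_per_user = pulls.groupby("user")["pid"].count().sort_values(ascending=False)
files_per_user = data.groupby("user")["file"].nunique()
print(pulls_per_user)
print(files_per_user)
```

Counting pull requests per user is only one proxy for influence; recent activity on a specific file is another view the merged table supports.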
GitHub repo click here
Technologies used
Data cleaning
Python's Pandas
Exploratory data analysis and visualization
Python's Pandas
Data Visualization
Python's Matplotlib