April, 2022
The project grew out of a Coding Club tutorial that is part of their Data Science course.
There is nothing special about what the app does: it shows a histogram of the Sepal Length distribution and some statistics for the Iris species, taken from R's built-in Iris dataset. But it made me proud and satisfied to have built something I understood from the beginning through to deployment :)
The project is built with the R Shiny package in RStudio and published at shinyapps.io.
March-April, 2022
As Kaggle states on the competition description page:
"This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.
The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck."
In this analysis the following questions were asked:
What is the relationship between the features and a passenger’s chance of survival?
Prediction of survival for the entire ship.
The report covers the data elaboration, preparation, exploration, and visualization steps. It ends with a summary of the outcomes produced by linear regression and decision tree modelling.
The notebook itself is available here.
Tools used:
Kaggle Jupyter Notebook for initial coding
RStudio for data cleansing, transformation, and shaping
R Markdown (R) for producing a combined .pdf report file
ggplot2, randomForest, rpart and other R packages for data transformation and modelling
February, 2022
The project is based on an open-source dataset used for the Divvy Bikes Case Study, the graduation project of the Google Data Analytics Certificate (Coursera). The data is available here (a data bucket at AWS).
The main goal set before starting the project was to understand how data is processed and elaborated in various tools and applications with a user-friendly interface and almost no coding (except for the SQL part of the assignment).
Later on, I decided to reproduce almost the same work with an MS Excel + R combo. But that is another story... :)
Tools used:
MS Excel for data cleansing and transformation
BigQuery (SQL) for getting a combined dataset
Google Sheets for basic visuals preparation
Looker for more comprehensive and user-friendly visualizations
Google Slides for wrapping it all up into a presentation of the finished work
February, 2022
The project is based on a dataset I prepared myself for the Divvy Bikes Case Study, the graduation project of the Google Data Analytics Certificate (Coursera). The data is available here (my dataset at Kaggle).
After gaining some experience with dataset processing in Capstone Project v1 and trying out some RStudio options, I decided to reproduce almost the same work with an MS Excel + R combo.
I discovered the power of R, which processed a dataset of 5.6M+ rows (professionals call them observations) in almost no time, and fell in love with R for data work.
The project is written in R Markdown using Jupyter Notebook and published on Kaggle.
Tools used:
MS Excel for data cleansing and transformation
R/RStudio for basic data processing and script-polishing purposes
Jupyter Notebook at Kaggle for publishing the work in Markdown