Data Analytics

Training examples of data analytics and insights production efforts

R-based Shiny App (learning purpose)

Coding Club Shiny Tutorial App

April, 2022

This project follows the Coding Club tutorial that is part of their Data Science course.

There is nothing special about what the app does: it shows a histogram of the sepal length distribution and some summary statistics for the iris species from R's built-in iris dataset. Still, it made me proud and satisfied to build something I understood from the beginning through to deployment :)

The project is built with the R Shiny package in RStudio and published at shinyapps.io
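
For readers who want a feel for what such an app involves, here is a minimal sketch, not the code of the published app. It assumes a species selector, a bin slider, a histogram and a small statistics table; the helper name `sepal_summary`, the input ids and the layout are all hypothetical.

```r
# Minimal sketch of a Shiny app over R's built-in iris dataset.
# Hypothetical names: sepal_summary, ui, server, and the "species"/"bins" input ids.

# Summary statistics of Sepal.Length for one species.
sepal_summary <- function(species) {
  x <- iris$Sepal.Length[iris$Species == species]
  data.frame(
    species = species,
    n = length(x),
    mean = mean(x),
    median = median(x),
    sd = sd(x)
  )
}

# Launch the app only in an interactive session where shiny is installed.
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Iris sepal lengths"),
    shiny::selectInput("species", "Species", choices = levels(iris$Species)),
    shiny::sliderInput("bins", "Approximate number of bins",
                       min = 5, max = 30, value = 10),
    shiny::plotOutput("hist"),
    shiny::tableOutput("stats")
  )

  server <- function(input, output) {
    # Histogram of sepal lengths for the selected species.
    output$hist <- shiny::renderPlot({
      x <- iris$Sepal.Length[iris$Species == input$species]
      hist(x, breaks = input$bins, main = input$species,
           xlab = "Sepal length (cm)")
    })
    # Table with the summary statistics computed above.
    output$stats <- shiny::renderTable(sepal_summary(input$species))
  }

  shiny::runApp(shiny::shinyApp(ui, server))
}
```

Keeping the statistics in a plain function makes them easy to check outside the app, for example by calling `sepal_summary("setosa")` in the console.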

Kaggle Titanic Survival Prediction

Titanic Analysis

March-April, 2022

As Kaggle states on the competition description page:

"This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.

The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck."

In this analysis the following questions were asked:

What is the relationship between the features and a passenger’s chance of survival?
What is the predicted survival for passengers across the entire ship?

The report covers the data elaboration, preparation, exploration, and visualization steps. It ends with a summary of the outcomes produced by linear regression and decision tree modelling.

The notebook itself is available here.

tk_titanic_analysis.pdf

Tools used:

Kaggle Jupyter Notebook for initial coding
RStudio for data cleansing, transformation, and shaping
RMarkdown (R) for producing a combined .pdf report file
ggplot2, randomForest, rpart and other R packages for data transformation and modelling
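
The modelling step can be illustrated roughly as follows. This is a sketch under loud assumptions, not the code from the report. It uses R's built-in `Titanic` contingency table instead of the Kaggle files, so only class, sex and age group are available as features. It also swaps in logistic regression (`glm` with a binomial family) as a common regression choice for a yes/no outcome, whereas the report itself names linear regression. The decision tree uses `rpart` from the tools list. All object names are hypothetical.

```r
# Sketch only: built-in `Titanic` table, NOT the Kaggle train/test CSV files.
library(rpart)  # recommended package shipped with standard R installations

# Expand the 4-way contingency table into one row per person on board.
counts <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]

# Assumption: logistic regression stands in for the regression model.
logit_fit <- glm(Survived ~ Class + Sex + Age,
                 data = passengers, family = binomial)

# Classification tree on the same features.
tree_fit <- rpart(Survived ~ Class + Sex + Age,
                  data = passengers, method = "class")

# In-sample accuracy of both models (no train/test split in this sketch).
logit_pred <- ifelse(predict(logit_fit, type = "response") > 0.5, "Yes", "No")
tree_pred <- predict(tree_fit, type = "class")
accuracy <- c(
  logit = mean(logit_pred == passengers$Survived),
  tree = mean(tree_pred == passengers$Survived)
)
print(round(accuracy, 3))
```

Because the accuracy here is measured on the same rows the models were fitted on, it overstates how well they would do on unseen passengers; a Kaggle submission is scored on a held-out test set instead.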

Google Data Analytics Certificate

Capstone Project v1

February, 2022

The project is based on an open-source dataset, used for the Divvy Bikes Case Study that completes the Google Data Analytics Certificate (Coursera) and available here (a data bucket at AWS).

The main task set before the project was to understand how data is processed and elaborated in various tools and applications that offer a user-friendly interface and need almost no coding (except for the SQL part of the assignment).

Later on, I decided to reproduce almost the same effort with an MS Excel + R combo. But that is another story... :)

capstone_tk_bikes_20220211 – copy

Tools used:

MS Excel for data cleansing and transformation
BigQuery (SQL) for getting a combined dataset
Google Spreadsheets for basic visuals preparation
Looker for more comprehensive and user-friendly visualizations
Google Slides for wrapping it all up into a presentation of the completed work

Capstone Project v2

February, 2022

The project is based on a dataset I prepared myself for the Divvy Bikes Case Study that completes the Google Data Analytics Certificate (Coursera), available here (my dataset at Kaggle).

After gaining some experience with dataset processing in Capstone Project v1 and trying some RStudio options, I decided to reproduce almost the same effort with an MS Excel + R combo.

I discovered the power of R, which processed a dataset with 5.6M+ rows (observations, as professionals call them) in almost no time, and fell in love with R for data work.

The project is written in R Markdown in a Jupyter Notebook and published on Kaggle.

divvy_bikes_2nd_ed | Using data from divvy_rides_2022_simplified

Tools used:

MS Excel for data cleansing and transformation
R/RStudio for basic data processing and script-polishing purposes
Jupyter Notebook at Kaggle for publishing the work in Markdown

divvy_bikes.pdf
