Welcome to my analytics portfolio! I take a different approach to predictive analytics: I use no-code software that runs popular analytics tools in the back end. For instance, Orange uses Python and JASP uses R.
The software can be downloaded from the following links:
https://orangedatamining.com/download/#windows
https://jasp-stats.org/download/
Here is an overview of each project:
1. Hotel Booking: The data for this project can be found here https://www.sciencedirect.com/science/article/pii/S2352340918315191 - The data consists of hotel reservations for two hotels in Portugal. I uploaded the data to MySQL, wrote queries to extract it in a more optimized format, pulled the reformatted data into Power BI and Tableau to create dashboards, and then took a sample of the data to conduct predictive analytics in Orange. I used classification methods to predict hotel booking cancellations, regression methods to predict average daily rate, clustering to create new market segments, and dimensionality reduction to simplify the dataset.
2. Text Mining of Twitter: Twitter is a fascinating data source because new data is generated by the millions (500 million tweets per day). The great part about Orange is that it has a built-in API for Twitter data, which can be pulled using very simple queries. I pulled in Twitter posts from @UN and conducted text mining in Orange. I used word clouds to visualize the data, cluster analysis to group the text, and sentiment analysis, and I also conducted social network analysis.
3. Image Analysis of Traffic Signs: The data was obtained from one of the many datasets provided within the Orange software. There are a variety of traffic signs on the road. Image analytics were conducted to see whether the different signs can be clustered together and whether those clusters can then be predicted. Image embedding was conducted using the Inception V3 embedder to extract data from the images and put it in tabular format. Then, clustering was used to group the signs, followed by various classification methods such as Logistic Regression, AdaBoost, Naive Bayes, and Gradient Boosted Trees to predict the generated clusters.
4. Airport Cargo Social Network Analysis: The data was obtained from one of the many datasets provided within the Orange software. Social network analysis is useful for seeing the relationships between entities. Here, I analyze the connection between different airports using social network analysis.
5. Stock Price Time Series Analysis: In practice, stock price prediction isn't very useful because of the wide confidence intervals, but it is still fun to do! Similar to Orange's Twitter API, Orange also has a Yahoo Finance API that can pull in stock data based on the entered ticker symbol. Here, I pull data for a classic ETF, VOO, to try to predict the S&P 500. Indices, such as the VIX, can also be extracted. Autoregressive integrated moving average (ARIMA) and vector autoregression (VAR) models were used to conduct time series analysis. Seasonality adjustment, periodogram, correlogram, spiralogram, and Granger causality were also used in the analysis.
6. Airline Satisfaction Survey Analysis using Bayesian and Frequentist Statistics: The dataset can be found here: https://www.kaggle.com/datasets/mysarahmadbhat/airline-passenger-satisfaction - Traditional statistics are used to infer the characteristics of a population from a sample. JASP was used because it has a comprehensive statistical library of both Bayesian and Frequentist methods. I used both Bayesian and Frequentist versions of chi-squared tests, logistic regression, linear regression, exploratory factor analysis, reliability analysis, principal component analysis, independent samples t-test, hierarchical clustering, and mediation analysis.
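
For readers curious about what a no-code tool computes behind the scenes, here is a rough sketch of the Frequentist chi-squared test of independence mentioned in project 6, written in plain Python. It is not JASP's implementation and is not taken from the project files. The function name `chi_squared_statistic` and the counts in `counts` are made up for illustration and do not come from the airline dataset.

```python
from typing import List


def chi_squared_statistic(table: List[List[float]]) -> float:
    """Pearson chi-squared statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts (NOT from the real dataset):
# rows = two customer groups, columns = satisfied / not satisfied
counts = [[30, 10], [20, 40]]

stat = chi_squared_statistic(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

A larger statistic relative to the degrees of freedom suggests the two variables are less likely to be independent. JASP reports the corresponding p-value (and the Bayesian counterpart) without any code.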