Machine Learning | Dashboard Development | Exploratory Data Analysis
Projects I worked on during my internship at Telkom! 😊✨🎉
Aug 2024 - Jan 2025 (as Data Scientist)
▪ IndiHomeX Project
In the IndiHomeX project, I analyzed 10 competitors in internet add-on services, leading to a new solution that enhances Wi-Fi networks by eliminating dead zones. I also redesigned the Speed on Demand dashboard, adding a KPI page for easier performance monitoring by stakeholders.
▪ Omni Communication Assistant (OCA Indonesia)
For the OCA project, I developed a machine learning framework for RFM analysis and churn prevention, documenting the process with flowcharts for future reference. I also derived insights for the OCA dashboard by defining key metrics, and optimized the user flow on the OCA Telkom website to improve the customer journey.
Combination of Hyperparameter Initialization and Adam Algorithm for Optimizing Gated Recurrent Unit Models in Forecasting Stock Closing Prices Post-Boycott Issues
December 2024
▪ Collected five years of daily closing prices for PT Unilever Indonesia Tbk, then normalized and split the data to improve the model's learning.
▪ Developed a GRU model for stock price forecasting, achieving a MAPE of 1.59% and an RMSE of 71.5 through systematic hyperparameter tuning across three scenarios:
Scenario 1: Time step: 10, Learning rate: 0.1, Adam parameters: Default. Produced the highest MAPE, indicating lower accuracy, which I attributed to the shorter time step and higher learning rate.
Scenario 2: Time step: 20, Learning rate: 0.001, Adam parameters: Default. Achieved the best performance, with improved stability and convergence.
Scenario 3: Time step: 30, Learning rate: 0.0001, Adam parameters: Higher values. Provided greater stability but still underperformed Scenario 2, highlighting the trade-offs between parameter settings.
▪ Assessed the influence of boycott issues on stock performance, emphasizing the need to account for market dynamics. Applied short-term forecasting to deliver timely predictions that help investors make informed decisions.
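The data preparation and error metrics behind these scenarios can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project code: the GRU itself is omitted, the helper names (`min_max_scale`, `make_windows`, `mape`, `rmse`) are hypothetical, and the prices are made-up values. The time step is the length of each input window, the parameter varied across the three scenarios.

```python
import math

def min_max_scale(series):
    """Scale values to [0, 1]; assumes the series is not constant."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def make_windows(series, time_step):
    """Split a 1-D series into (window of `time_step` values, next value) pairs."""
    X, y = [], []
    for i in range(len(series) - time_step):
        X.append(series[i:i + time_step])
        y.append(series[i + time_step])
    return X, y

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical closing prices, for illustration only
prices = [4100, 4150, 4080, 4200, 4180, 4250]
X, y = make_windows(min_max_scale(prices), time_step=3)
print(len(X), len(y))  # 3 3
```

A longer time step gives the model more history per prediction but yields fewer training windows. In a real pipeline the scaler would typically be fitted on the training split only, and MAPE/RMSE computed on prices after inverting the scaling.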
Credit Scoring Classification - Home Credit Indonesia Dataset
May 2024 - Project-based internship
▪ Worked with 2 main tables and 5 related tables totaling 307,511 rows, which I explored, analyzed, and merged into the main table using an ERD.
▪ Trained several models, including logistic regression, KNN, decision tree, CatBoost, and gradient boosting. The best-performing model was the CatBoost Classifier, with an accuracy of 0.75 and a ROC-AUC score of 0.82.
▪ Used the CatBoost model to predict which customers would repay and which would fail to pay, and provided business recommendations for customers likely to repay their loans.
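Accuracy and ROC-AUC measure different things, which is why the two scores above differ. The sketch below is a hypothetical illustration, not the project code: accuracy depends on a chosen threshold (0.5 is assumed here), while ROC-AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The labels and scores are made up, with label 1 assumed to mark a customer who fails to pay.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores
    a random negative (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Share of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Hypothetical labels (1 = fails to pay) and predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores), accuracy(labels, scores))  # 0.75 0.75
```

The pairwise loop is quadratic, which is fine for illustration; on a dataset of this size a rank-based formula or a library routine would be the practical choice.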
MERKUZONE: Identification and Prioritization of Environmental Zoning for Pollution Management (K-Means Clustering) - Indonesian Environmental Statistics 2023
May 2024 - Final project
▪ Explored and preprocessed Indonesian environmental statistics covering all provinces (34 data dimensions in total), then filtered them to focus on the topic.
▪ Selected 3 variables relevant to the analysis and named the resulting clusters “MERKUZONE clustering”, after the MErah (red), KUning (yellow), and hiZau (green) zones used to prioritize areas.
▪ Silhouette evaluation identified 3 as the optimal number of clusters. Based on the results, I recommended innovative solutions that, if successfully implemented, could significantly reduce pollutant levels and support the government in achieving the SDGs.
ADIDAS Customer Classification by Demand Category Using KNN, Decision Tree, SVM, GaussianNB, and Logistic Regression Algorithms - Adidas Sales 2020-2021
May 2024 - End-to-end
▪ Explored, preprocessed, and engineered features from sales data totaling 9,641 rows.
▪ The logistic regression model performed very well, with 96% accuracy and 97% precision. I then tuned it with GridSearchCV to improve its performance on new data.
▪ On previously unseen data, the model achieved 89% accuracy and 90% precision. I also provided business recommendations based on each product's customer demand category to help Adidas shape its strategy.
Customer Behavior Segmentation (RFM Analysis) - Global Superstore 2014
April 2024 - Task
▪ Explored and preprocessed sales data totaling 51,290 rows.
▪ Compared several clustering algorithms, including K-Means (with the Elbow Method), Agglomerative clustering, and DBSCAN, and evaluated the optimal number of clusters with Silhouette analysis. K-Means proved the most suitable for this case and produced 3 clusters.
▪ Provided business recommendations, including personalized customer offers, campaigns, and exclusive product launches, for the 3 customer clusters: Super Shoppers dominate at 48%, followed by Regular Customers at 43% and Inactive Customers at only 9%.
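The RFM features that feed this kind of segmentation can be sketched as follows. This is a minimal illustration under assumptions, not the project code: `rfm_table` is a hypothetical helper, the transactions are made up, and each row is assumed to be one order (with order-line data such as Global Superstore, frequency would count distinct order IDs instead). Only the feature-building step is shown; scaling and K-Means clustering would follow.

```python
from datetime import date

def rfm_table(transactions, snapshot):
    """Compute (Recency in days, Frequency, Monetary) per customer.

    transactions: iterable of (customer_id, order_date, amount) tuples,
                  assumed to be one row per order.
    snapshot: reference date used to measure recency.
    """
    last_order, frequency, monetary = {}, {}, {}
    for customer, order_date, amount in transactions:
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0) + amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }

# Hypothetical transactions, for illustration only
tx = [
    ("A", date(2014, 12, 1), 120.0),
    ("A", date(2014, 12, 20), 80.0),
    ("B", date(2014, 6, 1), 300.0),
]
print(rfm_table(tx, snapshot=date(2014, 12, 31)))
# {'A': (11, 2, 200.0), 'B': (213, 1, 300.0)}
```

Low recency with high frequency and spend points toward a segment like Super Shoppers, while high recency points toward Inactive Customers.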
Classification of Surviving and Non-Surviving Passengers on the Titanic Using KNN, Decision Tree, Random Forest, and XGBoost - Titanic Dataset
March 2024 - Task
▪ Explored and preprocessed the Titanic data, totaling 891 rows.
▪ Trained KNN, Decision Tree, Random Forest, and XGBoost models. Random Forest performed best, with 82% accuracy and 82% recall. I chose recall as the key metric because this case requires capturing as many true positives as possible and reducing false negatives.
▪ Predicted on the test data; optimizing for recall flags more passengers as survivors than optimizing for other metrics. High recall reduces false negatives, so by flagging more potential survivors, search-and-rescue efforts can reach them faster, even if some of those passengers did not in fact survive.
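The recall trade-off described above can be shown with a small hypothetical example, not taken from the project: a cautious model that predicts few survivors has high precision but misses survivors, while a liberal model catches every survivor at the cost of some false alarms. The function names and labels are illustrative assumptions, with 1 meaning "survived".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false negatives, false positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp, fn, fp

def recall(y_true, y_pred, positive=1):
    """Share of actual survivors that the model flags (TP / (TP + FN))."""
    tp, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn)

def precision(y_true, y_pred, positive=1):
    """Share of flagged passengers who actually survived (TP / (TP + FP))."""
    tp, _, fp = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp)

# Hypothetical ground truth: 3 survivors, 3 non-survivors
y_true = [1, 1, 1, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0]  # flags one passenger
liberal = [1, 1, 1, 1, 0, 0]   # flags four passengers
print(recall(y_true, cautious), precision(y_true, cautious))
print(recall(y_true, liberal), precision(y_true, liberal))  # 1.0 0.75
```

The liberal model has no false negatives, which matches the search-and-rescue reasoning: missing a survivor costs more than searching for a passenger who did not survive.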
Environmental Zoning Based on Pollutant Levels in Indonesia - Google Looker Studio
May 2024 - Final project
▪ Developed an interactive dashboard visualizing 2023 pollutant levels across regions of Indonesia, based on the environmental quality index, air quality index, and land cover quality index.
▪ Designed a dashboard layout with headers, important notes, and a color palette tailored to the topic. Selected appropriate chart types (Sankey diagram, bar chart, geospatial map, scatter plot, table) and incorporated scorecards and filters for enhanced usability.
▪ The dashboard maps all 34 provinces in Indonesia: 3 in the red zone (8.8%), 19 in the yellow zone (55.9%), and 12 in the green zone (35.3%). It is especially useful for urban residents, who can identify high-pollution and safe areas.
Adidas Sales Dashboard - Tableau Public
May 2024 - End-to-end
▪ Developed an interactive Tableau dashboard visualizing Adidas product sales, including total sales by region and retailer, best-selling products, and monthly sales trends, and classified products by customer demand category.
▪ Designed the dashboard layout with headers and important notes for users, chose a color palette matching the topic, selected chart types suited to the data, and added scorecards and filters.
▪ Used 3 scorecards for the main metrics, along with bubble charts, pie charts, bar charts, column charts, geospatial maps, scatter plots, line charts, and treemaps, to help policymakers and users understand Adidas sales.
PT. Mari Belajar Sales Report - Microsoft Power BI
April 2024 - Training
▪ Developed an interactive Power BI dashboard showing revenue per month and per product, units sold, and an analysis of revenue sources.
▪ Designed the dashboard layout and header, chose a color palette matching PT Mari Belajar's identity, selected chart types suited to the data, and added scorecards and filters.
▪ Used 2 scorecards for the main metrics, along with pie charts, bar charts, line charts, and Sankey diagrams, to help policymakers and users understand sales results at PT Mari Belajar.
Note: This dashboard uses simulated data provided by PT Mari Belajar during a one-week data scholarship program.
Exploratory Data Analysis in Spreadsheet - The Look Ecommerce
March 2024 - Task
▫ Queried The Look Ecommerce dataset from Google BigQuery, performing a left join across several tables to obtain 25,635 rows covering 2020 to 2023.
▫ Imported the data into Google Sheets, explored it with pivot tables, and derived 10+ insights.
▫ Created 5+ visualizations, collected in a mini dashboard with scorecards and slicers for users. One insight is that female customers cancel more orders than male customers, which I hypothesized may reflect longer or more changeable purchase decisions.
Exploratory Data Analysis in MySQL - Number of Poor People in West Java Districts/Cities
February 2024 - End-to-end
▫ Analyzed Open Data Jabar data on the number of poor people in West Java, totaling 567 rows from 2002 to 2022.
▫ Wrote 8+ basic SQL queries to find the highest numbers of poor people and the increasing and decreasing trends, focusing on the COVID-19 period (2019 to 2022).
▫ Derived 4+ insights on the causes of poverty. For example, in 2020 the number of poor people rose by 0.25 in Bekasi Regency and by 0.22 in Depok City, likely due to the COVID-19 pandemic and both regions' location in the strategic JABODETABEK area, which has the most economic activity and outdoor work.
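A year-over-year change query of the kind used to spot these increases can be sketched as follows. This is a hypothetical illustration, not the project's queries: the table name `poverty`, its columns, and the values are made up, and SQLite (via Python's built-in `sqlite3`) stands in for MySQL only so the example runs without a database server. The self-join on consecutive years is plain SQL that MySQL also accepts.

```python
import sqlite3

# In-memory database with a hypothetical table and made-up values
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poverty (district TEXT, year INTEGER, poor_people REAL)")
rows = [
    ("District A", 2019, 10.0), ("District A", 2020, 12.5),
    ("District B", 2019, 8.0), ("District B", 2020, 7.0),
]
conn.executemany("INSERT INTO poverty VALUES (?, ?, ?)", rows)

# Year-over-year change per district: join each year to the previous one
query = """
SELECT cur.district, cur.year, cur.poor_people - prev.poor_people AS yoy_change
FROM poverty AS cur
JOIN poverty AS prev
  ON prev.district = cur.district AND prev.year = cur.year - 1
ORDER BY yoy_change DESC
"""
for district, year, yoy_change in conn.execute(query):
    print(district, year, yoy_change)
```

Sorting by the change puts the districts with the largest increases first, which is how regions such as Bekasi Regency and Depok City would surface for a given year.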