MY MOST RELEVANT PROJECTS
Data Science, Data Visualization, Preprocessing, Exploration, Machine Learning
🧠 Understanding ADHD Through Data Science
I'm excited to share the results of my final project for the CE888: Data Science and Decision Making module, where I tackled a real-world healthcare challenge: using machine learning to predict ADHD diagnosis and participant sex from neuroimaging and socio-demographic data.
💡 The Challenge
As a data scientist for the NHS (scenario-based), I built an explainable and fair predictive system to support clinicians in diagnosing ADHD, especially in female patients, who are often underdiagnosed.
🔍 Stage 1 – Data Exploration & Preprocessing
Cleaned and merged fMRI connectome data, socio-demographics, emotional, and parenting features for 1,213 participants.
Applied imputation techniques (e.g., IterativeImputer) and used mutual information for feature selection.
Standardized numerical features and encoded categorical variables.
Used stratified train/validation/test splits to ensure balanced subgroup representation.
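A minimal sketch of how these Stage 1 steps can be chained in scikit-learn, assuming a merged table with numeric and categorical columns. The column names, the number of selected features (k), the 60/20/20 split ratios, and the synthetic data are hypothetical; stratifying on the combined (ADHD, sex) label is one way to keep both subgroups balanced across splits.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols, k_features):
    """Impute, scale/encode, then keep the k features with the highest mutual information."""
    numeric = Pipeline([
        ("impute", IterativeImputer(random_state=0)),  # models each column from the others
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    columns = ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0.0,  # always return a dense array
    )
    mi = partial(mutual_info_classif, random_state=0)  # fixed seed for reproducible scores
    return Pipeline([("columns", columns), ("select", SelectKBest(mi, k=k_features))])


def stratified_split(n_rows, strata, seed=0):
    """60/20/20 train/validation/test split of row indices, stratified on `strata`."""
    idx = np.arange(n_rows)
    trainval, test = train_test_split(idx, test_size=0.2, stratify=strata, random_state=seed)
    train, val = train_test_split(
        trainval, test_size=0.25, stratify=strata[trainval], random_state=seed
    )
    return train, val, test


# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "conn_0": rng.normal(size=n),
    "conn_1": rng.normal(size=n),
    "age": rng.uniform(6, 18, size=n),
    "site": rng.choice(["A", "B", "C"], size=n),
})
df.loc[rng.choice(n, size=20, replace=False), "conn_1"] = np.nan
adhd = rng.integers(0, 2, size=n)
sex = rng.integers(0, 2, size=n)
strata = adhd * 2 + sex  # one stratum per (ADHD, sex) combination

train, val, test = stratified_split(n, strata)
pre = build_preprocessor(["conn_0", "conn_1", "age"], ["site"], k_features=3)
X_train = pre.fit_transform(df.iloc[train], adhd[train])  # fit on training rows only
X_val = pre.transform(df.iloc[val])
X_test = pre.transform(df.iloc[test])
```

Fitting the whole pipeline on the training rows only keeps imputation statistics, scaling parameters, and mutual-information scores from leaking information out of the validation and test sets.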
🧠 Stage 2 – Predictive Modeling & Fairness Evaluation
Trained several models: Logistic Regression, Random Forest, XGBoost, MLP, SVM, and KNN.
Evaluated using F1, Recall, and AUC-ROC with special attention to clinical relevance.
Random Forest was selected for ADHD prediction:
Test Accuracy: 82%
Recall (ADHD): 88%
AUC-ROC: 0.866
Applied LIME and SHAP for model explainability.
Conducted bias and fairness analysis across sex groups, checking that recall remained high and performance was balanced for both males and females.
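A per-group evaluation like the fairness check above can be sketched by computing the same metrics separately for each sex. This is an illustrative helper, not the project's code: the function names and toy values are hypothetical, and in practice `y_pred` and `y_score` would come from the fitted Random Forest's `predict` and `predict_proba(X_test)[:, 1]`.

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score, roc_auc_score


def metrics_by_group(y_true, y_pred, y_score, groups):
    """Recall, F1 and AUC-ROC computed separately for each subgroup (e.g. sex)."""
    y_true, y_pred, y_score, groups = map(np.asarray, (y_true, y_pred, y_score, groups))
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[str(g)] = {
            "recall": recall_score(y_true[mask], y_pred[mask]),
            "f1": f1_score(y_true[mask], y_pred[mask]),
            "auc": roc_auc_score(y_true[mask], y_score[mask]),  # needs both classes in the group
        }
    return report


def recall_gap(report):
    """Largest difference in recall between any two groups (0 means perfectly balanced)."""
    recalls = [m["recall"] for m in report.values()]
    return max(recalls) - min(recalls)


# Toy example with hypothetical values.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
y_score = [0.9, 0.4, 0.2, 0.1, 0.8, 0.7, 0.6, 0.3]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

report = metrics_by_group(y_true, y_pred, y_score, sex)
gap = recall_gap(report)
```

Here the model ranks cases perfectly in both groups (AUC 1.0) yet misses half of the female ADHD cases, which is exactly the kind of gap an aggregate metric would hide.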
📊 Tools & Libraries:
Python, Scikit-learn, XGBoost, Keras, LIME, SHAP, Pandas, Seaborn, Matplotlib
🎓 This project helped me sharpen my skills in:
- End-to-end data science pipeline development
- Interpretable ML in healthcare
- Model fairness and explainability
Jupyter repository: https://github.com/DArmandoSalinas/Predicting-ADHD-sex/tree/main
Data Science, Data Visualization, Preprocessing, Exploration
🧠 Understanding ADHD Through Data Science
I recently worked on a project aimed at analyzing behavioral, demographic, and neurological data to explore patterns in ADHD diagnosis. Stage 1 focused on data exploration and preprocessing, ensuring the dataset is structured for predictive modeling in Stage 2.
🔍 Key Insights from Data Exploration (EDA)
📊 Class Distribution & Imbalance
The dataset revealed a class imbalance, with more ADHD-diagnosed individuals than non-ADHD participants.
This means future models will need resampling techniques or alternative evaluation metrics to avoid bias toward the dominant class.
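One alternative to resampling is class weighting, where errors on the minority class cost more during training. The sketch below computes "balanced" weights by hand and checks them against scikit-learn's `compute_class_weight`; the 70/30 split is illustrative, not the dataset's actual class counts.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_weights(y):
    """Weight per class = n_samples / (n_classes * count_of_class)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Illustrative 70/30 split, not the project's actual class counts.
y = np.array([1] * 70 + [0] * 30)  # 1 = ADHD, 0 = non-ADHD
manual = balanced_weights(y)
sklearn_weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
```

The resulting dictionary can be passed as `class_weight=` to estimators such as `RandomForestClassifier` or `LogisticRegression`, or `class_weight="balanced"` can be set directly on them.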
📈 Strong Correlation Between Behavioral Scores & ADHD
Features like SDQ Hyperactivity, SDQ Externalizing, and SDQ Difficulties Total showed a high correlation with ADHD outcomes.
These findings are consistent with previous psychological studies linking hyperactivity and externalizing behaviors to ADHD diagnosis.
📉 Some demographic variables, such as age at MRI scan, showed low correlation with ADHD diagnosis, suggesting they may not be strong predictors.
Further feature selection techniques will determine their significance in modeling.
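A correlation ranking of this kind can be sketched with pandas: the Pearson correlation between a numeric feature and a 0/1 label is the point-biserial correlation. The column names and data below are synthetic stand-ins built so that the hyperactivity-style score is related to the label and scan age is not.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(features, target):
    """Pearson correlation of each column with a 0/1 target (point-biserial), sorted by |r|."""
    r = features.corrwith(target)
    return r.reindex(r.abs().sort_values(ascending=False).index)


# Synthetic data with hypothetical column names.
rng = np.random.default_rng(0)
n = 500
adhd = pd.Series(rng.integers(0, 2, size=n))
features = pd.DataFrame({
    "sdq_hyperactivity": 2.0 * adhd + rng.normal(scale=0.5, size=n),
    "sdq_externalizing": 1.0 * adhd + rng.normal(scale=1.0, size=n),
    "mri_scan_age": rng.uniform(6, 18, size=n),  # unrelated to the label by construction
})
ranking = rank_by_correlation(features, adhd)
```

Correlation only captures linear association, which is why a follow-up check with mutual information or model-based importance is still worthwhile before discarding a feature.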
🛠 Handling Missing Values & Outliers
The dataset contained minimal missing values, which were handled without significant data loss.
Outliers in behavioral scores were detected, but given that ADHD symptoms can vary significantly, these were retained to preserve data integrity rather than removed.
🚀 Preparing for Stage 2: Model Development
🔹 Feature Scaling
Models such as k-NN, SVM, and Neural Networks are sensitive to feature magnitudes, so Min-Max Scaling or Z-score Standardization will be applied in Stage 2.
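The two scaling options differ in what they normalize: Min-Max maps the training range onto [0, 1], while Z-score standardization centers on the training mean and divides by the (population) standard deviation. A tiny sketch with toy values, fitting on training data only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy values for illustration.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

# Fit each scaler on the training data only, then reuse it for the test data.
minmax = MinMaxScaler().fit(X_train)    # x' = (x - min) / (max - min)
zscore = StandardScaler().fit(X_train)  # x' = (x - mean) / std (population std)

train_minmax = minmax.transform(X_train)
test_minmax = minmax.transform(X_test)  # can fall outside [0, 1] for unseen values
train_z = zscore.transform(X_train)
test_z = zscore.transform(X_test)
```

The test point maps to 1.5 under Min-Max scaling, a reminder that the [0, 1] range is only guaranteed for the data the scaler was fitted on.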
🔹 Machine Learning Models Considered
Logistic Regression for baseline performance.
k-NN to leverage proximity-based classification on behavioral patterns.
Random Forest for handling mixed data types while capturing feature importance.
SVM to explore non-linear decision boundaries in high-dimensional data.
Neural Networks (if feasible) for detecting complex patterns in ADHD indicators.
🔹 Evaluation Metrics
AUC-ROC, Precision-Recall, and F1-score will be prioritized over accuracy to handle class imbalance effectively.
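Why accuracy alone is misleading here can be shown with a baseline that always predicts the majority class (ADHD). The 70/30 labels are illustrative; the point is that accuracy looks acceptable while balanced accuracy, the minority-class F1, and AUC-ROC expose a model that has learned nothing.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score

# Illustrative labels with ADHD (1) as the majority class.
y = np.array([1] * 70 + [0] * 30)
X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)
score = baseline.predict_proba(X)[:, 1]

accuracy = accuracy_score(y, pred)                              # looks acceptable
balanced = balanced_accuracy_score(y, pred)                     # mean recall over classes
f1_non_adhd = f1_score(y, pred, pos_label=0, zero_division=0)   # minority class never predicted
auc = roc_auc_score(y, score)                                   # constant scores -> chance level
```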
💡 Key Takeaways
✅ Data preprocessing is crucial: missing values, categorical encoding, and feature scaling must be handled properly before model training.
✅ Feature correlation analysis provides valuable insights into which factors may contribute most to ADHD classification.
✅ Modeling strategies must account for class imbalance, as ADHD cases are overrepresented in the dataset.
Jupyter notebook: https://github.com/DArmandoSalinas/ADHD-Outcome-Prediction-Data-Exploration-Preprocessing
Ready to move into Stage 2: model training and optimization! 🚀
Artificial Intelligence, Machine Learning, Recommender Systems
Developed a comprehensive Movie Recommendation System using the popular MovieLens dataset to practice machine learning, recommendation algorithms, and data preprocessing techniques. The system includes a simple web interface that gives users an intuitive way to explore recommendations.
Approaches Implemented:
Collaborative Filtering (SVD):
Built a User-Item matrix to identify latent patterns in user preferences through Singular Value Decomposition (SVD).
Predicted user ratings for unrated movies and recommended top movies for each user.
Achieved a Root Mean Squared Error (RMSE) of 0.477, significantly outperforming the baseline RMSE of 3.624.
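The collaborative filtering idea can be sketched with NumPy's dense SVD: center each user's ratings on their mean, keep the k strongest latent factors, and read predictions for unrated cells from the low-rank reconstruction. This is one common variant, not necessarily the exact one used in the project; the rating matrix and function names are hypothetical.

```python
import numpy as np


def svd_predict(ratings: np.ndarray, k: int) -> np.ndarray:
    """Rank-k SVD reconstruction of a users x items matrix (np.nan = unrated)."""
    known = ~np.isnan(ratings)
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    centered = np.where(known, ratings - user_means, 0.0)  # unrated -> user's mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]  # keep the k strongest latent factors
    return approx + user_means


def rmse(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def top_n_unrated(ratings, predictions, user: int, n: int = 3):
    """Indices of the user's unrated items, highest predicted rating first."""
    unrated = np.flatnonzero(np.isnan(ratings[user]))
    order = unrated[np.argsort(-predictions[user, unrated])]
    return order[:n].tolist()


# Hypothetical 4 users x 4 movies rating matrix.
R = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 4, 1],
    [1, 1, np.nan, 5],
    [np.nan, 1, 2, 4],
], dtype=float)
P = svd_predict(R, k=2)
recommendations = top_n_unrated(R, P, user=0)
```

An RMSE figure such as the one above is only meaningful when computed on held-out ratings, not on the ratings used to build the matrix.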
Content-Based Filtering:
Leveraged TF-IDF vectorization and cosine similarity to recommend movies similar to a given movie based on genres.
Provided personalized recommendations for new users or sparsely rated datasets.
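Genre-based similarity can be sketched by treating each movie's pipe-separated MovieLens genre string as a document, vectorizing it with TF-IDF, and ranking other movies by cosine similarity. The four-movie sample is hand-made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Small hand-made sample in MovieLens style, where genres are pipe-separated.
titles = ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)", "Aladdin (1992)"]
genres = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Action|Crime|Thriller",
    "Adventure|Animation|Children|Comedy|Musical",
]

vectorizer = TfidfVectorizer(token_pattern=r"[^|]+")  # one token per genre
tfidf = vectorizer.fit_transform(genres)


def similar_movies(title, n=2):
    """Return the n titles whose genre vectors have the highest cosine similarity."""
    idx = titles.index(title)
    sims = cosine_similarity(tfidf[idx], tfidf).ravel()
    sims[idx] = -1.0  # exclude the query movie itself
    order = np.argsort(-sims)[:n]
    return [titles[i] for i in order]
```

Because IDF down-weights genres that almost every movie shares, rarer genres such as Musical or Fantasy carry more weight in the similarity than common ones such as Adventure.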
Hybrid Recommendation System:
Combined the strengths of collaborative and content-based filtering.
Adapted to varied user profiles by dynamically applying the most effective approach.
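The write-up does not state the exact rule used to choose between approaches, so the sketch below shows one hypothetical switching strategy: users with fewer than `min_ratings` ratings get content-based results (avoiding the cold-start problem), and everyone else gets an interleaved list from both recommenders. The threshold and the stub recommenders are assumptions.

```python
from itertools import zip_longest
from typing import Callable, Dict, List

Recommender = Callable[[int, int], List[str]]


def hybrid_recommend(
    user_id: int,
    n: int,
    ratings_count: Dict[int, int],
    collaborative: Recommender,
    content_based: Recommender,
    min_ratings: int = 5,  # hypothetical cold-start threshold
) -> List[str]:
    """Switch to content-based for cold-start users, otherwise interleave both lists."""
    if ratings_count.get(user_id, 0) < min_ratings:
        return content_based(user_id, n)
    merged: List[str] = []
    for pair in zip_longest(collaborative(user_id, n), content_based(user_id, n)):
        for item in pair:
            if item is not None and item not in merged:
                merged.append(item)
    return merged[:n]


# Hypothetical stub recommenders standing in for the SVD and TF-IDF models.
def cf_stub(user_id, n):
    return ["A", "B", "C"][:n]


def cb_stub(user_id, n):
    return ["B", "D"][:n]


counts = {1: 2, 2: 40}  # user 1 is new, user 2 has a rating history
```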
Web Integration:
Created a lightweight web application to showcase the recommendation system, built using Flask.
Designed a user-friendly interface enabling users to input movie titles or user IDs for tailored recommendations.
Demonstrated the integration of machine learning models with web technologies, with attention to efficient performance and scalability.
Competences
Machine Learning and Advanced Algorithm Implementation (SVD, TF-IDF)
Recommender Systems (Collaborative, Content-Based, and Hybrid)
Web Development with Flask
Data Preprocessing and Normalization
Statistical Analysis and Performance Evaluation
Artificial Intelligence, Machine Learning, Natural Language Processing (NLP)
Developed a text classification system for emotion recognition in tweets, using the XGBoost algorithm for its strong performance on multi-class classification tasks. The project categorizes tweets into six distinct emotional categories, showing how machine learning can process social media data efficiently.
The project involved rigorous preprocessing techniques tailored to the unique challenges of tweet analysis. Steps included removing noisy elements like hashtags, mentions, and non-alphabetic characters, applying stemming to normalize words, and using Bag-of-Words for feature extraction. These processes optimized the dataset for modeling and ensured the system could handle over 400,000 entries effectively.
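The cleaning and Bag-of-Words steps described above can be sketched with a few regular expressions and scikit-learn's `CountVectorizer`. The sample tweets are invented, and stemming is left out to keep the sketch dependency-free; in practice it would be applied per token (for example with NLTK's `PorterStemmer`) before vectorizing.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer

MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")
NON_ALPHA = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase, drop @mentions and #hashtags, keep only letters and spaces."""
    text = text.lower()
    text = MENTION_OR_HASHTAG.sub(" ", text)
    text = NON_ALPHA.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


# Invented sample tweets.
tweets = [
    "@user I'm SO happy today!!! #blessed",
    "Feeling sad and tired... @friend",
]
cleaned = [clean_tweet(t) for t in tweets]

# Bag-of-Words: one column per vocabulary word, values are raw counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(cleaned)
```

`CountVectorizer`'s default token pattern ignores single-character tokens, so fragments such as "i" and "m" left over from "I'm" do not enter the vocabulary.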
The XGBoost model reached an accuracy of 87.5%, estimated with stratified K-fold cross-validation so that every fold preserved the class proportions and the result was consistent across splits. Visualizations, such as word clouds and bar plots, provided deeper insight into the most common words and their associations with specific emotions.
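Stratified K-fold evaluation can be sketched as an explicit loop that clones a fresh model for each fold. Here `LogisticRegression` on synthetic six-class data stands in for the project's model and data; because `XGBClassifier` from the `xgboost` package follows the scikit-learn estimator interface, it could be passed in the same way.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def stratified_cv_accuracy(model, X, y, n_splits=5, seed=0):
    """Mean and std of accuracy over stratified folds (each fold keeps class proportions)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy per fold
        fold_model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], fold_model.predict(X[test_idx])))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic 6-class data; LogisticRegression stands in for XGBClassifier here.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=6,
    n_clusters_per_class=1, flip_y=0.0, random_state=0,
)
mean_acc, std_acc = stratified_cv_accuracy(LogisticRegression(max_iter=1000), X, y)
```

Reporting the standard deviation across folds alongside the mean shows how stable the accuracy estimate is, which a single train/test split cannot show.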
Future improvements, including feature engineering with TF-IDF and n-grams to capture contextual relationships, as well as hyperparameter tuning, were identified to enhance the model's effectiveness further. This project highlights the critical role of advanced algorithms and NLP techniques in unlocking insights from unstructured social media data.
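The proposed TF-IDF with n-grams improvement can be sketched with `TfidfVectorizer(ngram_range=(1, 2))`, which adds adjacent word pairs as features so that negations such as "not happy" are captured as a unit. The three documents are toy examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents for illustration.
docs = ["i am not happy", "i am happy", "not bad at all"]

# ngram_range=(1, 2) keeps single words and adds adjacent word pairs,
# so "not happy" becomes its own feature instead of two unrelated words.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)
```

Bigrams grow the vocabulary quickly on 400,000+ tweets, so options such as `min_df` or `max_features` would likely be needed to keep the feature matrix manageable.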
Competences
Natural Language Processing (NLP)
Machine Learning Model Development and Evaluation
Text Data Preprocessing and Feature Engineering
Advanced Algorithm Implementation (XGBoost)
Cross-Validation Techniques for Model Robustness
Analytical Thinking and Problem-Solving
Artificial Intelligence, Deep Learning, Time Series Analysis
In this project, I developed a predictive model for Rossmann drugstore sales using a Feedforward Neural Network (FNN). The goal was to forecast daily sales up to six weeks in advance to support effective staff scheduling and resource management.
Key highlights include:
Data Preprocessing: Engineered features from temporal data (Year, Month, Week), competition metrics, and promotional activity durations. Missing data was handled effectively to ensure model robustness.
Feature Engineering: Created new metrics like competition open duration and promo activity duration to enrich the dataset and improve predictions.
Neural Network Design:
Built a multi-layer FNN with 3 hidden layers (128, 64, 32 nodes) using ReLU activation and dropout for regularization.
Optimized with Adam optimizer and Mean Squared Error (MSE) loss function.
Optimization Techniques: Implemented early stopping to prevent overfitting and ensure generalization.
Evaluation and Deployment: Preprocessed data was used to train the model and evaluate it on the Kaggle Rossmann dataset.
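The temporal and duration features listed above can be sketched in pandas. The column names follow the Kaggle Rossmann files (`Date`, `CompetitionOpenSinceYear`, `CompetitionOpenSinceMonth`, `Promo2SinceYear`, `Promo2SinceWeek`) and should be treated as an assumption; the three sample rows are invented, and the 52-weeks-per-year conversion is an approximation.

```python
import numpy as np
import pandas as pd


def add_temporal_and_duration_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add Year/Month/WeekOfYear plus months since competition opened and weeks of Promo2."""
    out = df.copy()
    date = pd.to_datetime(out["Date"])
    out["Year"] = date.dt.year
    out["Month"] = date.dt.month
    out["WeekOfYear"] = date.dt.isocalendar().week.astype(int)

    comp_months = (12 * (out["Year"] - out["CompetitionOpenSinceYear"])
                   + (out["Month"] - out["CompetitionOpenSinceMonth"]))
    # Negative = competitor not open yet; NaN = no known competitor.
    out["CompetitionOpenMonths"] = comp_months.clip(lower=0).fillna(0)

    promo_weeks = (52 * (out["Year"] - out["Promo2SinceYear"])  # approx. 52 weeks per year
                   + (out["WeekOfYear"] - out["Promo2SinceWeek"]))
    out["Promo2Weeks"] = promo_weeks.clip(lower=0).fillna(0)
    return out


# Invented sample rows using Kaggle-style column names (assumption).
sample = pd.DataFrame({
    "Date": ["2015-07-31", "2015-07-31", "2013-01-01"],
    "CompetitionOpenSinceYear": [2008, np.nan, 2014],
    "CompetitionOpenSinceMonth": [9, np.nan, 1],
    "Promo2SinceYear": [np.nan, 2010, 2012],
    "Promo2SinceWeek": [np.nan, 13, 1],
})
features = add_temporal_and_duration_features(sample)
```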
The project demonstrates the application of deep learning techniques to structured/tabular data, leveraging TensorFlow/Keras for implementation. It emphasizes efficient data preprocessing and feature engineering, which are critical for achieving reliable results in real-world machine learning tasks.
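A minimal Keras sketch of the network described above: three ReLU hidden layers of 128, 64, and 32 units, dropout, Adam, MSE loss, and early stopping. The dropout rate of 0.2, its placement after the first two hidden layers, the learning rate, the patience of 5 epochs, and `n_features=10` are hypothetical choices, not values from the project.

```python
from tensorflow import keras


def build_fnn(n_features: int, dropout: float = 0.2) -> keras.Model:
    """3 hidden ReLU layers (128, 64, 32) with dropout, one linear output for sales."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(dropout),  # hypothetical rate and placement
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),  # linear output for a regression target
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse", metrics=["mae"])
    return model


# Stop when validation loss has not improved for 5 epochs and keep the best weights.
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

model = build_fnn(n_features=10)
# Typical training call (X_train, y_train are the preprocessed features and sales):
# model.fit(X_train, y_train, validation_split=0.2, epochs=100,
#           batch_size=256, callbacks=[early_stopping])
```

With `restore_best_weights=True`, the model returned after training is the one from the epoch with the lowest validation loss rather than the last epoch run.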
Competences
Deep Learning
Machine Learning
Data Preprocessing
Feature Engineering
Python Programming
TensorFlow/Keras
Time Series Analysis
Data Visualization
Statistical Analysis
Intelligent Systems and Robotics
Developed and implemented a fuzzy logic and PID-based autonomous navigation system for a TurtleBot. The system enables the robot to follow a predefined edge, avoid obstacles, and react to its environment in real time. Leveraged Python to design modular and structured code for sensor data interpretation, rule-based decision-making, and smooth robotic movement. The approach combined linear and angular velocity control to optimize performance.
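The two control ideas can be sketched in plain Python: a textbook PID controller, and a small Sugeno-style fuzzy rule base that maps the distance to an edge on the robot's right side to an angular velocity. How the original system combined them is not stated; all membership breakpoints, speeds, and distances below are hypothetical. The returned pair would typically be published as `linear.x` and `angular.z` of a ROS `geometry_msgs/Twist` message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PID:
    """Textbook PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def fuzzy_angular(edge_dist: float) -> float:
    """Sugeno-style fuzzy rules for following an edge on the robot's right side.

    Memberships (metres, hypothetical): near <= 0.2, medium peaks at 0.5, far >= 0.8.
    Rules: near -> turn left (+0.5 rad/s), medium -> straight (0), far -> turn right (-0.5).
    """
    near = clamp01((0.5 - edge_dist) / 0.3)
    far = clamp01((edge_dist - 0.5) / 0.3)
    medium = clamp01(1.0 - abs(edge_dist - 0.5) / 0.3)
    weights = (near, medium, far)
    outputs = (0.5, 0.0, -0.5)
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * o for w, o in zip(weights, outputs)) / total  # weighted average


def velocity_command(edge_dist: float, front_dist: float,
                     max_speed: float = 0.2, stop_dist: float = 0.3,
                     slow_dist: float = 0.8) -> Tuple[float, float]:
    """Return (linear, angular); slows down near obstacles and stops in front of them."""
    if front_dist <= stop_dist:
        return 0.0, 0.5  # rotate in place away from the obstacle
    linear = max_speed * clamp01((front_dist - stop_dist) / (slow_dist - stop_dist))
    return linear, fuzzy_angular(edge_dist)
```

Overlapping memberships make the angular command change smoothly as the edge distance moves between "near", "medium", and "far", instead of jumping between fixed turn rates.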
Competences
Analytical Thinking and Problem-Solving
Technical Proficiency in Fuzzy Logic and PID Control
Sensor Data Integration and Real-Time Processing
Code Modularity and Organization
Adaptability and Learning Agility
Quality Assurance and Testing
DIAR NOTICES
Software
I developed a digital platform to receive and send notices, serving as a distribution channel for notices. I built it using Python, JavaScript, HTML, CSS, and Flask.
Competences
Analytical Thinking and Problem-Solving
Technical Proficiency
Project Management
User Experience Design
Adaptability and Learning Agility
Quality Assurance and Testing
Thermofluids
This article examines causal thermogenesis, the process through which living organisms generate heat to regulate their body temperature. Over the years, researchers have extensively studied this phenomenon to gain a deeper understanding of how the human body responds to both internal and external conditions. The primary objective of this study is to develop a protocol that determines the optimal waiting time before measuring body temperatures to ensure thermogenesis has occurred. The protocol's approach is grounded in energy balance analysis and incorporates variables relevant to a clinic-like setting.
In this project I researched papers related to our topic and carried out the mathematical calculations.
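How an energy balance leads to a waiting time can be illustrated with a lumped-capacitance model. This is an assumption for illustration, not necessarily the model used in the article: a body of mass $m$ and specific heat $c$ generates heat at rate $\dot{Q}_{\mathrm{gen}}$ and loses heat by convection (coefficient $h$, area $A$) to surroundings at $T_\infty$. The temperature approaches its steady state exponentially, so the waiting time $t_w$ needed to come within a tolerance $\varepsilon$ follows directly.

```latex
m c \frac{dT}{dt} = \dot{Q}_{\mathrm{gen}} - hA\,(T - T_\infty)

T_{ss} = T_\infty + \frac{\dot{Q}_{\mathrm{gen}}}{hA}, \qquad
T(t) = T_{ss} + (T_0 - T_{ss})\, e^{-t/\tau}, \qquad
\tau = \frac{m c}{hA}

|T(t_w) - T_{ss}| = \varepsilon \;\Longrightarrow\;
t_w = \tau \ln\frac{|T_0 - T_{ss}|}{\varepsilon}
```

For example, waiting $t_w = 3\tau$ leaves about 5% of the initial temperature difference, since $e^{-3} \approx 0.05$.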
Competences
Ethical Argumentation
Decision Making
Implementation of Actions
Application of Sustainability Principles
Evaluates the Feasibility of Mechatronic Developments
Manufacturing
The project aims to address issues related to the forming of the moldboard, which currently does not achieve its desired shape and is prone to cracking after several cycles of use. The proposal consists of a new fixture design and a stress-relief process to solve these issues.
In this project I worked on the mechanical design of the fixtures, including their functionality and positioning.
Competences
Effectiveness in Negotiation
Applies Technologies to Develop a Mechatronic Product or System
Selects Mechatronic Components Applying Methodology
Electronics
Using Simulink, I simulated some scenarios with logic gates.
Competences
Demonstration of the Operation of Engineering Systems and Devices
Designs Strategies to Automate Processes
Validates Automation Proposals
Electronics
The aim of the project described in the document is to design and implement an electronic conditioning circuit for accurate temperature measurement. The project is structured around two main stages: Sensing and Conditioning. In the Sensing stage, the system is designed to output a voltage proportional to the temperature in degrees Celsius. In the Conditioning stage, the output is adjusted to reflect temperature in degrees Fahrenheit. The successful implementation of this circuit will demonstrate the importance of electronics in instrumentation and controlling critical variables such as temperature.
In this project I simulated the circuits in Simulink, performed the required calculations, and contributed to wiring the circuit.
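The Celsius-to-Fahrenheit conditioning stage reduces to a gain and an offset. Assuming a 10 mV/°C sensor (LM35-style) and a desired output of 10 mV/°F, both of which are assumptions rather than values from the project, substituting $V_{in} = s\,T_C$ into $T_F = 1.8\,T_C + 32$ gives $V_{out} = 1.8\,V_{in} + 0.32$ V. A small sketch of that calculation:

```python
SENSOR_V_PER_DEGC = 0.010   # assumption: 10 mV/°C sensor (LM35-style)
OUTPUT_V_PER_DEGF = 0.010   # assumption: desired output scale of 10 mV/°F


def conditioning_gain_offset(sensor_gain: float, output_gain: float):
    """Gain and offset so that V_out = gain * V_in + offset tracks °F = 1.8 * °C + 32."""
    gain = 1.8 * output_gain / sensor_gain
    offset = 32.0 * output_gain
    return gain, offset


def condition(v_in: float, gain: float, offset: float) -> float:
    return gain * v_in + offset


GAIN, OFFSET = conditioning_gain_offset(SENSOR_V_PER_DEGC, OUTPUT_V_PER_DEGF)
```

At 25 °C the sensor gives 0.25 V and the stage outputs 0.77 V, which reads as 77 °F. A gain of 1.8 corresponds to a non-inverting op-amp stage with $R_f/R_g = 0.8$ (its gain is $1 + R_f/R_g$), and the 0.32 V offset can be added with a summing stage; these are textbook relationships, not component values from the project.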
Competences
Effectiveness in Negotiation
Applies Technologies to Develop a Mechatronic Product or System
Selects Mechatronic Components Applying Methodology
Manufacturing, Electronics, Mechanics, Programming and Robotics
The aim of the project described in the document is to develop a proposal for automating a welding process for ABB's manufacturing system. The project seeks to achieve the welding of two specific components within a set time of 31 seconds, using ABB collaborative robots and a specific welding head. The objective is to enhance the efficiency and effectiveness of the welding process, ensuring high-quality joints with optimized production times.
In this project I proposed a cell with two robots, one for pick-and-place operations and the other for welding, plus a positioner to improve the efficiency of the welding cell. I worked on the mechanical design of the components and the assembly, programmed the entire welding cell in RobotStudio, and simulated its performance, obtaining optimal results.
Competences
Integrity
Applies Technologies to Develop a Mechatronic Product or System
Conducts Theoretical and Experimental Process Modeling
Investigates the State of the Art for Product Development
Generates Mechatronic System Proposals
Evaluates the Feasibility of Mechatronic Developments
Electronics, Programming and Control
The aim of the project described in the document is to design and implement a control system for a greenhouse lighting setup using LED lighting for growing tomato plants. The project involves assembling a prototype, identifying system parameters, modeling processes using various methods, and simulating these models in Simulink. The goal is to achieve precise control over the lighting conditions to optimize plant growth, by applying different control strategies and evaluating their performance to determine the most effective method.
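One of the control strategies such a project might compare can be sketched as a PI controller on an identified first-order model, $\tau\,\dot{y} = -y + K u$, simulated with forward Euler. The plant gain, time constant, controller gains, and the lux units are illustrative assumptions, not parameters identified from the prototype; the duty cycle is limited to [0, 1], with a simple anti-windup rule.

```python
def simulate_pi(setpoint, K=800.0, tau=2.0, kp=0.001, ki=0.001, dt=0.01, t_end=40.0):
    """Closed-loop PI control of a first-order light model tau*dy/dt = -y + K*u.

    y: light level (hypothetical lux), u: LED duty cycle limited to [0, 1].
    K, tau, kp and ki are illustrative values, not identified parameters.
    """
    y, integral, history = 0.0, 0.0, []
    for _ in range(int(round(t_end / dt))):
        error = setpoint - y
        integral += error * dt
        u = kp * error + ki * integral
        u_sat = min(max(u, 0.0), 1.0)
        if u != u_sat:
            integral -= error * dt  # anti-windup: stop integrating while saturated
        y += dt * (-y + K * u_sat) / tau  # forward-Euler step of the plant
        history.append(y)
    return history


response = simulate_pi(setpoint=400.0)
```

With these values the closed-loop characteristic equation is $2s^2 + 1.8s + 0.8 = 0$ (damping ratio about 0.71), so the light level settles at the setpoint with little overshoot; swapping in the identified $K$ and $\tau$ would let different gains or strategies be compared in the same loop.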
Competences
Critical Thinking
Conducts Theoretical and Experimental Process Modeling
Validates Automation Proposals
I am familiar with:
Visual Studio Code
Jupyter Notebook
Python
TensorFlow
Scikit-learn
ROS
JavaScript
HTML
CSS
Flask
SolidWorks
RobotStudio
Plant Simulation 16
Zortrax
MATLAB/Simulink
Multisim
Processing
In-Sight Vision
STM32CubeIDE