Oct 2023 - Oct 2023
The project won first place at the CodeML Hackathon 2023
Training and validation of a deep learning model for emotion detection from image data
Data cleaning, preprocessing, and augmentation to make the model robust to noise and image distortion at inference time (see the sketch below)
Hyper-parameter search and fine-tuning
Real-time logging and tracking of results
Skills: Python · Deep Learning · Computer Vision · Problem Solving · Machine Learning
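Below is a minimal sketch of the kind of augmentation pipeline described above, using torchvision; the transforms, input size, and noise level are illustrative assumptions, not the exact hackathon configuration.

    import torch
    from torchvision import transforms

    def add_gaussian_noise(img, std=0.05):
        # Inject pixel noise so the model also trains on corrupted inputs.
        return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

    # Geometric and photometric distortions plus noise, applied at training time
    # so the model stays robust to distorted images at inference time.
    train_transforms = transforms.Compose([
        transforms.Resize((48, 48)),                      # assumed input size
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.GaussianBlur(kernel_size=3),
        transforms.ToTensor(),
        transforms.Lambda(add_gaussian_noise),
    ])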
Aug 2022 - Aug 2023
TL;DR: Design and develop a framework to improve the robustness of deep learning models against extraction attacks (see the sketch below).
Abstract: Model stealing attacks have become a serious concern for deep learning models, where an attacker can steal a trained model by querying its black-box API. This can lead to intellectual property theft and other security and privacy risks. The current state-of-the-art defenses against model stealing attacks suggest adding perturbations to the prediction probabilities. However, they suffer from heavy computation, make impractical assumptions about the adversary, and often require training auxiliary models. This can be time-consuming and resource-intensive, which hinders the deployment of these defenses in real-world applications. In this paper, we propose a simple yet effective and efficient defense alternative. We introduce a heuristic approach to perturb the output probabilities. The proposed defense can be easily integrated into models without additional training. We show that our defense is effective in defending against three state-of-the-art stealing attacks. We evaluate our approach on large and quantized (i.e., compressed) Convolutional Neural Networks (CNNs) trained on several vision datasets. Our technique outperforms the state-of-the-art defenses with ×37 faster inference, without requiring any additional model, and with a low impact on the model's performance. We validate that our defense is also effective for quantized CNNs targeting edge devices.
Skills: PyTorch · Deep Learning · Python · Pandas · NumPy · Plotly · Computer Vision · Data Visualization · Technical Reports
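As an illustration of the idea of perturbing output probabilities without extra training, here is a minimal sketch; the perturb_probs function and its noise scheme are hypothetical stand-ins, not the heuristic proposed in the paper.

    import numpy as np

    def perturb_probs(probs, eps=0.2, seed=None):
        # Add random noise to the posterior returned by the black-box API,
        # then renormalize, while keeping the top-1 label unchanged so the
        # defended model's accuracy is barely affected.
        rng = np.random.default_rng(seed)
        probs = np.asarray(probs, dtype=float)
        top1 = int(probs.argmax())
        noisy = probs + rng.uniform(0.0, eps, size=probs.shape)
        noisy /= noisy.sum()
        j = int(noisy.argmax())
        if j != top1:
            noisy[top1], noisy[j] = noisy[j], noisy[top1]   # preserve the predicted class
        return noisy

    print(perturb_probs([0.7, 0.2, 0.1], seed=0))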
Sep 2021 - Jul 2022
TL;DR: A framework for vulnerability assessment of model stealing attacks against adversarially trained deep learning models (see the sketch below).
Abstract: Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose several security and privacy threats. Prior work proposes to use adversarial training to secure models from adversarial examples that can evade the classification of a model and deteriorate its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, and hence it might raise model privacy risks. In fact, a malicious user with only query access to the prediction output of a model can extract it and obtain a high-accuracy and high-fidelity surrogate model. To achieve a better extraction, these attacks leverage the prediction probabilities of the victim model. Indeed, none of the previous work on extraction attacks takes into consideration the changes made to the training process for security purposes. In this paper, we propose a framework to assess extraction attacks on adversarially trained models with vision datasets. To the best of our knowledge, our work is the first to perform such an evaluation. Through an extensive empirical study, we demonstrate that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances. They can achieve up to ×1.2 higher accuracy and agreement while using fewer than ×0.75 of the queries. We additionally find that the adversarial robustness capability is transferable through extraction attacks, i.e., Deep Neural Networks (DNNs) extracted from robust models show enhanced accuracy on adversarial examples compared to DNNs extracted from naturally trained (i.e., standard) models.
Skills: Deep Learning · PyTorch · Python (Programming Language) · Machine Learning · Plotly · Data Visualization · Technical Reports · Problem Solving
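The core extraction loop assessed by the framework can be sketched as follows; the toy models, random query set, and hyper-parameters below are placeholders for the paper's CNNs, vision datasets, and attack configurations.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    victim = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    surrogate = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

    queries = torch.randn(512, 32)                        # attacker-chosen queries
    with torch.no_grad():
        soft_labels = F.softmax(victim(queries), dim=1)   # black-box API responses

    for _ in range(50):
        opt.zero_grad()
        log_probs = F.log_softmax(surrogate(queries), dim=1)
        # Train the surrogate to match the victim's prediction probabilities.
        loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
        loss.backward()
        opt.step()

    with torch.no_grad():
        agreement = (surrogate(queries).argmax(1) == soft_labels.argmax(1)).float().mean()
    print(f"label agreement with the victim: {agreement:.2f}")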
Jun 2020 - Jun 2020
Implement the LPAm+ algorithm to detect communities among Game of Thrones characters (see the sketch below).
Analyze the social network to find the most influential people in the network.
Skills: Data Science · Data Mining · Python (Programming Language) · Graphs · Cluster Analysis
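A minimal sketch of the two steps with networkx: built-in label propagation stands in for the custom LPAm+ implementation, betweenness centrality ranks influence, and the karate-club graph is a placeholder for the Game of Thrones character network.

    import networkx as nx
    from networkx.algorithms import community

    G = nx.karate_club_graph()   # placeholder for the GoT character edge list

    # Community detection (label propagation as a stand-in for LPAm+).
    for i, nodes in enumerate(community.label_propagation_communities(G)):
        print(f"community {i}: {sorted(nodes)}")

    # Most influential nodes by betweenness centrality.
    influence = nx.betweenness_centrality(G)
    top = sorted(influence, key=influence.get, reverse=True)[:5]
    print("most influential nodes:", top)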
May 2020 - May 2020
Develop a market basket analysis algorithm to reveal buying patterns in the Instacart dataset, which contains over three million supermarket transactions (see the sketch below).
Extract market intelligence on customer trends, such as which of the most purchased products are most likely to be reordered.
Skills: SQL · Python (Programming Language) · Data Science · Data Mining · Data Analysis
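The buying-pattern mining can be sketched as pairwise association rules; the toy baskets and the 0.5 confidence threshold below are illustrative, not the Instacart data or the project's actual thresholds.

    from collections import Counter
    from itertools import combinations

    baskets = [
        {"bananas", "milk", "bread"},
        {"bananas", "strawberries"},
        {"milk", "bread", "eggs"},
        {"bananas", "milk", "bread"},
    ]
    n = len(baskets)

    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

    # Rule x -> y: support = P(x, y), confidence = P(y | x), lift = confidence / P(y).
    for (a, b), c in pair_counts.items():
        for x, y in ((a, b), (b, a)):
            support = c / n
            confidence = c / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            if confidence >= 0.5:
                print(f"{x} -> {y}: support={support:.2f} "
                      f"confidence={confidence:.2f} lift={lift:.2f}")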
May 2020 - May 2020
Develop a recommendation system that returns discussion threads (questions & answers) related to a given question (see the sketch below).
Skills: Python (Programming Language) · Natural Language Processing (NLP) · Data Science · Data Mining · Machine Learning · NLTK · Scikit-Learn
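A minimal retrieval sketch with scikit-learn: TF-IDF vectors plus cosine similarity rank the existing threads against an incoming question; the toy corpus and query stand in for the project's real data and preprocessing.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    threads = [
        "How do I merge two pandas DataFrames on a key column?",
        "What is the difference between a list and a tuple in Python?",
        "How can I join two tables in SQL?",
    ]
    query = "merging dataframes in pandas"

    vectorizer = TfidfVectorizer(stop_words="english")
    thread_vectors = vectorizer.fit_transform(threads)      # index the threads
    query_vector = vectorizer.transform([query])

    scores = cosine_similarity(query_vector, thread_vectors).ravel()
    for i in scores.argsort()[::-1][:2]:                     # top-2 related threads
        print(f"{scores[i]:.2f}  {threads[i]}")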
Nov 2021 - Dec 2021
Project within the NLP graduate course IFT6285 at the University of Montreal
TL;DR: Proposing and implementing an algorithm to construct a meaningful sentence from a shuffled bag of words (see the sketch below).
In this project, we study the task of ordering a bag of words to form a meaningful sentence. We rephrase this task as a search problem: given a bag of words, find the word ordering for which a language model gives the best score. To do so, we experiment with different automatic approaches that query a language model over candidate word-ordering permutations.
Skills: Natural Language Processing (NLP) · Python (Programming Language) · N-Gram Language Models · Data Visualization · Problem Solving
Report: https://drive.google.com/file/d/1qENBII8bNOYFqXPVvE_NVNiO_wZ4xI2i/view
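The search formulation can be sketched as a beam search over partial orderings scored by a language model; the toy bigram model, corpus, and beam width below are illustrative only (the project used larger n-gram models).

    from collections import Counter
    from math import log

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def bigram_logprob(prev, word, alpha=0.1):
        # Add-alpha smoothed bigram log-probability.
        v = len(unigrams)
        return log((bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * v))

    def order_bag(bag, beam_width=5):
        # Each beam entry: (score, ordered prefix, remaining words).
        beams = [(0.0, [w], bag[:j] + bag[j + 1:]) for j, w in enumerate(bag)]
        while beams[0][2]:
            candidates = []
            for score, prefix, rest in beams:
                for i, w in enumerate(rest):
                    s = score + bigram_logprob(prefix[-1], w)
                    candidates.append((s, prefix + [w], rest[:i] + rest[i + 1:]))
            beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        return beams[0][1]

    print(order_bag(["sat", "the", "cat", "on", "the", "mat"]))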
Oct 2021 - Nov 2021
Project within the NLP graduate course IFT6285 at the University of Montreal
TL;DR: Proposing classification systems for the CoLA (Corpus of Linguistic Acceptability) and MRPC (Microsoft Research Paraphrase Corpus) binary classification tasks from the GLUE benchmark by gradually improving upon a baseline (see the sketch below).
In this project, we study two binary classification tasks from the GLUE benchmark: 1) CoLA (Corpus of Linguistic Acceptability), which consists of identifying whether a sentence is grammatically acceptable or not, and 2) MRPC (Microsoft Research Paraphrase Corpus), which consists of identifying whether a pair of sentences is semantically equivalent. For each of these tasks, we build a classification system by gradually improving upon a baseline.
Skills: Natural Language Processing (NLP) · Machine Learning · BERT (Language Model) · Data Visualization · Deep Learning
Report: https://drive.google.com/file/d/1r5HVx3xwip_hcSRy60N1In3nmpUGOlbn/view
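A minimal sketch of the stronger end of the pipeline: fine-tuning bert-base-uncased for a GLUE-style binary task with Hugging Face transformers; the toy CoLA-like examples, label convention, and learning rate are illustrative assumptions rather than the report's setup.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Toy CoLA-style batch: 1 = grammatically acceptable, 0 = unacceptable.
    sentences = ["The cat sat on the mat.", "Cat the mat on sat the."]
    labels = torch.tensor([1, 0])

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch, labels=labels)   # cross-entropy loss computed internally
    outputs.loss.backward()
    optimizer.step()

    print("predictions:", outputs.logits.argmax(dim=1).tolist())
    # For MRPC, pass sentence pairs instead: tokenizer(sentences1, sentences2, ...).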