Welcome to Mohamed Hisham's homepage. I'm a Software Engineer who builds stuff. I get things done.
About me
Hi, I’m Mohamed, a software engineer who's interested in machine learning, data science, NLP, gaming, and VR/AR.
My experience includes optimizing C++ CUDA kernels for Hugging Face, fine-tuning LLMs, developing LLM agents and RAG-powered applications, building MCP servers, creating VR games, and performing data analysis and visualization.
Please take a look at my portfolio below.
Open-Source Contributions - 2025
Reduced the latency of the 4-bit (NF4/FP4) dequantization algorithm for Bitsandbytes on Nvidia GPUs (mainly A100, H100, B200) by around 40% overall. In benchmarks of Llama-3.1-70B on an Nvidia H100, the optimized algorithm improved decode throughput by ~33% and reduced latency by ~25% for batch sizes greater than 1, and improved prefill throughput by ~12% and reduced latency by ~10% at batch size 1. Also built a stable, Dockerized, end-to-end pipeline for inference benchmarking, stress testing, and analysis, enabling reproducible evaluation of k-bit quantization/dequantization algorithms for Bitsandbytes on Nvidia hardware.
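For readers curious about what the algorithm does, here is a minimal illustrative PyTorch sketch of blockwise NF4 dequantization (codebook lookup plus per-block absmax rescaling). It mirrors the idea behind the optimized CUDA kernels rather than the actual Bitsandbytes implementation; the codebook values follow the QLoRA paper, and the demo data is made up.

```python
# Minimal, illustrative sketch of blockwise NF4 dequantization in PyTorch.
# This shows the idea (4-bit code -> codebook lookup -> per-block absmax scale);
# the real Bitsandbytes implementation is a fused CUDA kernel, not this code.
import torch

# The 16 NF4 code values (normalized to [-1, 1]), rounded, as defined in the QLoRA paper.
NF4_CODEBOOK = torch.tensor([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
])

def dequantize_nf4(codes: torch.Tensor, absmax: torch.Tensor, blocksize: int = 64) -> torch.Tensor:
    """codes: flattened tensor of 4-bit indices (one per element).
    absmax: per-block absolute-maximum scales, shape (num_blocks,)."""
    values = NF4_CODEBOOK[codes.long()]               # codebook lookup
    values = values.view(-1, blocksize)               # group into blocks
    return (values * absmax.view(-1, 1)).flatten()    # rescale each block

if __name__ == "__main__":
    # Tiny round-trip demo with random weights.
    blocksize = 64
    w = torch.randn(4 * blocksize)
    absmax = w.view(-1, blocksize).abs().amax(dim=1)
    normalized = (w.view(-1, blocksize) / absmax.view(-1, 1)).flatten()
    # Quantize: pick the nearest codebook entry per element.
    codes = (normalized.unsqueeze(1) - NF4_CODEBOOK.unsqueeze(0)).abs().argmin(dim=1)
    w_hat = dequantize_nf4(codes, absmax, blocksize)
    print("relative error:", ((w - w_hat).norm() / w.norm()).item())
```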
Open-Source Contributions - 2025
Discovered and fixed a bug in the 4-bit quantization logic on Nvidia GPUs; the fix reduced relative quantization error by ~98% for NF4 and ~96% for FP4 at quantization block sizes larger than 512.
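To show how such an error regression can be surfaced, below is a small sketch that measures the relative round-trip error of 4-bit quantization across block sizes using the Python-level quantize_4bit / dequantize_4bit functions in bitsandbytes. It assumes a CUDA GPU and a recent bitsandbytes release, and it is a measurement harness, not the fix itself.

```python
# Sketch: measure relative round-trip error of 4-bit quantization across block sizes.
# Assumes a CUDA GPU and a recent bitsandbytes release exposing
# bitsandbytes.functional.quantize_4bit / dequantize_4bit.
import torch
import bitsandbytes.functional as F

def relative_error(quant_type: str, blocksize: int, n: int = 1 << 20) -> float:
    """Quantize random weights, dequantize them, and return the relative L2 error."""
    w = torch.randn(n, device="cuda", dtype=torch.float32)
    packed, state = F.quantize_4bit(w, blocksize=blocksize, quant_type=quant_type)
    w_hat = F.dequantize_4bit(packed, state)
    return ((w - w_hat).norm() / w.norm()).item()

if __name__ == "__main__":
    for quant_type in ("nf4", "fp4"):
        for blocksize in (64, 256, 512, 1024, 2048, 4096):
            err = relative_error(quant_type, blocksize)
            print(f"{quant_type} blocksize={blocksize}: relative error {err:.4f}")
```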
CS Graduation Project - 2022
This project focused on enhancing accessibility for the Deaf community by providing real-time response suggestions. I collected and processed a dialogue-based dataset from 12 unstructured sources, comprising over 3 million sentences and 285,000 dialogues, and labeled it using the Google Natural Language AI API. The dataset was used to fine-tune Google's T5 Large Language Model on sentence-based dialogue completion, achieving a mean BERTScore F1 of 82.3%.
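As a rough illustration of the fine-tuning setup, the sketch below trains a T5 checkpoint on a toy dialogue-completion pair with the Hugging Face transformers Seq2SeqTrainer. The model size, example data, and hyperparameters are placeholders, not the project's actual configuration.

```python
# Illustrative sketch: fine-tuning T5 for dialogue completion with Hugging Face transformers.
# Model size, toy data, and hyperparameters are placeholders, not the project's actual settings.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model_name = "t5-base"  # the project used a larger T5 variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical toy example: given the conversation so far, predict a plausible next sentence.
pairs = [{"context": "complete dialogue: A: How are you? B:", "response": "I'm doing well, thanks."}]
dataset = Dataset.from_list(pairs)

def preprocess(example):
    model_inputs = tokenizer(example["context"], truncation=True, max_length=256)
    labels = tokenizer(text_target=example["response"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="t5-dialogue", num_train_epochs=1,
                                  per_device_train_batch_size=8, learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```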
Research Project - 2022
Developed a pipeline to reconstruct 3D character models using a U-Net CNN generator and a PatchGAN discriminator. The system was trained on a dataset of over 28,000 3D models scraped from various online 3D model stores. The deep learning model generated depth maps from lateral views of the scraped 3D models, which were then converted into point clouds and reconstructed into 3D meshes using marching cubes.
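The sketch below illustrates only the final reconstruction step: back-projecting a depth map into a point cloud, voxelizing it, and extracting a mesh with scikit-image's marching cubes. The camera intrinsics, resolutions, and random depth map are made-up placeholders, and the generator/discriminator networks are not shown.

```python
# Sketch: turn a depth map into a point cloud, voxelize it, and extract a mesh
# with marching cubes. Camera intrinsics and resolutions are placeholder values.
import numpy as np
from skimage.measure import marching_cubes

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map into an (N, 3) point cloud using a pinhole camera model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[z.reshape(-1) > 0]  # drop background (zero-depth) pixels

def point_cloud_to_mesh(points: np.ndarray, resolution: int = 64):
    """Voxelize the point cloud into an occupancy grid and run marching cubes on it."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    idx = ((points - mins) / (maxs - mins + 1e-8) * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    verts, faces, normals, _ = marching_cubes(grid, level=0.5)
    return verts, faces, normals

if __name__ == "__main__":
    depth = np.random.rand(128, 128).astype(np.float32)  # stand-in for a predicted depth map
    pts = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=64.0, cy=64.0)
    verts, faces, _ = point_cloud_to_mesh(pts)
    print(verts.shape, faces.shape)
```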
Research Project - 2024
This project focused on developing small feedforward neural networks to approximate signed distance fields (SDFs) for real-time rendering. The neural networks were trained by sampling 3D point clouds from 3D meshes and using Fourier Features to improve their approximation accuracy. The real-time rendering was implemented in Unity using a custom Compute Shader with raymarching, achieving an average of 30 frames per second using a lightweight two-layer neural network.
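Below is a minimal PyTorch sketch of the core idea: a Fourier-feature encoding of 3D points feeding a small MLP that regresses signed distance, trained here against a toy analytic sphere SDF. Layer widths, frequency scale, and the target shape are illustrative, and the Unity compute-shader renderer is not reproduced.

```python
# Sketch: a small SDF approximator -- Fourier-feature encoding of 3D points
# followed by a compact MLP regressing signed distance.
# Widths, frequency scale, and the sphere SDF target are illustrative placeholders.
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    def __init__(self, in_dim: int = 3, num_features: int = 64, scale: float = 4.0):
        super().__init__()
        # Random projection matrix B ~ N(0, scale^2), fixed (not trained).
        self.register_buffer("B", torch.randn(in_dim, num_features) * scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = 2 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

class NeuralSDF(nn.Module):
    def __init__(self, num_features: int = 64, hidden: int = 64):
        super().__init__()
        self.encode = FourierFeatures(num_features=num_features)
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        return self.mlp(self.encode(points)).squeeze(-1)

if __name__ == "__main__":
    model = NeuralSDF()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(200):
        pts = torch.rand(1024, 3) * 2 - 1        # sample points in [-1, 1]^3
        target = pts.norm(dim=-1) - 0.5          # analytic SDF of a sphere as a toy target
        loss = nn.functional.mse_loss(model(pts), target)
        opt.zero_grad(); loss.backward(); opt.step()
    print("final loss:", loss.item())
```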
Research Project - 2024
Developed a semantic segmentation model using a Siamese U-Net CNN architecture to analyze multispectral satellite imagery (13-band). Trained on the Onera Satellite Change Detection dataset, the model achieved an Intersection over Union (IoU) score of over 85% in detecting urban changes. Data augmentation techniques and optimization using dice loss and IoU loss functions were applied to improve model performance.
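For reference, here is a minimal PyTorch sketch of soft Dice and soft IoU losses for binary change maps. The equal 50/50 combination weight is illustrative, and the Siamese U-Net itself is not shown.

```python
# Sketch: soft Dice and soft IoU losses for binary change-detection masks.
# The 50/50 combination weight is illustrative, not the project's actual setting.
import torch

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(-2, -1))
    denom = probs.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    return (1 - (2 * intersection + eps) / (denom + eps)).mean()

def soft_iou_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1)) - intersection
    return (1 - (intersection + eps) / (union + eps)).mean()

def combined_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return 0.5 * soft_dice_loss(logits, target) + 0.5 * soft_iou_loss(logits, target)

if __name__ == "__main__":
    logits = torch.randn(4, 1, 96, 96)                  # model output for a batch of change maps
    target = (torch.rand(4, 1, 96, 96) > 0.5).float()   # toy binary ground-truth masks
    print(combined_loss(logits, target).item())
```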
Personal Project - 2021
This project involved building deep learning models based on Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) architectures to classify music genres on the GTZAN dataset, using spectrogram features and Mel Frequency Cepstral Coefficients (MFCCs). Both a CNN and an LSTM model were implemented and trained: the CNN achieved a test accuracy of 88.9%, while the LSTM attained 94.1%.
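For illustration, the sketch below extracts the MFCC features such models consume, using librosa. The audio path, sample rate, and coefficient count are placeholders rather than the project's exact preprocessing.

```python
# Sketch: extract MFCC features for a genre-classification model using librosa.
# The audio path, sample rate, and coefficient count are placeholder values.
import numpy as np
import librosa

def extract_mfcc(path: str, sr: int = 22050, n_mfcc: int = 13) -> np.ndarray:
    """Load an audio clip and return its MFCC matrix of shape (n_mfcc, frames)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

if __name__ == "__main__":
    # GTZAN clips are 30-second audio files; the path below is hypothetical.
    mfcc = extract_mfcc("gtzan/blues/blues.00000.wav")
    # Common preprocessing step: transpose to (frames, n_mfcc) so an LSTM can
    # consume the frames as a time sequence.
    sequence = mfcc.T
    print(sequence.shape)
```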
Personal Project - 2019
Implemented the NeuroEvolution of Augmenting Topologies (NEAT) algorithm from scratch in Python as described in the original research paper and applied it to the Flappy Bird game as a test use case. The implementation consistently converged to valid solutions across all 100 independent runs on the XOR classification benchmark. In 40% of the runs, the evolved networks achieved topologically minimal optimal solutions, averaging 1.85 hidden nodes per network, 22% fewer than the baseline reported in the original NEAT publication.
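To give a flavor of the evaluation loop, here is a small sketch of an XOR fitness function for scoring evolved networks. The squared-error fitness formulation and the hand-wired demo network are common illustrative choices, not necessarily the project's exact implementation.

```python
# Sketch: XOR fitness evaluation for an evolved network.
# 'net' is any callable mapping two inputs to a single output in [0, 1];
# the squared-error fitness below is a common formulation, not necessarily
# the exact one used in the project.
from typing import Callable

XOR_CASES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def xor_fitness(net: Callable[[float, float], float]) -> float:
    """Higher is better; a perfect network scores 4.0."""
    error = sum((net(*inputs) - expected) ** 2 for inputs, expected in XOR_CASES)
    return 4.0 - error

def solved(net: Callable[[float, float], float], threshold: float = 0.5) -> bool:
    """A run counts as converged when every case lands on the right side of the threshold."""
    return all((net(*inputs) > threshold) == bool(expected) for inputs, expected in XOR_CASES)

if __name__ == "__main__":
    # Toy stand-in for an evolved network: a hand-wired XOR with one hidden node.
    import math
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hand_wired = lambda a, b: sigmoid(20 * a + 20 * b - 40 * sigmoid(20 * a + 20 * b - 30) - 10)
    print(xor_fitness(hand_wired), solved(hand_wired))
```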
Global Game Jam - 2020
Developed for the Oculus Rift S during the Global Game Jam 2020, "STUCK" is a virtual reality game in which players manipulate time and space to alter the outcomes of their actions. Set in an infinite, psychedelic tunnel, the game challenges players to rectify the consequences of their choices by rewinding and replaying pivotal moments to pass each level.
Internship Project - 2019
During an internship, I developed a seamless stereoscopic portal teleportation system for virtual reality, compatible with both the HTC Vive and the Oculus Rift S. The project focused on smooth movement and enhanced immersion in virtual reality environments through a portal teleportation mechanic inspired by the video game "Portal".
Brackeys Game Jam - 2020
Developed during the Brackeys Game Jam 2020, this 3D Unity game lets players review rewound CCTV footage of enemy movements after their character dies. Players must track the enemies' positions and defeat them on the next attempt, combining spatial awareness with strategic timing. The game challenges players to rely on memory and observation skills.
Manomotion Game Jam - 2020
Developed for the Manomotion Game Jam, "Sock Puppet" is a mobile augmented reality game where players use hand gestures to control the movement of a virtual character. The game uses real-time hand tracking to create gesture-based gameplay for an interactive AR experience.