Projects
Corpus Released
mySentence, sentence segmentation for Myanmar language corpus: https://github.com/ye-kyaw-thu/mySentence
AskCovidDr : Retrieval based TF-IDF Bilingual chatbot for Covid-19
Date : Oct, 2022.
Demonstration : https://youtu.be/6Vn0DKm69R4
Description : AskCovidDr is a retrieval-based bilingual chatbot that returns the stored answer most similar to the user's question. The user's input is converted to a vector by a TF-IDF vectorizer trained on our collected corpus of question-answer pairs. The similarity between the vectorized question and our predefined answers is then computed, and the best match is returned.
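The retrieval step can be sketched in plain Python as below. This is only an illustration, not the AskCovidDr code: the class name `TfidfRetriever`, the whitespace tokenizer, the smoothed IDF formula, the use of cosine similarity, and the choice to match against stored questions and return the paired answer are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; Myanmar text would need a real segmenter.
    return text.lower().split()


class TfidfRetriever:
    """Hypothetical TF-IDF retriever over (question, answer) pairs."""

    def __init__(self, qa_pairs):
        self.answers = [answer for _, answer in qa_pairs]
        docs = [tokenize(question) for question, _ in qa_pairs]
        n_docs = len(docs)
        # Document frequency: number of stored questions containing each term.
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vecs = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        # Terms unseen during fitting are ignored.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: count * self.idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def answer(self, question):
        query = self._vectorize(tokenize(question))
        # Vectors are L2-normalized, so the dot product is the cosine similarity.
        scores = [
            sum(w * doc.get(t, 0.0) for t, w in query.items())
            for doc in self.doc_vecs
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.answers[best], scores[best]
```

A caller would build the retriever once from the question-answer corpus and then call `answer()` for each user message. A score of 0 means no term overlap, which a chatbot could treat as "no answer found".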
Date : Sep, 2022.
Code and Documentation : https://github.com/ThuraAung1601/mySpellCorrect
Demonstration : https://youtu.be/AIyafOxkE6o
Description : mySpellCorrect, a spelling correction mini-project for Burmese (Myanmar language), is one of my pet projects. It uses statistical approaches, namely n-gram language models and SymSpell, rather than a rule-based approach. It is not the first use of SymSpell for the Myanmar (မြန်မာ) language: there is a systematically researched conference paper called SymSpell4Burmese, so this project can be seen as an unofficial implementation of SymSpell4Burmese. myPOS corpus version 3.0 was used to build the n-gram dictionaries needed for the language models.
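The symmetric-delete idea behind SymSpell can be illustrated with a small sketch. It is not the mySpellCorrect or SymSpell4Burmese code: the class name `SymSpellSketch` and the toy Latin-script dictionary are assumptions, and plain Levenshtein distance is used here where SymSpell itself uses a Damerau-Levenshtein variant.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """All strings obtained by deleting up to max_distance characters."""
    out = {word}
    for d in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - d):
            out.add("".join(word[i] for i in keep))
    return out


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class SymSpellSketch:
    """Hypothetical symmetric-delete index over a word-frequency dictionary."""

    def __init__(self, word_freqs, max_distance=2):
        self.freqs = dict(word_freqs)
        self.max_distance = max_distance
        self.index = defaultdict(set)
        # Precompute: every delete variant points back to its dictionary word.
        for word in self.freqs:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def lookup(self, term):
        # Candidates share at least one delete variant with the input term.
        candidates = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        # Verify with a true edit distance, then rank by distance and frequency.
        scored = [(edit_distance(term, c), -self.freqs[c], c) for c in candidates]
        return [c for dist, _, c in sorted(scored) if dist <= self.max_distance]


speller = SymSpellSketch({"hello": 10, "help": 5, "world": 7}, max_distance=1)
suggestions = speller.lookup("helo")
```

In a setup like the one described above, the frequencies would come from unigram counts over a corpus, and an n-gram language model could rerank the suggestions in context.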
Date : Nov, 2022.
Code and Documentation : https://github.com/ThuraAung1601/mmCRFseg
Description : mmCRFseg is an educational project for Myanmar word segmentation using Conditional Random Fields. Each character in the training corpus was tagged with 1 for the beginning of a word and 0 otherwise. myPOS corpus version 3.0 was used to train mmCRFseg, and the model was evaluated on the open-test portion of the myPOS corpus.
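The tagging scheme can be sketched as follows. The function names are hypothetical, and the sketch assumes that words in the training text are separated by spaces and that a "character" is a single Unicode code point. It covers only label preparation and decoding, not the CRF itself.

```python
def tag_characters(segmented_sentence):
    """Turn a space-segmented sentence into (character, tag) pairs.

    Tag "1" marks the first character of a word, "0" every other character.
    """
    pairs = []
    for word in segmented_sentence.split():
        for i, ch in enumerate(word):
            pairs.append((ch, "1" if i == 0 else "0"))
    return pairs


def decode(chars, tags):
    """Rebuild words from characters and predicted tags."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "1" or not words:
            words.append(ch)  # start a new word
        else:
            words[-1] += ch  # continue the current word
    return words


pairs = tag_characters("ab cde")
chars = [ch for ch, _ in pairs]
tags = [tag for _, tag in pairs]
words = decode(chars, tags)
```

At prediction time, a trained sequence model would supply `tags` for an unsegmented character sequence, and `decode` would turn them back into words.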
Date : Feb, 2022.
Demo : https://youtu.be/mAjoRFMYshg
Code and Documentation : https://github.com/ThuraAung1601/BHDD-using-basic-CNN
Description: The main goal of this project is to understand Convolutional Neural Networks and apply them to handwritten Burmese digit recognition. The system recognises handwritten Burmese digits with 99% accuracy. This project won the Best Project Award in the Simbolo Artificial Intelligence Project Challenge.
Date : Nov - Dec, 2021.
Code and Documentation : https://github.com/ThuraAung1601/Automatic-Myanmar-News-Classification
Demo : https://youtu.be/EdsndeNpEFg
Description : myNews is a machine-learning-based automatic Myanmar news classification system. The original repository already implemented a Naive Bayes classifier; for these experiments, other machine learning algorithms (Linear SVM, KNN, Random Forest and Decision Tree) were implemented as well. Linear SVM gave the highest weighted F-score. The text was vectorized with TF-IDF. The dataset used for training and testing the models is from Aye Hnin Khine’s repository.
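For reference, a weighted F-score averages the per-class scores using each class's share of the true labels as its weight. The sketch below assumes the F1 variant; the function name and the toy labels are illustrative and not taken from the project.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its number of true samples.
        score += (n / total) * f1
    return score


# Toy labels (hypothetical news categories).
y_true = ["sport", "sport", "sport", "politics"]
y_pred = ["sport", "sport", "politics", "politics"]
score = weighted_f1(y_true, y_pred)
```

Because of the weighting, frequent categories influence the score more than rare ones, which matters when the news categories are imbalanced.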
Ceretai : Using Computer Vision to Detect Ethnicity in Videos and Improve Ethnicity Awareness
Date : Oct, 2021 - Jan, 2022.
Description : A team of more than 40 AI engineers developed a machine learning algorithm to detect faces in videos and classify the ethnicities of people seen on TV. Different task groups built multiple models and fine-tuned them using datasets collected through active learning processes that classify faces into six identities.
Human Detection System using SVM and HOG Features
Date : Feb, 2022.
Code and Documentation : https://github.com/ThuraAung1601/human-detection-hog
Description : This project detects pedestrians on the road for traffic safety. Images of pedestrians were used to train the system. To improve accuracy, Histogram of Oriented Gradients (HOG) features were extracted from the images and used to train a Linear SVM, which achieved 93% accuracy.
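The core of a HOG descriptor is a per-cell histogram of gradient orientations weighted by gradient magnitude. The sketch below shows that single step in plain Python under simplifying assumptions: central differences on interior pixels only, 9 unsigned-orientation bins, and hard bin assignment without interpolation. Block normalization and the SVM training are omitted, and the function name is hypothetical rather than taken from the project.

```python
import math


def cell_histogram(cell, bins=9):
    """Orientation histogram for one grayscale cell given as a list of rows."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: intensity changes only along x, so votes land in bin 0.
edge = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_histogram(edge)
```

A full descriptor concatenates such histograms over all cells after normalizing them in overlapping blocks, and that feature vector is what a linear classifier would be trained on.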
ConvNet based Driver Drowsiness Detection and Alert System
Date : Nov, 2022.
Demo : https://youtu.be/vkOM3VAm8Rg
Code and Documentation : https://github.com/ThuraAung1601/drowsiness-detection
Description : Part of the MRL Eye Dataset is used to classify eyes as open or closed. A custom Convolutional Neural Network was implemented and achieved 94.7% accuracy. ResNet was also tested with transfer learning and fine-tuning, but the custom CNN outperformed it.
Synthetic Myanmar-Thai Parallel Corpus
Date : 7th April, 2023.
Code and Documentation: https://github.com/ThuraAung1601/Synthetic-myanmar-thai-parallel-corpus
Description: A Myanmar-Thai parallel corpus of 18,373 sentence pairs was generated by manually running the source text through the Google Translate machine translation system. The source corpus is the "Myanmar-Rakhine" part of "myPar: Myanmar Parallel Corpora for Machine Translation R&D". An IBM Model 1 word translation model was also trained on the synthetic data.
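IBM Model 1 estimates word translation probabilities with expectation-maximization over sentence pairs. The sketch below is a minimal version for illustration, not the training code from the repository: the function name, the `<NULL>` source token, the iteration count, and the toy German-English pairs standing in for Myanmar-Thai data are all assumptions.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Train IBM Model 1 with EM.

    pairs: list of (source_tokens, target_tokens).
    Returns t where t[(target_word, source_word)] = P(target_word | source_word).
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            src = ["<NULL>"] + src  # lets target words align to nothing
            for f in tgt:
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the counts per source word.
        t = defaultdict(float, {(f, e): count[(f, e)] / total[e] for (f, e) in count})
    return t


toy_pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
table = train_ibm1(toy_pairs, iterations=5)
```

After training, the entries of `table` for a given source word form a probability distribution over target words, so the most likely translation is simply the entry with the highest value.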
myTypo
Date : 30th January, 2023.
Code and Documentation: https://github.com/ThuraAung1601/myTypo
Description: myTypo is a Python package that simulates typographical errors in the Myanmar language, addressing part of the low-resource problem in text correction research for Myanmar.
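The general idea of typo simulation can be sketched as random character-level edits applied to clean text, which yields (noisy, clean) pairs for training correction models. The code below is not the myTypo API. The function name, the four edit operations, and the Latin-script example are assumptions, and edits at the code-point level can split Myanmar combining sequences, which a real tool would need to handle.

```python
import random


def simulate_typo(text, rng, alphabet):
    """Apply one random edit (delete, swap, substitute or insert) to text."""
    if len(text) < 2:
        return text
    op = rng.choice(["delete", "swap", "substitute", "insert"])
    i = rng.randrange(len(text) - 1)
    if op == "delete":
        return text[:i] + text[i + 1:]
    if op == "swap":
        # Transpose two adjacent characters.
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if op == "substitute":
        return text[:i] + rng.choice(alphabet) + text[i + 1:]
    # insert
    return text[:i] + rng.choice(alphabet) + text[i:]


# A seeded generator keeps the synthetic errors reproducible.
rng = random.Random(0)
clean = "hello"
noisy = simulate_typo(clean, rng, alphabet="abc")
training_pair = (noisy, clean)
```

Passing the random generator in explicitly makes it possible to regenerate exactly the same synthetic dataset from the same seed.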
mytxt2braille
Date : 13th April, 2023.
Code and Documentation: https://github.com/ThuraAung1601/mytxt2braille
Description: A rule-based translator from Myanmar text to Braille. The tool supports both the Grade-1 and Grade-2 Braille systems.