Personal Page
Fazlourrahman Balouchzahi
Ph.D. in Computer Science - NLP
Mail: frs_b[@]yahoo.com
Google Scholar: here
ResearchGate: here
EDUCATION
2021 – 2024:
Ph.D. at Centro de Investigación en Computación, Instituto Politécnico Nacional, CDMX, Mexico under the supervision of Prof. Grigori Sidorov and Prof. Alexander Gelbukh
Thesis: Hope Speech Detection from Social Media Text
2018 – 2020:
M.Sc. Computer Science at Department of P.G. Studies and Research in Computer Science, Mangalore University, Mangalore, India under the supervision of Dr. H. L. Shashirekha
Thesis: PUNER – Parsi ULMFiT for NER in Persian Language
2013 – 2017:
B.Sc. Computer Science at Faculty of Mathematics and Computer Science, University of Sistan and Baluchistan, Zahedan, Iran
RESEARCH INTERESTS
Natural Language Processing
Text Processing
Named-Entity Recognition
Sentiment Analysis
Offensive Language Identification
Author Profiling
Low Resource Languages and Code-Mixed Texts
Hope Speech Detection
Regret Detection
SKILLS
Programming Languages:
o Python
o Java
o C#
Hardware Assembling
Basic Knowledge of Optical Fiber Fusion
Writing Scientific Papers
Conferences attended:
Conference and Labs of the Evaluation Forum (CLEF) 2020, Thessaloniki – Greece
1st Congress on Intelligent Systems (CIS) 2020, New Delhi, India
ICON 2020: 17th International Conference on Natural Language Processing, India.
Forum for Information Retrieval Evaluation (FIRE) 2020, India
16th conference of the European Chapter of the Association for Computational Linguistics (EACL) 2021, Ukraine
2nd Congress on Intelligent Systems (CIS) 2021, Bangalore, India
International Workshop on Soft Computing and Advances in Intelligent Systems, SC-AIS-2021, Mexico City, Mexico
8th International Symposium on Language & Knowledge Engineering, November 4th, 2021, Puebla, Mexico
60th Annual Meeting of the Association for Computational Linguistics (ACL) 2022, Dublin, Ireland.
Awards
Best PhD Student Award at IPN, Mexico, 2024: winner of the gold medal "Presea Lázaro Cárdenas", the highest recognition in the community of the Instituto Politécnico Nacional, which will be awarded personally by the President of Mexico.
NAACL grant for PolyHope project
Achievements
Overall 2nd rank in Technical Domain Identification workshop in ICON 2020 (1st rank in Bengali and Malayalam, 2nd rank in English and 3rd rank in Tamil texts).
4th and 6th ranks in Tamil-English and Malayalam-English code-mixed texts, respectively, in Sentiments Analysis for Dravidian languages workshop during FIRE 2020.
1st, 4th, and 6th ranks in Malayalam-English, Tamil-English, and Kannada-English texts respectively in Offensive Language Identification for Dravidian languages workshop in EACL 2021.
1st, 2nd, and 3rd ranks in Malayalam-English, English, and Tamil-English respectively in Hope Speech Detection for Dravidian languages workshop in EACL 2021.
2nd rank in CheckThat! Task 3: Fake News Detection workshop in CLEF 2021.
2nd rank in PAN 2021: Hate Speech Spreaders Profiling in Spanish and English texts workshop in CLEF 2021.
2nd rank in MEDDOPROF: MEDical DOcuments PROFessions recognition shared task for Spanish texts in IberLEF 2021.
Overall 1st rank in Dravidian CodeMix HASOC, FIRE 2021 (1st rank in Tamil Task 2 and 2nd ranks in Tamil Task 1 and Malayalam Task 2)
2nd, 4th, and 5th ranks in Dravidian Sentiments Analysis in Kannada, Malayalam, and Tamil languages respectively in Dravidian CodeMix SA, FIRE 2021
5th rank in Arabic Misogyny Identification in ArMI, FIRE 2021
1st and 2nd ranks in English and Spanish, respectively, in the Second Workshop on Language Technology for Equality, Diversity and Inclusion at ACL 2022.
LANGUAGES
Balouchi (Native)
Persian (Native)
English (80%)
Spanish (50%)
Research Experience
Accepted
Atnafu Lambebo Tonja, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio. NLP Progress in Indigenous Latin American Languages. NAACL 2024.
Published
2023
F. Balouchzahi, G. Sidorov and A. Gelbukh, PolyHope: Two-level hope speech detection from tweets. Expert Systems With Applications (2023), doi: https://doi.org/10.1016/j.eswa.2023.120078. (Scopus Indexed, Q1). URL: link
F. Balouchzahi, S. Butt, G. Sidorov et al., ReDDIT: Regret detection and domain identification from text. Expert Systems With Applications (2023), doi: https://doi.org/10.1016/j.eswa.2023.120099. (Scopus Indexed, Q1). URL: link
Sidorov, G., Balouchzahi, F., Butt, S., & Gelbukh, A. (2023). Regret and Hope on Transformers: An Analysis of Transformers on Regret and Hope Speech Detection Datasets. Applied Sciences, 13(6), 3983. (Scopus Indexed, Q2). URL: link
Hegde, A., Balouchzahi, F., Coelho, S., HL, S., Nayel, H. A., & Butt, S. (2023, December). CoLI@ FIRE2023: Findings of Word-level Language Identification in Code-mixed Tulu Text. In Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 25-26). (Scopus Indexed). URL: link
Hegde, Asha, F. Balouchzahi, Sharal Coelho, H. L. Shashirekha, Hamada A. Nayel, and Sabur Butt. "Overview of CoLI-Tunglish: Word-level Language Identification in Code-mixed Tulu Text at FIRE 2023." (2023). (Scopus Indexed). URL: link
Lakshmaiah, S. H., Hegde, A., & Balouchzahi, F. (2023). Trigger detection in social media text. Working Notes of CLEF. (Scopus Indexed). URL: link
Girish, K., Hegde, A., Balouchzahi, F., & Shashirekha, H. L. (2023). Profiling Cryptocurrency Influencers with Sentence Transformers. (Scopus Indexed). URL: link
2022
Balouchzahi, F., Butt, S., Hegde, A., Ashraf, N., Shashirekha, H. L., Sidorov, G., & Gelbukh, A. (2022, December). Overview of CoLI-Kanglish: Word Level Language Identification in Code-mixed Kannada-English Texts at ICON 2022. In Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts (pp. 38-45).
Butt, S., Amjad, M., Balouchzahi, F., Ashraf, N., Sharma, R., Sidorov, G., & Gelbukh, A. (2022, December). EmoThreat@ FIRE2022: Shared Track on Emotions and Threat Detection in Urdu. In Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation (pp. 1-3).
F. Balouchzahi, H. L. Shashirekha, and G. Sidorov. "A Comparative Study of Syllable and Char Level N-grams for Dravidian Multi-Script and Code-Mixed Offensive Language Identification". International Workshop on Soft Computing and Advances in Intelligent Systems, SC-AIS-2021, Journal of Intelligent and Fuzzy Systems (ISSN: 1064-1246). (Scopus Indexed, Q4). URL: link
H. L. Shashirekha., F. Balouchzahi., Anusha M. D., and G. Sidorov. "CoLI-Machine Learning Approaches for Code-mixed Language Identification at Word Level in Kannada-English Texts", International Workshop on Soft Computing and Advances in Intelligent Systems, SC-AIS-2021, Acta Polytechnica Hungarica (ISSN: 1785-8860). (Scopus Indexed, Q3). URL: link
Abdul Meque, Fazlourrahman Balouchzahi, Grigori Sidorov and Alexander Gelbukh. "Mexican Spanish Paraphrase Identification using Data Augmentation", IberLEF2022, September 2022, A Coruña, Spain. (Scopus Indexed). URL: link
Sabur Butt, Fazlourrahman Balouchzahi, Grigori Sidorov and Alexander Gelbukh, "CIC@PAN: Simplifying Irony Profiling using Twitter Data". CLEF 2022, September 2022, Bologna, Italy. (Scopus Indexed). URL: link
Balouchzahi, Fazlourrahman, Sidorov, Grigori, and Shashirekha, Hosahalli Lakshmaiah. "Fake News Spreaders Profiling Using N-grams of Various Types and SHAP-based Feature Selection". 8th International Symposium on Language & Knowledge Engineering, Journal of Intelligent and Fuzzy Systems. 1 Jan. 2022: 4437 – 4448. (ISSN: 1064-1246) (Scopus Indexed, Q4). URL: link
Fazlourrahman Balouchzahi, Anusha Gowda, Hosahalli Shashirekha, Grigori Sidorov, "MUCIC@TamilNLP-ACL2022: Abusive Comment Detection in Tamil Language using 1D Conv-LSTM". Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages. ACL. URL: link
Fazlourrahman Balouchzahi, Sabur Butt, Grigori Sidorov, Alexander Gelbukh. "CIC@LT-EDI-ACL2022: Are transformers the only hope? Hope speech detection for Spanish and English comments". Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion. ACL. URL: link
Anusha Gowda, Fazlourrahman Balouchzahi, Hosahalli Shashirekha, Grigori Sidorov. "MUCIC@LT-EDI-ACL2022: Hope Speech Detection using Data Re-Sampling and 1D Conv-LSTM". Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion. ACL. URL: link
2021
F. Balouchzahi, B.K. Aparna, and H.L. Shashirekha, "CoFFiTT-Covid-19 Fake News Detection using Fine-Tuned Transfer Learning Approaches", Springer Book Series, "Lecture Notes on Data Engineering and Communications Technologies", in 2nd Congress on Intelligent Systems CIS 2021, India. (Scopus Indexed). URL: link
F. Balouchzahi, O. Vitman, H. L. Shashirekha, G. Sidorov, A. Gelbukh, "Acronym Identification using Transformers and Flair Framework", in The AAAI-22 Workshop on Scientific Document Understanding. (Scopus Indexed). URL: link
F. Balouchzahi, G. Sidorov and H. L. Shashirekha, "Arabic Misogyny Identification", Forum for Information Retrieval Evaluation (FIRE) 2021, CEUR Workshop Proceedings (Scopus Indexed). URL: link
F. Balouchzahi, H. L. Shashirekha, and G. Sidorov, "MUCIC@Dravidian-CodeMix-FIRE2021:CoSaD- Code-Mixed Sentiments Analysis for Dravidian Languages", Forum for Information Retrieval Evaluation (FIRE) 2021, CEUR Workshop Proceedings (Scopus Indexed). URL: link
F. Balouchzahi, S. Bashang, G. Sidorov, and H. L. Shashirekha, "MUCIC@Dravidian-CodeMix-HASOC2021: CoMaTa OLI- Code-mixed Malayalam and Tamil Offensive Language Identification", Forum for Information Retrieval Evaluation (FIRE) 2021, CEUR Workshop Proceedings (Scopus Indexed). URL: link
F. Balouchzahi, H. L. Shashirekha, and G. Sidorov. "MUCIC at UrduFake 2021-Ensembled Feature Selection for Urdu Fake News Detection", Forum for Information Retrieval Evaluation (FIRE) 2021, CEUR Workshop Proceedings (Scopus Indexed). URL: link
F. Balouchzahi, O. Vitman, H. L. Shashirekha, G. Sidorov, A. Gelbukh, "MUCIC at ComMA@ICON: Multilingual Gender Biased and Communal Language Identification using n-grams and Multilingual Sentence Encoders". In ICON 2021: 18th International Conference on Natural Language Processing. ACL. URL: link
F Balouchzahi, HL Shashirekha, G Sidorov, “MUCIC at CheckThat! 2021: FaDo-Fake News Detection and Domain Identification using Transformers Ensembling”, In Conference and Labs of the Evaluation Forum (CLEF) 2021, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha, G Sidorov, “HSSD: Hate Speech Spreader Detection using N-grams and Voting Classifier”, In Conference and Labs of the Evaluation Forum (CLEF) 2021, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, G Sidorov, HL Shashirekha, “ADOP FERT-Automatic Detection of Occupations and Profession in Medical Texts using Flair and BERT”, In IberLEF 2021, September 2021, Malaga, Spain, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “LA-SACo: A Study of Learning Approaches for Sentiments Analysis in Code-Mixing Texts”. 16th conference of the European Chapter of the Association for Computational Linguistics (EACL) 2021, Ukraine, URL: link
F Balouchzahi, BK Aparna, HL Shashirekha. “MUCS@DravidianLangTech-EACL2021: COOLI-Code-Mixing Offensive Language Identification”. The First Workshop on Speech and Language Technologies for Dravidian Languages. 16th conference of the European Chapter of the Association for Computational Linguistics (EACL) 2021, Ukraine. URL: link
F Balouchzahi, BK Aparna, HL Shashirekha “MUCS@LT-EDI-EACL2021:CoHope-Hope Speech Detection for Equality, Diversity, and Inclusion in Code-Mixed Texts”. First Workshop on Language Technology for Equality, Diversity, Inclusion (Lt-Edi-2021). 16th conference of the European Chapter of the Association for Computational Linguistics (EACL) 2021, Ukraine. URL: link
2020
HL Shashirekha, F Balouchzahi. “ULMFiT for Twitter Fake News Spreader Profiling”, In Conference and Labs of the Evaluation Forum (CLEF) 2020, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “MUCS@Dravidian-CodeMix-FIRE2020:SACO-Sentiments Analysis for CodeMix Text”, Forum for Information Retrieval Evaluation (FIRE) 2020, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “An Approach for Event Detection from News in Indian Languages using Linear SVC-EDNIL”, Forum for Information Retrieval Evaluation (FIRE) 2020, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “Learning models for Urdu Fake News Detection”, Forum for Information Retrieval Evaluation (FIRE) 2020, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “Learning Approaches for Hate Speech and Offensive Content Identification in Indo-European Languages”, Forum for Information Retrieval Evaluation (FIRE) 2020, CEUR Workshop Proceedings (Scopus Indexed), URL: link
F Balouchzahi, HL Shashirekha. “PUNER-Parsi ULMFiT Named-Entity Recognition in Persian Texts”. Advances in Intelligent Systems and Computing, Congress on Intelligent Systems (CIS) 2020, In Intelligent Systems and Computing (AISC) pp. 75-88, Springer book series (Scopus Indexed). URL: link
F Balouchzahi, MD Anusha, HL Shashirekha. “MUCS@TechDOfication using FineTuned Vectors and N-grams”, ICON 2020: 17th International Conference on Natural Language Processing, ACL. India. URL: link