Amir Soleimani
PhD Candidate (2018-2024)
Language Technology Lab (LTL)
University of Amsterdam (UvA)
Netherlands
Email: A.Soleimani.B {at} gmail.com,
A.Soleimani {at} uva.nl
Personal Page: https://asoleimanib.github.io/
[Google Scholar] [LinkedIn] [GitHub] [Medium]
I am passionate about AI because it helps people process large amounts of complex data. I am an NLP researcher with both academic training and self-directed study, and I continuously learn new knowledge and skills. I like facing new challenges and using my experience to find creative and effective solutions. I am highly skilled in research and problem-solving, from defining problems to collecting data, applying existing and new methods, and proposing innovative solutions. I can work independently and am equally committed to collaborating and contributing effectively as a team member.
I am a PhD candidate at the Language Technology Lab (LTL). I worked on applications of language models with Christof Monz and Marcel Worring. My research interests are Language Representation Models, Document Understanding, Document Summarization, Question Answering, Question Generation, Factuality Evaluation, and Claim Verification.
Highlights:
I defended my PhD on 3 April 2024. You can find my PhD thesis here.
Our long paper "NonFactS: Nonfactual Summary Generation for Factuality Evaluation in Document Summarization" has been accepted at #ACL2023 Findings. [github]
I finished my internship at Naver Labs Europe (Jun-Nov 2021), where I worked with my supervisors Vassilina Nikoulina, Benoit Favre and Salah Aït-Mokhtar. We worked on summarizing scientific papers according to users' desired queries or perspectives when little or no labelled data is available.
Our paper "Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training" has been accepted at BioNLP2022 [link]!
I started writing simplified blog posts at asoleimanib.medium.com.
NLQuAD: A Non-Factoid Long Question Answering Data Set
We introduced NLQuAD, a non-factoid long question answering dataset built from BBC news articles. NLQuAD’s question types and the length of both its context documents and its answers make it a challenging real-world task. NLQuAD consists of news articles as context documents, interrogative sub-headings in the articles as questions, and body paragraphs corresponding to the sub-headings as contiguous answers to the questions. NLQuAD contains 31k non-factoid questions and long answers collected from 13k BBC news articles. Check github for the dataset and code!
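The construction described above can be illustrated with a small Python sketch. It is not the released NLQuAD code: the `QAExample` class, the `build_examples` function, the `(sub_heading, body_text)` input format, and the rule that a sub-heading counts as interrogative when it ends with "?" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One question/answer pair (hypothetical structure, for illustration only)."""
    question: str      # interrogative sub-heading
    context: str       # full article text
    answer: str        # body text under the sub-heading
    answer_start: int  # character offset of the answer inside the context


def build_examples(sections):
    """Turn an article into QA examples.

    `sections` is a list of (sub_heading, body_text) pairs in article order.
    Assumption: a sub-heading is treated as a question if it ends with "?".
    """
    # The context is the whole article body, sections joined by newlines.
    context = "\n".join(body for _, body in sections)
    examples = []
    offset = 0
    for heading, body in sections:
        if heading.strip().endswith("?"):
            examples.append(QAExample(heading.strip(), context, body, offset))
        offset += len(body) + 1  # +1 for the newline separator
    return examples
```

Because each answer is stored with its character offset, it can be checked that the answer is a contiguous span of the context, which is the property the description above relies on.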
Longformer for MS MARCO Document Re-ranking Task
A collaboration with Ivan Sekulić and Mohammad Aliannejadi to evaluate the performance of Longformer on the new MS MARCO document retrieval dataset.
BERT for Evidence Retrieval and Claim Verification
We showed how BERT can be used in a two-step verification system. It first retrieves evidence sentences and then verifies a claim against the top retrieved evidence. We also showed that Hard Negative Mining improves the training of the BERT model.
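The two steps and the role of hard negatives can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: a simple word-overlap scorer (`overlap_score`) stands in for the BERT relevance model, and all function names and the parameters `k` and `n` are hypothetical.

```python
def overlap_score(claim, sentence):
    """Stand-in relevance scorer (Jaccard word overlap).

    Assumption: in a real system a fine-tuned BERT model would produce this score.
    """
    a = set(claim.lower().split())
    b = set(sentence.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(claim, sentences, score_fn, k=5):
    """Step 1: rank candidate sentences by relevance and keep the top k."""
    ranked = sorted(sentences, key=lambda s: score_fn(claim, s), reverse=True)
    return ranked[:k]


def verify(claim, evidence, classify_fn):
    """Step 2: classify the claim against the retrieved evidence."""
    return classify_fn(claim, evidence)


def mine_hard_negatives(claim, negatives, score_fn, n=2):
    """Pick the n non-evidence sentences the current scorer ranks highest.

    These are the negatives the model finds hardest to tell apart from
    true evidence, so they are the most informative ones to train on.
    """
    ranked = sorted(negatives, key=lambda s: score_fn(claim, s), reverse=True)
    return ranked[:n]
```

In this sketch, `classify_fn` is any callable that takes the claim and the list of evidence sentences and returns a label. Swapping `overlap_score` for a learned scorer leaves the rest of the pipeline unchanged.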
Publications:
Advances in information verification using natural language processing, PhD Thesis, 2024 [UvA Dare]
Amir Soleimani, et al. "NonFactS: Nonfactual Summary Generation for Factuality Evaluation in Document Summarization", Findings of ACL, 2023, [github]
Amir Soleimani, Vassilina Nikoulina, Benoit Favre, Salah Aït-Mokhtar. "Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training", BioNLP, 2022, [github]
Amir Soleimani, Christof Monz, Marcel Worring. "NLQuAD: A Non-Factoid Long Question Answering Data Set", EACL, 2021, [github]
I Sekulić, A Soleimani, M Aliannejadi, F Crestani. "Longformer for MS MARCO Document Re-ranking Task", arXiv, 2020, [github]
Amir Soleimani, Christof Monz, Marcel Worring. "BERT for Evidence Retrieval and Claim Verification", European Conference on Information Retrieval (ECIR), 2020, [github]
Amir Soleimani, Nasser M. Nasrabadi, Elias Griffith, Jason Ralph, Simon Maskell, "Convolutional Neural Networks for Aerial Vehicle Detection and Recognition", IEEE National Aerospace and Electronics Conference (NAECON), 2018
Amir Soleimani, Nasser M. Nasrabadi, "Convolutional Neural Networks for Aerial Multi-Label Pedestrian Detection", 21st International Conference on Information Fusion (FUSION), 2018
Amir Soleimani, Babak N Araabi, Kazim Fouladi, "Deep Multitask Metric Learning for Offline Signature Verification", Pattern Recognition Letters, 2016, [github]
Amir Soleimani, Kazim Fouladi, Babak N Araabi, "UTSig: A Persian Offline Signature Dataset", IET Biometrics, 2016, [dataset, dataset]
Amir Soleimani, Kazim Fouladi, Babak N Araabi, "Persian Offline Signature Verification based on Curvature and Gradient Histograms", 6th International Conference on Computer and Knowledge Engineering (ICCKE), 2016
Teaching & Supervision:
Student Supervision:
Generalization In Pipeline and Joint Relation Extraction, Paras Dahal, 2022
Few-shot Language Inference in the Presence of a Label Deficiency, Jari Jansen, 2022
Zero-shot Natural Language Inference in the Presence of Domain Shift, Iris Lau, 2022
Long document text classification: User needs news using BERT and BigBird, Dat Nguyen, 2021
Debiasing Semantic Textual Similarities Datasets, Jop Keuning, 2020
Applying Unsupervised Learning on Hospital Audit Logs for Anomaly Detection, Alex Witkamp, 2020
Predicting housing fraud using structured and unstructured data, Tom Ruiter, 2019
Analysis of Semantic Textual Classification Errors by Neural Sentence Embedding Model, Kjeld Oostra, 2019
TA:
Deep Learning for NLP, 2022
Deep Learning for NLP, 2020
Deep Learning for NLP, 2019