Welcome to My Homepage

My main fields of interest are Natural Language Processing/Understanding and Computational Linguistics/Semantics. I am interested in both (logical) rule-based and statistical approaches. I have some publications on the following topics:

Natural Language Lexical-Semantic Preference Modeling

Natural language sentences that carry multiple readings have long been a subject of interest among researchers in computational and logical semantics. The traditional Montagovian framework and the family of its modern extensions have tried to capture this phenomenon by providing models that enable the automatic generation of logical formulas as meaning representations. Nonetheless, the question of preference modeling over these interpretations has not been investigated as intensively. Two directions have been pursued in this regard:

[1] The problem of ranking the valid logical meanings of a given multiple-quantifier sentence by taking the syntactic quantifier order into account. This study extends the existing Incomplete Dependency-based Complexity Profiling [Morrill, 2000] by defining a new metric inspired by Hilbert's epsilon operator and by introducing the notion of reordering cost (a toy sketch of this idea follows the list).

[2] Semantic gradience, as reported in the Generative Lexicon framework [Pustejovsky, 1995], is tackled computationally by bridging the Montagovian Generative Lexicon framework [Retoré, 2014] with crowd-sourced lexical data gathered through the serious game JeuxDeMots [Lafourcade, 2007].
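
As a toy illustration of the reordering-cost idea in [1]: one simple proxy (hypothetical here, not the published epsilon-inspired metric) is to count how many quantifier pairs in a scope reading are inverted with respect to the surface order, and rank the readings accordingly. The quantifier names below are placeholders.

    from itertools import permutations

    def reordering_cost(surface_order, scope_order):
        """Hypothetical proxy: number of quantifier pairs whose relative
        order in the reading is inverted w.r.t. the surface order."""
        rank = {q: i for i, q in enumerate(surface_order)}
        cost = 0
        for i in range(len(scope_order)):
            for j in range(i + 1, len(scope_order)):
                if rank[scope_order[i]] > rank[scope_order[j]]:
                    cost += 1
        return cost

    # "Every student read a book": the surface-scope reading costs 0,
    # the inverse-scope reading costs 1, so the former is ranked first.
    surface = ["every_student", "a_book"]
    readings = sorted(permutations(surface),
                      key=lambda r: reordering_cost(surface, r))
    print(readings)  # [('every_student', 'a_book'), ('a_book', 'every_student')]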

Prediction of Missing Semantic Relations in a Lexical-Semantic Network

Using supervised machine learning (a Random Forest framework), this study focuses on predicting six missing semantic relations (such as is_a and has_part) between two given nodes of RezoJDM, a French lexical-semantic network. The output of the prediction is a set of pairs in which the first entry is a semantic relation and the second is the probability that this relation holds. Node features are extracted with the node2vec approach.
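
A minimal sketch of this pipeline is given below. The node names, embeddings and training pairs are placeholders invented for illustration (not the RezoJDM data); in practice the vectors come from node2vec run on the network, and the relation labels range over the six relation types studied.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def pair_features(emb, u, v):
        # Edge features as the concatenation of the two node embeddings
        # (Hadamard product or averaging are common alternatives).
        return np.concatenate([emb[u], emb[v]])

    # Placeholder embeddings; in practice these are node2vec vectors
    # learned from random walks over RezoJDM.
    rng = np.random.default_rng(0)
    emb = {node: rng.normal(size=64)
           for node in ["chat", "animal", "félin", "patte", "griffe"]}

    # Placeholder training pairs labelled with their semantic relation.
    train = [("chat", "animal", "is_a"), ("chat", "félin", "is_a"),
             ("chat", "patte", "has_part"), ("patte", "griffe", "has_part")]

    X = np.array([pair_features(emb, u, v) for u, v, _ in train])
    y = np.array([rel for _, _, rel in train])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Output: (relation, probability) pairs for a new node pair.
    p = clf.predict_proba(pair_features(emb, "chat", "griffe").reshape(1, -1))[0]
    print(sorted(zip(clf.classes_, p), key=lambda t: -t[1]))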

Computational Context Modeling

The context of medical conditions is an important feature to consider when processing clinical narratives. In the domain of clinical information extraction, contextual information typically includes three types of modifiers: negation (whether the target concept exists, does not exist, or is possible/uncertain), experiencer (whether the target concept refers to the current patient or to someone other than the patient), and temporality (whether the target concept is currently true, historically true, or hypothetical). A study has been carried out on a French adaptation of FastContext, performed by enriching the set of French contextual rules to more than 10,000 through semi-automatic synonym extraction.
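
A highly simplified sketch of how such contextual trigger rules operate is shown below. The trigger phrases are examples only (not the enriched 10K+ French rule set), and real FastContext additionally handles scope termination and window limits.

    # Each rule: (trigger phrase, modifier it sets, value, scope direction).
    RULES = [
        ("pas de",         "negation",    "negated",    "forward"),
        ("absence de",     "negation",    "negated",    "forward"),
        ("antécédents de", "temporality", "historical", "forward"),
        ("sa mère",        "experiencer", "other",      "forward"),
    ]

    def apply_context(sentence, concept):
        """Return the contextual modifiers of `concept` in `sentence`
        (defaults: affirmed / recent / patient)."""
        modifiers = {"negation": "affirmed", "temporality": "recent",
                     "experiencer": "patient"}
        text = sentence.lower()
        pos = text.find(concept.lower())
        if pos < 0:
            return modifiers
        for trigger, modifier, value, direction in RULES:
            t = text.find(trigger)
            # A forward-scoping trigger affects concepts appearing after it.
            if t >= 0 and direction == "forward" and t < pos:
                modifiers[modifier] = value
        return modifiers

    print(apply_context("Pas de fièvre chez le patient.", "fièvre"))
    # {'negation': 'negated', 'temporality': 'recent', 'experiencer': 'patient'}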

Coreference Resolution using Supervised Machine Learning

This study focuses on French-CRS, a coreference resolver built with machine-learning-based NLP frameworks for French. It is trained on 25 syntactic/morphological features derived from ANCOR, a French oral corpus. French-CRS ships with pre-trained language models and is ready to be applied to French text; it internally relies on other systems for mention and named-entity detection. French-CRS is planned to be enriched with semantic features, which will make it suitable for other tasks such as nomination detection in a social-media context.
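
As a rough illustration of the mention-pair setup used in such supervised resolvers (the mentions and features below are invented; they are not the 25 ANCOR-derived features and this is not the French-CRS pipeline):

    from sklearn.ensemble import RandomForestClassifier

    def pair_features(antecedent, mention):
        # Toy morphosyntactic features: gender/number agreement,
        # distance in mentions, and part-of-speech match.
        return [
            int(antecedent["gender"] == mention["gender"]),
            int(antecedent["number"] == mention["number"]),
            mention["index"] - antecedent["index"],
            int(antecedent["pos"] == mention["pos"]),
        ]

    # Placeholder mentions, e.g. "Marie", "elle", "les enfants".
    m = [{"index": 0, "gender": "f", "number": "sg", "pos": "NP"},
         {"index": 1, "gender": "f", "number": "sg", "pos": "PRO"},
         {"index": 2, "gender": "m", "number": "pl", "pos": "NP"}]

    X = [pair_features(m[0], m[1]), pair_features(m[0], m[2]),
         pair_features(m[1], m[2])]
    y = [1, 0, 0]  # only (Marie, elle) is coreferent

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    print(clf.predict([pair_features(m[0], m[1])]))  # expected: [1]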

Computational Psycholinguistic Difficulty Modeling

Capturing linguistic difficulty has been done by means of Property Grammar [Blache, 1991], a powerful formalism for precise evaluation on the basis of quantifiable criteria. It brings together parameters from the linguistic, psycholinguistic and computational domains, such as Incomplete Dependency Theory [Gibson, 1998], Dependency Locality Theory [Gibson, 2000] and Activation Theory [Vasishth, 2003].

Our research applies the same principle, but instead of Property Grammar we use categorial proof nets, which enable deep semantic representations via the Curry-Howard correspondence. Our contributions, which introduce new metrics for quantifying linguistic complexity, are three-fold: DLT-based Complexity Profiling, Activation-based Complexity Profiling, and Satisfaction-Ratio Measurement. These new metrics are applicable to a wide range of linguistic phenomena, including non-canonical sentences.
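
To make the DLT-based idea concrete, here is a toy profiler that works over plain dependency arcs rather than proof nets (a simplification, not our published metric): it approximates "new discourse referents" by nouns and verbs and charges each dependency's integration cost to its rightmost end.

    def dlt_profile(tokens, pos_tags, arcs):
        """Per-token integration cost: for each dependency arc, count the
        discourse referents (approximated as nouns/verbs) strictly between
        its two ends, charged to the rightmost end."""
        referent = [tag in ("NOUN", "VERB") for tag in pos_tags]
        cost = [0] * len(tokens)
        for head, dep in arcs:
            left, right = sorted((head, dep))
            cost[right] += sum(referent[left + 1:right])
        return cost

    # "The reporter who the senator attacked admitted the error."
    tokens = ["The", "reporter", "who", "the", "senator", "attacked",
              "admitted", "the", "error"]
    pos    = ["DET", "NOUN", "PRON", "DET", "NOUN", "VERB",
              "VERB", "DET", "NOUN"]
    arcs   = [(1, 0), (1, 2), (4, 3), (5, 4), (5, 2), (6, 1), (6, 8), (8, 7)]
    print(dlt_profile(tokens, pos, arcs))   # peaks at "attacked" and "admitted"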

Algorithms for Sentence Completion Task

For non-canonical sentences with missing categories, two algorithms have been designed (a toy illustration follows the list):

[1] An algorithm that extends Categorial Grammars with unification and dynamic programming, with time complexity O(n⁴).

[2] A constraint-based method using Constraint Handling Rules and tree-adjoining operations for missing-category detection and correction. This method benefits from its generality and can cover a broader range of problems. Some properties of the rules are supported by mathematical proofs.
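
For illustration only, the sketch below solves a miniature version of the task by brute force: a CKY-style recognizer for a toy AB categorial grammar, plus a search that tries to insert one hypothetical missing category at every position. It is neither the published O(n⁴) unification-based algorithm nor the CHR-based method.

    from itertools import product

    # Categories as nested tuples: 'np', 's',
    # ('/', A, B) means A/B (looks right for a B),
    # ('\\', B, A) means B\A (looks left for a B).

    def combine(left, right):
        """Forward/backward application, or None if the pair does not reduce."""
        if isinstance(left, tuple) and left[0] == '/' and left[2] == right:
            return left[1]                      # A/B  B  =>  A
        if isinstance(right, tuple) and right[0] == '\\' and right[1] == left:
            return right[2]                     # B  B\A  =>  A
        return None

    def parses_to(cats, goal='s'):
        """CKY-style recognition: does the category sequence reduce to `goal`?"""
        n = len(cats)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, c in enumerate(cats):
            chart[i][i + 1].add(c)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for a, b in product(chart[i][k], chart[k][j]):
                        r = combine(a, b)
                        if r is not None:
                            chart[i][j].add(r)
        return goal in chart[0][n]

    def complete(cats, candidates, goal='s'):
        """Return (position, category) insertions that repair the sentence."""
        return [(pos, cand)
                for pos in range(len(cats) + 1)
                for cand in candidates
                if parses_to(cats[:pos] + [cand] + cats[pos:], goal)]

    # "Mary saw" with the object np missing: np, (np\s)/np
    cats = ['np', ('/', ('\\', 'np', 's'), 'np')]
    print(complete(cats, candidates=['np', 's']))   # [(2, 'np')]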