Sergio Jimenez

I hold a Ph.D. in Engineering (Systems and Computing) from the Department of Systems and Industrial Engineering at the National University of Colombia, Bogotá.

Currently, I'm affiliated with the Corpus Linguistics research group at the Instituto Caro y Cuervo, Bogotá D.C., Colombia.

My advisors were Professor Alexander Gelbukh from the CIC-IPN and Professor Fabio Gonzalez from MindLab at the National University of Colombia.

My main interests are:

  • Computational Linguistics
  • Semantic networks (WordNet) for Latin American Spanish
  • Similarity functions for text
  • Natural Language Processing
  • Educational applications of NLP
  • Information Retrieval
  • Recommender Systems

Contact:

Teaching

II-2016, Diplomado en Análisis Computacional del Lenguaje at Instituto Caro y Cuervo

Summer 2016, Natural Language Processing and Text Mining at Universidad Nacional de Colombia

II-2015, Diplomado en Análisis Computacional del Lenguaje at Instituto Caro y Cuervo

PhD Thesis: Text Comparison using Soft Cardinality. Universidad Nacional de Colombia, Bogotá D.C. 2015. Summa cum laude honor (laureada).

Advised Thesis

2016, George Dueñas, Predicción automática de la dificultad de preguntas abiertas de respuesta corta con propósitos educativos, Universidad Nacional de Colombia, Bogotá D.C., master's thesis. Summa cum laude honor (laureada).

Data

A dataset for predicting children's age from short narrative texts. This dataset contains 2946 short stories written by children between 5 and 16 years old, each labeled with the author's age. The texts are written in English (1800), French (662) and Spanish (484). Please download the data from GitHub.

Publications

2018

Sergio Jimenez, Silviu-Petru Cucerzan, Fabio A. Gonzalez, Alexander Gelbukh, George Dueñas. BM25-CTF: Improving TF and IDF factors in BM25 by using collection term frequencies. Journal of Intelligent & Fuzzy Systems, vol. 34, no. 5, pp. 2887-2899, 2018. http://dx.doi.org/10.3233/JIFS-169475 (Journal ranked 438th in the Computer Science field according to Guide2Research)

POST-PRINT:

bm25-ctf-improving.pdf

2017

Sergio Jimenez, George Dueñas. G-WordNet: Moving WordNet 3.0 and Its Resources to a Graph Database. In: Solano A., Ordoñez H. (eds) Advances in Computing. CCC2017, Communications in Computer and Information Science Vol. 735, pp 100-114, 2017

Sergio Jimenez, George Dueñas, Lorena Gaitan, Jorge Segura. RUFINO at SemEval-2017 Task 2: Cross-lingual lexical similarity by extending PMI and word embeddings systems with a Swadesh's-like list. Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval 2017, Vancouver, Canada.

2016

Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh. Mathematical properties of soft cardinality: Enhancing Jaccard, Dice and cosine similarity measures with element-wise distance. Information Sciences 367-368, pp. 373-389, 2016. (Journal ranked 43rd in the Computer Science field according to Guide2Research)

2015

George Dueñas, Sergio Jimenez, Julia Baquero. Automatic prediction of item difficulty for short-answer questions. 10CCCC 10th Colombian Congress in Computation, Bogotá, Colombia, 2015

Sergio Mancera, Sergio Jimenez, Fabio A. Gonzalez. ZETEMA: A Web Service for Automatic Short-Answer Questions Grading. 10CCCC 10th Colombian Congress in Computation, Bogotá, Colombia, 2015

Claudia Becerra, Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh. Recomendación de productos a partir de perfiles de usuario interpretables (Product recommendation based on interpretable user profiles). Tecnura, volume 15, number 45, p. 89-100, 2015.

Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh. Soft Cardinality in Semantic Text Processing: Experience of the SemEval International Competitions. Polibits, volume 51, p. 63-72, 2015.

2014

Sergio Jimenez, George Dueñas, Julia Baquero, Alexander Gelbukh. UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment. SemEval 2014, In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Aug. 2014, Dublin, Ireland. 3rd among 18 participating systems in Task-1 textual entailment and 4th in relatedness (see "UNAL-NLP-run1"); 3rd among 38 participating systems in Task-3 Cross-Level Semantic Similarity (see "UNAL-NLP"); and 9th among 38 participating systems in Task-1 STS English and 3rd among 22 in Spanish (see "UNAL-NLP")

Alejandro Riveros, Maria De-Arteaga, Fabio A. Gonzalez, Sergio Jimenez, Henning Müller. MindLab-UNAL: Comparing Metamap and T-mapper for Medical Concept Extraction in SemEval 2014 Task 7. SemEval 2014, In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Aug. 2014, Dublin, Ireland.

Emilio Silva-Schlenker, Sergio Jimenez, Julia Baquero. UNAL-NLP: Cross-Lingual Phrase Sense Disambiguation with Syntactic Dependency Trees. SemEval 2014, In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Aug. 2014, Dublin, Ireland.

André Lynum, Partha Pakray, Björn Gambäck, Sergio Jimenez. NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality. SemEval 2014, In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014), Aug. 2014, Dublin, Ireland. 3rd among 38 participating systems in STS-Task-10 English (see "NTU-run3")

(preprint) Nelly Moreno, Sergio Jimenez, Julia Baquero. Automatically Assessing Children’s Writing Skills Based on Age-Supervised Datasets. CICLING'14, Apr. 2014 Kathmandu, Nepal. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Volume 8404, 2014, pp 566-577 (The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-54903-8_47).

2013

Claudia Becerra, Sergio Jimenez, Alexander Gelbukh. Towards User Profile-based Interfaces for Exploration of Large Collections of Items. Decisions@Recsys'13, Workshop on Human Decision Making in Recommender Systems in The ACM Conference on Recommender Systems, Oct. 2013, Hong Kong.

Maria De-Arteaga, Sergio Jimenez, George Dueñas, Sergio Mancera, Julia Baquero. Author Profiling Using Corpus Statistics, Lexicons and Stylistic Features. Notebook for PAN at CLEF-2013. CLEF 2013, PAN 2013 (Online Working Notes/Labs/Workshop), Sep 2013, Valencia, Spain. 6th system among 17 in Spanish author profiling task (see "jimenez13")

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Jun 2013, Atlanta, Georgia, USA. Top system for unseen questions and domains in SciEntsBank dataset (see "SoftCardinality")

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distributional Statistics and POS tags. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Jun 2013, Atlanta, Georgia, USA. Participating system in Task 5: Evaluating Phrasal Semantics (see "UNAL")

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. SOFTCARDINALITY: Learning to Identify Directional Cross-Lingual Entailment from Cardinalities and SMT. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Jun 2013, Atlanta, Georgia, USA. Top system for Spanish-English and Italian-English datasets (see "SoftCard")

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, Jun 2013, Atlanta, Georgia, USA. 18th system among 90 (see "SOFTCARDINALITY-run2")

2012

Sergio Jimenez, Alexander Gelbukh. Baselines for Natural Language Processing Tasks Based on Soft Cardinality Spectra. Applied and Computational Mathematics Journal, Volume 11, No. 2, Special Issue on Applied Artificial Intelligence and Soft Computing, 2012. JCR impact factor 2012: 0.75

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. Soft Cardinality + ML: Learning Adaptive Similarity Functions for Cross-lingual Textual Entailment. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Jun 2012, Montreal, Canada. Top system in Italian-English and French-English datasets (see "SoftCard")

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh. Soft Cardinality: A Parameterized Similarity Function for Text Comparison. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Jun 2012, Montreal, Canada. Third system among 89 (see "sgjimenezv/task6-SOFT-CARDINALITY")

Nelly Moreno, Julia Baquero, Sergio Jimenez. Un enfoque basado en corpus para caracterizar la evolución de la competencia narrativa escrita en niños hispanohablantes. Workshop de Procesamiento Automatizado de Textos y Corpus. WOPATEC 2012, Viña del Mar, Chile, 2012.

2011

Sergio Jimenez, Alexander Gelbukh. SC Spectra: A Linear-Time Soft Cardinality Approximation for Text Comparison. Advances in Soft Computing, Lecture Notes in Computer Science Volume 7095, 2011, pp 213-224. BEST PAPER at the 10th Mexican International Conference on Artificial Intelligence, MICAI'11, Puebla, Mexico

2010

Sergio Jimenez, Fabio A. Gonzalez, Alexander Gelbukh. Text Comparison Using Soft Cardinality. String Processing and Information Retrieval, Lecture Notes in Computer Science Volume 6393, 2010, pp 297-302. Presented at SPIRE'10, Los Cabos, Mexico, Oct. 2010

2009

Sergio Jimenez, Claudia Becerra, Alexander Gelbukh, Fabio A. Gonzalez. Generalized Monge-Elkan Method for Approximate Text String Comparison. Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Volume 5449, 2009, pp 559-570. Presented at CICLING'09, Mexico D.F., Mexico, Mar. 2009

Co-advised Undergraduate Theses

Maria De-Arteaga, 2013. Modelos vectoriales en minería de textos y sus aplicaciones en la elaboración del perfil de autores (in Spanish). Vector models in text mining and their applications to author profiling tasks. Math Department, Universidad Nacional de Colombia, Bogota

Nelly Esperanza Moreno Córdoba, 2011. UN ENFOQUE BASADO EN CORPUS PARA CARACTERIZAR LA EVOLUCIÓN DE LA COMPETENCIA NARRATIVA ESCRITA EN NIÑOS HISPANOHABLANTES (in Spanish). Linguistics Department, Universidad Nacional de Colombia, Bogota. Second place in the Otto de Greiff Award (Social Sciences) 2012