PHC Project
University Sorbonne Paris Nord and Coburg University
The recent proliferation of Large Language Models (LLMs) has deeply affected the field of Natural Language Processing. These models have demonstrated impressive capabilities for text analysis, text generation, and even, to some extent, reasoning. Some researchers have begun to examine the capabilities of these models in the context of geospatial data. Since the models are trained on billions of documents that include geographical information, a legitimate question is: how well can LLMs model the positions of geographical entities and the relationships between them? Recent work shows that LLMs have some limited capabilities for predicting the location of cities, depending on the prompt used, and for spatial reasoning. These results rest mostly on higher-order co-occurrence statistics. Unfortunately, LLMs are also prone to so-called “neural hallucinations”, a slightly misleading term for the fact that they can generate wrong information: generation is a partially randomized statistical process of predicting the next word, and it is not informed by a separate knowledge representation layer. In this sub-area of using deep neural networks to model the language of (people talking about) geography, the following Research Questions are pressing:
RQ1. How much geographic knowledge is stored in current large, pretrained language models?
RQ2. What do LLMs 'know' about distances and topology?
RQ3. What techniques can be developed to query them in order to answer these questions?
RQ4. What geographic biases are present in these models, and how do they differ when the models are queried in different languages (EN versus FR/DE discrepancies)?
RQ5. How can the models' geographic knowledge be improved through modifications to pre-training or through further fine-tuning?
RQ6. What are the most effective methods to utilize LLMs for toponym resolution, the task of mapping place names to spatial footprints?
RQ7. To what extent does the ability of LLMs to resolve toponyms depend on the toponyms' degree of ambiguity?
RQ8. What is the new state of the art in "text meets GIS" in the age of ChatGPT?
Paris, 9-11 September 2024
Invited speaker: Ludovic Moncla, INSA Lyon
Coburg, early 2025
With the support of the French Embassy in Germany