Abstracts

AI for Digital Humanities and Computational Social Sciences: An Overview

Alexandre Gefen


Artificial intelligence designates, in a broad sense, the intelligence of machines and, in a narrower sense, machine learning (including deep learning): the capacity of software, in particular neural networks, to make inferences, identifications and classifications, after more or less extensive training, more efficiently and rapidly than humans. AI lies at the crossroads of computer science and mathematics, but also of the cognitive sciences. Its most striking applications have been in image recognition, but they now extend to the processing of all massively available data, such as natural language processing, the identification of people and objects, and the recognition of speech, printed characters and handwriting.

In the humanities and social sciences, the emerging applications of machine learning concern economics, sociology, geography and history, in all tasks of locating and classifying images, texts and data, and in some cases they open onto the exploration of predictive models in sociology and economics.

The complexity and the sheer number of calculations performed by AI make it impossible to reduce the computer's choices to a simple causal chain, hence a problematic black-box effect whose ethical and epistemological stakes are major. Beyond the transformations it induces in disciplinary methodologies, AI raises an epistemological question, linked to the meeting of the cognitive sciences and mathematics, because it has major consequences for the representation of knowledge and for reasoning. The question of artificial intelligence must be problematized together with that of human intelligence.

Is Python Going Extinct? A Digital Humanities Overview

Brian Kokensparger

Python is arguably the major programming language for serious Digital Humanities (DH) research. Yet other languages also work well for DH research, first R and, more recently, Julia, among others. With the rise of special-purpose programming languages that promise better out-of-the-box functionality and faster performance, does that mean Python is going extinct? This talk reviews Python's strengths and weaknesses as a programming language within the context of DH research, with additional consideration of its advantages for teaching future DH programmers. What is the incentive to stay with Python as the primary programming language for DH? What is the incentive to abandon it for an emerging language? For institutions beginning to implement DH programs, which is the best way to begin? This review talk will lay out the evidence that Python is not only alive and well, but still the best programming language for DH research and education.

Processing Spanish Golden Age theatre with Python: Data structures for versified plays

Fernando Sanz-Lázaro


The project **** has developed algorithms and data structures in Python 3 to allow distant reading of Spanish Golden Age plays. We start from plain text files that have previously been structured in a relatively straightforward fashion: each line represents an entity, be it metadata, a character, a speech, or a stage direction, and finer distinctions are marked with tab characters and a reduced set of tags. These texts are processed line by line to obtain the information describing each speech or stage direction. The speech lines and their data are stored in a pandas data frame.
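A minimal sketch of this line-by-line parsing step might look as follows, assuming a hypothetical layout in which untabbed lines name the speaking character, tabbed lines carry speech, and a <dir> tag marks stage directions; the tag set and column names are illustrative, not the project's actual format:

```python
import pandas as pd

def parse_play(path):
    """Read a tab- and tag-structured plain-text play into a pandas DataFrame."""
    rows = []
    current_speaker = None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if not line.strip():
                continue                                   # skip blank lines
            if line.startswith("#"):                       # hypothetical metadata marker
                continue
            if not line.startswith("\t"):                  # an untabbed line names a character
                current_speaker = line.strip()
            elif line.lstrip("\t").startswith("<dir>"):    # hypothetical stage-direction tag
                rows.append({"speaker": current_speaker,
                             "type": "stage_direction",
                             "text": line.lstrip("\t").removeprefix("<dir>").strip()})
            else:                                          # a tabbed line is one line of speech
                rows.append({"speaker": current_speaker,
                             "type": "speech",
                             "text": line.strip()})
    return pd.DataFrame(rows)
```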

We have implemented the library libscansion, which provides the class scansion for processing a verse. It takes the verse itself as a string and a list of integers with the expected numbers of metric syllables (NoS), sorted by probability; the rationale is that verses of the same length tend to be grouped. The class has the speech, the rhythmic pattern, the rhyme, and the assonance as string attributes; the NoS and the position of the rhyme stress as integers; and the metric syllables as a list of strings and the expected NoS as a list of integers.
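Read literally, the interface described above could be sketched as follows; the attribute names and the use of a Python dataclass are assumptions made for illustration, not the actual libscansion API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Scansion:
    """Sketch of the scansion object described in the text (names assumed)."""
    speech: str                        # the verse as plain text
    expected_nos: List[int]            # candidate syllable counts, most probable first
    rhythmic_pattern: str = ""         # stress pattern of the verse
    rhyme: str = ""                    # consonant rhyme
    assonance: str = ""                # vowel-only rhyme
    nos: int = 0                       # number of metric syllables finally assigned
    rhyme_stress_position: int = 0     # position of the rhyme stress
    metric_syllables: List[str] = field(default_factory=list)
```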

Scansion includes methods to translate each plain word into a tuple and to represent the verse as a list of tuples. Each tuple has two elements: a list of phonological syllables and a PoS tag (part of speech). PoS-based rules and a dictionary determine the presence of metric stress in each word and mark its tonic syllables. The list is then flattened into a list of syllables and re-evaluated against the expected NoS: syllables are split or joined according to the poetic rules of metre adjustment so as to meet the first element of the list of expected NoS. If this is impossible, the algorithm tries the following values until it succeeds, promoting the match to the first position. Once a suitable syllabic distribution is obtained, the attributes are assigned their values accordingly.
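The try-and-promote strategy can be illustrated with a small sketch; adjust_to below is a deliberately naive stand-in for the project's metre-adjustment rules (it only merges adjacent vowels as a crude synalepha) and is not part of libscansion:

```python
VOWELS = "aeiouáéíóúü"

def adjust_to(syllables, target):
    """Naive stand-in for metre adjustment: merge vowel-final/vowel-initial
    neighbours (a crude synalepha) while the verse is too long; return None
    if the target number of syllables cannot be reached."""
    syls = list(syllables)
    i = 0
    while len(syls) > target and i < len(syls) - 1:
        if syls[i][-1].lower() in VOWELS and syls[i + 1][0].lower() in VOWELS:
            syls[i] += syls[i + 1]
            del syls[i + 1]
        else:
            i += 1
    return syls if len(syls) == target else None

def fit_verse(syllables, expected_nos):
    """Try each expected syllable count; promote the first one that fits to
    the front of expected_nos so the next verse tries it first."""
    for i, target in enumerate(expected_nos):
        adjusted = adjust_to(syllables, target)
        if adjusted is not None:
            expected_nos.insert(0, expected_nos.pop(i))
            return adjusted, target
    return list(syllables), len(syllables)    # fall back to the unadjusted count
```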

We iterate over the data frame, creating an object for each verse and passing as parameters the speech (or the joined speeches of a shared verse) and a sorted list of expected NoS, initialised with typical metres on the first iteration. The relevant attributes of the object are added to the data frame, and the updated list of expected NoS is used to create the object for the next verse. The resulting structure is stored as a CSV file for use in distant reading analyses.
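Building on the sketches above, the iteration could look roughly like this; TYPICAL_METRES, the column names, and the output file name are assumptions, and the scansion logic itself is elided:

```python
import pandas as pd

TYPICAL_METRES = [8, 11, 7]   # assumed defaults: octosyllable, hendecasyllable, heptasyllable

def scan_play(df, expected_nos=None):
    """Scan every speech line, carrying the reordered list of expected
    syllable counts from one verse to the next, and export a CSV."""
    expected_nos = list(expected_nos or TYPICAL_METRES)
    speeches = df.loc[df["type"] == "speech", "text"]
    records = []
    for text in speeches:
        verse = Scansion(speech=text, expected_nos=list(expected_nos))
        # ... the scansion methods sketched above would populate the attributes
        # and reorder verse.expected_nos when a different length succeeds ...
        expected_nos = list(verse.expected_nos)
        records.append({"nos": verse.nos, "rhythm": verse.rhythmic_pattern,
                        "rhyme": verse.rhyme, "assonance": verse.assonance})
    out = pd.concat([speeches.reset_index(drop=True), pd.DataFrame(records)], axis=1)
    out.to_csv("scanned_play.csv", index=False)
    return out
```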


Using R in the Digital Humanities. A Philologist at the Keyboard

Jose Manuel Fradejas Rueda


It is complicated to define what the Digital Humanities are, since it depends on the field of the humanities in which you work. Mine is philology (another term that is difficult to define), and most of the time I work with texts, with large amounts of text. Unless mother nature has endowed you with a prodigious memory, it is impossible for any human being to handle large amounts of text and make any sense of them. Computers can handle large amounts of text and help you extract sense, meaning, and patterns from them. There are many superb applications for working with textual data, but ready-made applications can only do what they were designed to do, and they can certainly do a lot of things.

However, you will run into problems as soon as you want to go beyond the solutions those apps offer. If you want to think outside the box and be successful, you must use a programming language. I use R to handle and analyze texts, mostly Old Spanish texts, which brings an added difficulty: one of the greatest problems we face in the Digital Humanities is that it is practically an English-only field, not only because the programs are designed for English speakers, but also because the papers published by the finest journals are English-only. There is little or no interest in opening the field to other languages.

I also use R for many other tasks which, done by hand, would take far longer than I think is reasonable. In this lecture I will show how I came to know R, how I learned it, and how and why I use it almost daily in my research.

iHe Game App: Digital Humanities and Gamification in the Evaluation of Teaching in Times of COVID-19, Brazil

Janaina Cardoso de Mello (author), Julia Beatriz Silva Vicente Chaves, Pedro Henrique Ribeiro Fernandes, Luan Felipe Silva dos Santos, Beatriz França Alves


By closing schools around the world, the COVID-19 pandemic has had a particularly strong impact on countries where educational methods are paper-mediated (Gérard, 2001). In Brazil, difficult access to the Internet and to technological equipment, especially in peripheral areas, has combined with outdated practices in Basic Education regarding the use and creation of teaching software. The absence of digital-culture literacy and computational thinking in teaching has put schools under strain. In assessment procedures, it was difficult to measure learning quantitatively and qualitatively in Emergency Remote Education, and many teachers simply transferred the traditional test to digital support using Word or electronic forms. Seeking to address this problem, iLearning approaches, inspired by iDu (iLearning Education) and iMe (iLearning Media), were used to create the iHe Game (iHistoria Evaluation Game) app, which evaluates learning levels through a dynamic, fun, and interactive game articulated with History teaching. The iHe Game app is a free digital assessment tool, created on a free software platform for smartphones running the Android operating system and written in the Java and C++ programming languages. A similar version was developed in the Scratch programming language for Windows. Smartphones were chosen because they were the only means of access for most Brazilians during the social isolation of 2020. The premises of the Digital Humanities in the gamification of education developed in Indonesia (Rahardja et al., 2014; Aini, Rahardja and Khoirunisa, 2020) and Finland (Majuri, Koivisto and Hamari, 2018) provided the theoretical and practical orientation for this work. The research is characterized as quali-quantitative: bibliographic and documentary surveys were carried out, and an electronic questionnaire with 10 closed questions was sent to 50 teachers who tested the prototype of the iHe app. The responses were weighted according to Bardin's content analysis (2011), and graphs, tables, and diagrams were produced that allowed errors to be corrected and the application to be improved. The results of applying the iHe app considered: a) perceptions of use and experience; b) engagement and immersion, perception of learning and additional benefits (skills and competencies); c) levels of effort, degree of difficulty, stress, overcoming obstacles; d) perception of satisfaction; e) social interaction (competitiveness, reflection, sharing, cooperation, social influence); f) motivation, initiative, autonomy, familiarity, identification. The contributions attest to the creative use of a digital tool for measuring learning, in which students, while playing, can repeat the process, improve their performance, and re-signify gaps, information, and behaviors.

Digital humanities and research on Early Modern English scientific writing

Laura Esteban-Segura


In this paper I aim to present and describe a digital humanities project concerned with the editing of Early Modern English scientific manuscripts, more specifically medical ones, and with the compilation of a corpus that can serve as a tool to investigate the language of that period. The project under consideration is entitled The Malaga Corpus of Early Modern English Scientific Prose and is based at the University of Málaga (Spain). The digital editions and the corpus include manuscripts from the Hunterian Collection (Glasgow University Library), the Wellcome Collection (Wellcome Library, London) and the Rylands Collection (University of Manchester Library). With regard to text types, these manuscripts hold specialized texts, namely surgical and anatomical treatises, as well as recipe collections and materia medica.

The importance of rhythm: Python for statistical metrics

Simon Kroll


The project Sound and Meaning is currently building a large, annotated database of Spanish Golden Age theatre. The main aim of the project is to determine which relations can be established between metrics, rhythms, and the semantics of a play. Are there rhythms that suit a comedy better than a tragedy? Do the different playwrights have rhythmical fingerprints, similar to the authorial marks that stylo (R) can detect by counting the most frequent words (MFW)?

This contribution will offer a first insight into the structure of the database and into the programs used to present and display the data we are currently collecting. Using Python and pandas, we will demonstrate the new results this project is generating. Since the MFW method in packages like stylo (R) is well established and accepted in the community, we will show that a statistical analysis of the most frequent rhythms can generate similar results: rhythm is a distinguishing characteristic of the different authors of the seventeenth century. Furthermore, we will show differences between the literary genres of the comedia nueva and their rhythms. This contribution therefore presents new programs for metrical analysis and first results of the statistical evaluation of the data.
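As an illustration of what a most-frequent-rhythms analysis could look like (a sketch, not the project's actual code), the following assumes a pandas table with one row per verse and columns named author and rhythm, and computes one relative-frequency profile per author; such profiles can then be compared with the distance measures routinely used for MFW, for example Burrows's Delta or cosine distance:

```python
import pandas as pd

def rhythm_profiles(df, top_n=50):
    """Relative frequencies of the top_n rhythmic patterns per author,
    analogous to a most-frequent-words table in stylometry."""
    top = df["rhythm"].value_counts().head(top_n).index
    counts = (df[df["rhythm"].isin(top)]
              .groupby(["author", "rhythm"])
              .size()
              .unstack(fill_value=0))
    return counts.div(counts.sum(axis=1), axis=0)   # row-normalise: one profile per author
```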