This section presents the thematic distribution of keywords clustered by their co-occurrence frequency. Each time two keywords appear together in the same document, their co-occurrence frequency increases; the keywords are then clustered using the equivalence index (Callon et al., 1991). The thematicEvolution() function of the bibliometrix package performs this analysis and places the clusters in a Cartesian coordinate system. The x axis shows Callon’s centrality, which measures the degree of interaction of a cluster with other clusters, and the y axis shows Callon’s density, which measures the internal strength of the cluster (Cobo et al., 2011). The two axes define four spaces:
Basic themes: high centrality and low density;
Motor themes: high centrality and high density;
Niche themes: low centrality and high density;
Emerging or Declining themes: low centrality and low density.
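The measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the bibliometrix implementation: the function names, the toy documents, the cluster assignments and the cut-off values passed to classify_theme are all hypothetical. The equivalence index is computed as e_ij = c_ij^2 / (c_i * c_j), and the scaling factors (10 for centrality, 100 for density) follow the formulation usually attributed to Cobo et al. (2011).

```python
from collections import Counter
from itertools import combinations


def equivalence_index(docs):
    """Equivalence index e_ij = c_ij^2 / (c_i * c_j) for each co-occurring pair.

    docs is a list of keyword lists, one per document (a toy input format
    assumed for this sketch).
    """
    occurrences = Counter()    # c_i: number of documents containing keyword i
    cooccurrences = Counter()  # c_ij: number of documents containing both i and j
    for keywords in docs:
        unique = sorted(set(keywords))
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))
    return {
        (a, b): c_ab ** 2 / (occurrences[a] * occurrences[b])
        for (a, b), c_ab in cooccurrences.items()
    }


def callon_measures(e, clusters):
    """Return {label: (centrality, density)} for clusters given as keyword sets."""
    measures = {}
    for label, members in clusters.items():
        # Links with both ends inside the cluster (internal strength).
        internal = sum(v for (a, b), v in e.items() if a in members and b in members)
        # Links with exactly one end inside the cluster (interaction with others).
        external = sum(v for (a, b), v in e.items() if (a in members) != (b in members))
        measures[label] = (10 * external, 100 * internal / len(members))
    return measures


def classify_theme(centrality, density, centrality_cut, density_cut):
    """Map a cluster to one of the four spaces, given hypothetical cut-offs."""
    high_c = centrality >= centrality_cut
    high_d = density >= density_cut
    if high_c and high_d:
        return "motor"
    if high_c:
        return "basic"
    if high_d:
        return "niche"
    return "emerging or declining"


# Toy data: four documents and two hand-made clusters (hypothetical).
docs = [
    ["kw_a", "kw_b"],
    ["kw_a", "kw_b", "kw_c"],
    ["kw_a", "kw_c"],
    ["kw_b"],
]
e = equivalence_index(docs)
clusters = {"theme_1": {"kw_a", "kw_b"}, "theme_2": {"kw_c"}}
measures = callon_measures(e, clusters)
```

In practice the cut-offs that separate "high" from "low" are derived from the set of clusters itself (for example their median or mean values), so the same cluster can change quadrant from one period to the next even if its raw centrality and density stay similar.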
Humanities facing Digitization
As can be seen, in the first period (as defined in Picture 1) of digital humanities academic production, both digital humanities and humanities head their own clusters. In the middle of the basic themes area they are accompanied by digitization and metadata, which deal with issues such as archives, digital collections or annotation. The keyword ontology also stands out, heading a cluster with further terms related to data and digitization: linked data, semantic web, tei, digital library, and more. The same kind of concerns are even present in the most developed and dense clusters of the period, text mining, database and information retrieval, which are more technically oriented. Twitter appears as a clear niche theme.
Picture 5. Cluster analysis of Author's Keywords for "digital humanities" in the Scopus database between 1999 and 2014. The plot was produced with the thematicEvolutionMap() function of the bibliometrix package (Aria & Cuccurullo, 2021) and rendered with plotly (Sievert et al., 2021). Clustering is applied to the 300 most frequent keywords, with a minimum co-occurrence frequency of 6/1000.
Visualization, Text mining and Distant reading
In the second period, the digital humanities cluster retains the highest centrality in the database, accompanied by keywords such as digital libraries and digital history, which consolidates a strong interconnection between the three terms. The keyword visualization forms its own cluster, capturing new trends that appear with high centrality, such as machine learning. Text mining loses density but gains centrality, and now appears together with a new keyword: distant reading. The niche themes of the period are network analysis and, again, the methodologies based on social media and twitter.
Picture 6. Cluster analysis of Author's Keywords for "digital humanities" in the Scopus database between 2015 and 2018. The plot was produced with the thematicEvolutionMap() function of the bibliometrix package (Aria & Cuccurullo, 2021) and rendered with plotly (Sievert et al., 2021). Clustering is applied to the 300 most frequent keywords, with a minimum co-occurrence frequency of 6/1000.
An example from this period is the article Visual Text Analysis in Digital Humanities (Jänicke et al., 2017), which, with 30 citations in Scopus, is the most cited document between 2014 and 2019 using the keyword distant reading. The authors develop a categorization of text analysis techniques, including network analysis and visualizations, and describe how they are combined with more classical close reading approaches. It reflects very well the emergence of this kind of digital technique in dialogue with humanistic methodologies.
Technology as a source
Finally, in the third period, the last two years of production, the main cluster labelled digital humanities absorbs other important approaches, especially those related to text mining, such as distant reading, natural language processing or stylometry, which shows that they are strongly connected. With high centrality, the keyword cultural heritage appears separated from digital humanities, together with semantic web, ontology, deep learning, linked data and computer vision. These keywords make explicit the methods and practices that museums and institutions are developing in order to identify, process and catalogue data, especially visual information. Also with high centrality, digital history is another keyword that usually appears together with digital humanities but is separated from it in this last period. It appears with humanities itself, virtual reality and pedagogy.
However, the most striking cluster of the period is machine learning, positioned in the most relevant and developed area of the map and accompanied by open science and digital preservation. These keywords reveal the close relationship maintained between digital humanities and the most innovative and quantitative approaches of data science, reinforced by terms such as deep learning and artificial intelligence in the cultural heritage cluster. Another technical cluster, network analysis, lies within the niche themes zone. It is also notable that the term humanities has disappeared as a header keyword for the first time in the analysis, being included under the digital history label. Digital humanities is consolidating as an area where, more and more, technology cannot be differentiated from the humanistic content itself.
Finally, it is worth pointing out the rise of a keyword that had not previously been detected: digital methods appears as a cluster with good levels of density, near the niche themes. It can be considered the successor to the focus on twitter and social media, now more closely linked to epistemology and interdisciplinarity.
Picture 7. Cluster analysis of Author's Keywords for "digital humanities" in the Scopus database between 2019 and 2021. The plot was produced with the thematicEvolutionMap() function of the bibliometrix package (Aria & Cuccurullo, 2021) and rendered with plotly (Sievert et al., 2021). Clustering is applied to the 300 most frequent keywords, with a minimum co-occurrence frequency of 6/1000.
The most cited recent paper using machine learning as a keyword is Machine Learning for Cultural Heritage: A Survey (Fiorucci et al., 2020), in which the authors present the trends of the last five years in the use of machine learning for cultural heritage. The paper already has 32 citations and links precisely two important keywords identified in the last period of the thematic evolution analysis. The authors also highlight that one of the factors limiting closer collaboration between engineers and cultural heritage professionals is the lack of large, qualified and public repositories where they can access data.