Measuring the Gender Gap: Attribute-based Class Completeness Estimation
Led by Gianluca Demartini and Lei Han, The University of Queensland, Australia
Implications of ChatGPT for knowledge integrity on Wikipedia
Led by Assoc Prof Heather Ford, University of Technology Sydney (UTS) and colleagues
Wiki Histories Project, UTS
Led by Assoc Prof Heather Ford, University of Technology Sydney (UTS) and colleagues
Research: Analyzing sources on Wikipedia
Led by Isaac Johnson, Wikimedia Foundation
Research:Language-Agnostic Topic Classification
Led by Isaac Johnson, Wikimedia Foundation
Research:Machine Learning Assisted Wikipedia Editing
Sebastian Riedel, Facebook AI Research and UCL, January 2022 – December 2023
Our core hypotheses are that a) for language models to be useful in the editing process, humans will need fine-grained control over the behavior of these models, and b) language models will need to be able to retrieve relevant information from the web (that is, we need retrieval-augmented models). We are currently working on several work streams that test and develop these hypotheses.
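As a toy illustration of hypothesis b) (not the project's actual system), the Python sketch below shows the retrieval-augmented pattern: score a small corpus against the query, retrieve the top passages, and prepend them to the prompt a model sees. The corpus, the bag-of-words scorer, and the generate() stub are all invented for illustration.

```python
# Toy retrieval-augmented generation: hypothetical corpus and generate() stub.
from collections import Counter

CORPUS = [
    "Wikipedia is a free online encyclopedia maintained by volunteers.",
    "Retrieval-augmented models fetch supporting text before generating.",
    "Citations on Wikipedia point to external reliable sources.",
]

def score(query: str, passage: str) -> int:
    """Bag-of-words overlap; real systems use dense or sparse retrievers."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    return sum((q & p).values())

def retrieve(query: str, k: int = 2) -> list:
    """Return the k passages that best match the query."""
    return sorted(CORPUS, key=lambda p: score(query, p), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stand-in for a language model call."""
    return f"[model output conditioned on {len(prompt)} prompt characters]"

query = "How do retrieval-augmented models work?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"))
```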
Project on article quality in multiple language Wikipedias based on large-scale reference analysis - see BestRef for an interactive dataset
2022, Wenceslao Arroyo-Machado, Daniel Torres-Salinas, Rodrigo Costas
WikiProject source reliability, aka the Credibility Ratings + Assessments Project, is an effort to identify and aggregate online sources of assessments of the reliability and credibility of sources. These assessments include estimates of bias, verifiability of claims, level of editorial oversight or peer review, and expertise in specific topics. Assessments may be of individual documents, or aggregate assessments of authors or domains; a sketch of what one aggregated record could look like follows the list below.
Types of assessments include:
Compilations of self-assessments. Examples include the TRANSPOSE project for collating self-reported peer-review practices of journals.
Evaluations by groups created specifically to evaluate source reliability. Examples include Media Bias Fact Check, other fact checking sites, and sites like Ad Fontes Media that produce visuals referenced by others.
Evaluations by communities of practice, as a by-product of their work reviewing or sourcing information to others. Examples include Perennial Sources lists on various language Wikipedias, topical Reliable Sources lists from individual WikiProjects, and newsrooms that publish their internal measures of source reliability.
Compilations of secondary assessments, including the above. Examples include Iffy.news.
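To make the aggregation idea concrete, here is one hypothetical shape such an assessment record could take in Python; the field names and example values are illustrative, not a schema used by the project.

```python
# Hypothetical record for an aggregated source assessment (illustrative only).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SourceAssessment:
    source: str                # a domain, author, or individual document
    assessor: str              # e.g. "Media Bias/Fact Check", "enwiki Perennial Sources"
    assessment_type: str       # "self-assessment", "dedicated evaluator", ...
    reliability: str           # e.g. "generally reliable", "deprecated"
    bias: Optional[str] = None # e.g. "left-center"; not all assessors rate bias
    topics: List[str] = field(default_factory=list)  # topic-specific expertise

example = SourceAssessment(
    source="example-news.org",
    assessor="Media Bias/Fact Check",
    assessment_type="dedicated evaluator",
    reliability="mostly factual",
    bias="left-center",
)
print(example)
```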
Credibility Coalition is a research community that fosters collaborative approaches to understanding the veracity, quality and credibility of online information. We incubate activities and initiatives that bring together people and institutions from a variety of backgrounds. *Doesn't seem to have been active since 2019.
The Iffy Index of Unreliable Sources compiles credibility ratings by Media Bias/Fact Check. Mainly media outlets and websites. Last updated January 2023.
"The goal of Abstract Wikipedia is to let more people share more knowledge in more languages. Abstract Wikipedia is a conceptual extension of Wikidata.[1] In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A particular language Wikipedia can translate this language-independent article into its language. Code does the translation. Wikifunctions is a new Wikimedia project that allows anyone to create and maintain code. This is useful in many different ways. It provides a catalog of all kinds of functions that anyone can call, write, maintain, and use. It also provides code that translates the language-independent article from Abstract Wikipedia into the language of a Wikipedia. This allows everyone to read the article in their language. Wikifunctions will use knowledge about words and entities from Wikidata. This will get us closer to a world where everyone can share in the sum of all knowledge."
In 2023, The Wellcome Trust awarded funds to build the Open Global Data Citation Corpus to dramatically transform the data citation landscape. Through this award, DataCite has partnered with Chan Zuckerberg Initiative, EMBL-EBI, and other organizations that scrape and assert data citations.
Wikimedia Research https://research.wikimedia.org/
Pages on Wikipedia and Meta about Wikimedia Research
Research Projects: The canonical directory of Wikimedia research projects that are planned, underway or have recently been completed.
Research Index: A list of current research projects and research resources
Zotero Library - the reference library for this project. PDFs aren't hosted, but most references should have links. Email me if you can't find an article or report.
Other lists and resources of interest
Wiki Research Bibliography - a bibliography of research publications.
Wikipedia in academic studies (en) on Wikipedia
Wikipedia research and tools: Review and comments, review by Finn Årup Nielsen[1]
WikiPapers, a wiki research literature compilation (conference papers, journal articles, theses, datasets and tools)
Works about Wikimedia projects known to Wikidata
Mapping Wikipedia (en) - various maps using geocoded Wikipedia pages by floatingsheep
*New From hell to HTML: releasing a Python package to easily work with Wikimedia HTML dumps - Feb 2023
Announcing mwparserfromhtml, a new library that makes it easy to parse the HTML content of Wikipedia articles
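For orientation, Enterprise HTML dumps ship as tar.gz archives of newline-delimited JSON with one article object per line; mwparserfromhtml wraps this iteration and adds HTML parsing on top. The standard-library sketch below streams such a file; the filename and the field names ("name", "article_body.html") are assumptions to verify against the dump documentation.

```python
# Minimal sketch: stream articles out of a Wikimedia Enterprise HTML dump
# without loading it into memory. Field names ("name", "article_body.html")
# are assumptions; check the dump documentation before relying on them.
import json
import tarfile

DUMP_PATH = "simplewiki-NS0-ENTERPRISE-HTML.json.tar.gz"  # example filename

with tarfile.open(DUMP_PATH, mode="r:gz") as tar:
    for member in tar:
        fh = tar.extractfile(member)
        if fh is None:           # skip directories and other non-file entries
            continue
        for line in fh:          # one JSON object per article
            article = json.loads(line)
            title = article.get("name")
            html = article.get("article_body", {}).get("html", "")
            print(title, len(html))
            break                # demo: stop after the first article
        break
```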
Beyond normal reading of pages, access to Wikimedia content by reusers is currently achieved through:
Scraping of web pages
Wikimedia data dumps: Dumps are produced monthly for a specific set of namespaces and wikis, and then made available for public download (a streaming-parse sketch appears after this list).
Wikimedia Enterprise: Enterprise-grade APIs built for search, voice assistants, AI, and more. The Wikimedia Enterprise API is a service introduced in 2022 for high-volume, for-profit reusers of Wikimedia projects, who can use it at scale and are charged for the service.
Wikimedia Enterprise HTML Dumps: This partial mirror of Wikimedia Enterprise HTML dumps is an experimental service.
API Portal - currently available as a proof of concept (an example call appears after this list).
API gateway is in its alpha release.
Analytics Datasets: Clickstream: a clickstream, generated monthly, for Wikipedia in English, Russian, German, Spanish, and Japanese (a loading sketch follows below).
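First, as referenced above, a sketch of streaming a compressed pages-articles XML dump with only the Python standard library; the filename is an example, and the page/title/revision/text element layout should be verified against your dump's export schema version.

```python
# Stream pages out of a compressed pages-articles XML dump without loading
# the whole file into memory. Filename is an example; adjust to your dump.
import bz2
import xml.etree.ElementTree as ET

DUMP = "enwiki-latest-pages-articles.xml.bz2"

with bz2.open(DUMP, "rb") as fh:
    for _, elem in ET.iterparse(fh, events=("end",)):
        if elem.tag.endswith("}page"):           # tags carry an XML namespace
            title = elem.find("./{*}title").text
            text_el = elem.find("./{*}revision/{*}text")
            wikitext = (text_el.text or "") if text_el is not None else ""
            print(title, len(wikitext))
            elem.clear()                          # free memory as we go
```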
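Second, an example call through the API Portal: the Core REST API path and returned field names below follow the portal's documentation as best understood, so verify them before relying on this, and put your own contact details in the User-Agent.

```python
# Fetch page metadata from the Wikimedia API Portal's Core REST API.
# Verify the endpoint path against the portal docs; set a real contact
# address in the User-Agent, per Wikimedia's API etiquette.
import requests

url = "https://api.wikimedia.org/core/v1/wikipedia/en/page/Earth/bare"
headers = {"User-Agent": "research-notes-demo/0.1 (you@example.org)"}

resp = requests.get(url, headers=headers, timeout=30)
resp.raise_for_status()
page = resp.json()
print(page["title"], page["latest"]["id"])  # field names per the Core REST API
```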
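Third, the clickstream files are headerless TSVs with four columns (prev, curr, type, n); a minimal pandas load, assuming you have downloaded a monthly file:

```python
# Load a monthly Wikipedia clickstream file: TSV with columns
# prev, curr, type ("link", "external", "other"), and n (count).
import pandas as pd

df = pd.read_csv(
    "clickstream-enwiki-2023-01.tsv.gz",  # example filename
    sep="\t",
    names=["prev", "curr", "type", "n"],
    quoting=3,                             # csv.QUOTE_NONE: titles may contain quotes
)
top = df.groupby("curr")["n"].sum().nlargest(10)  # ten most-visited targets
print(top)
```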
BestRef (Lewoniewski, 2020-2021):
Shows popularity and reliability scores for sources cited in references of Wikipedia articles in different languages. Data extraction is based on a complex method using Wikimedia dumps from July 2020. To find the most popular and reliable sources, we used information about over 200 million references in Wikipedia articles. More details are in the paper "Modeling Popularity and Reliability of Sources in Multilingual Wikipedia". Values for the PR-score and AR-score were additionally multiplied by 100 (to distinguish smaller values in the ranking).
Wikipedia Knowledge Graph dataset, 2022, Arroyo-Machado, Wenceslao; Torres-Salinas, Daniel; Costas, Rodrigo, Zenodo
To reduce the complexity of identifying and collecting data on Wikipedia and to expand its analytical potential, we collected data from various sources, processed them, and generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, here limited to the English edition. We share this Knowledge Graph dataset openly, aiming for it to be useful to a wide range of researchers, such as informetricians, sociologists, or data scientists.
Wikinfometrics: informetric analysis of the English Wikipedia
This R Shiny app provides interactive visualizations of the top Wikipedia articles by indicator to better understand the analytical dimension of these metrics.
Wikitech: Wikitech is the home of technical documentation for Wikimedia Foundation infrastructure and services. This includes production clusters, Wikimedia Cloud Services, Toolforge hosting, and the Beta Cluster.
Data services: includes services that allow for direct access to databases and dumps, as well as web interfaces for querying and programmatic access to data stores. Data services currently include: Wiki Replicas, ToolsDB, Wikilabels Postgres, Wikimedia Dumps, Shared Storage, CirrusSearch Elasticsearch replicas, Quarry and PAWS.
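As an example of the database side, a tool running on Toolforge or PAWS can query the Wiki Replicas over MySQL. The hostname and credentials-file conventions below follow the Wiki Replicas documentation as best understood; verify them (and the replica.my.cnf path) before use.

```python
# Query the enwiki Wiki Replica from a Toolforge tool or PAWS notebook.
# Hostname and credential conventions are assumptions to check against
# the current Wiki Replicas documentation.
import pymysql

conn = pymysql.connect(
    host="enwiki.analytics.db.svc.wikimedia.cloud",
    database="enwiki_p",
    read_default_file="~/replica.my.cnf",  # credentials provisioned per tool
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT page_title FROM page "
        "WHERE page_namespace = 0 ORDER BY page_touched DESC LIMIT 5"
    )
    for (title,) in cur.fetchall():
        print(title.decode("utf-8"))  # page_title is stored as binary
conn.close()
```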
List of Wiki software - software that uses the Wiki format
Wikidata tools - See also the list of Wikidata tagged tools on Toolhub.
Pywikibot is a Python library and collection of scripts that automate work on MediaWiki sites. Originally designed for Wikipedia, it is now used throughout the Wikimedia Foundation's projects and on many other wikis.
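A minimal Pywikibot example, assuming Pywikibot is installed and configured with a user-config.py (reading a page requires no login):

```python
# Read a page's wikitext and list a few of its links with Pywikibot.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Wikipedia:Reliable sources")
print(page.title(), "-", len(page.text), "characters of wikitext")
for linked in list(page.linkedPages())[:5]:  # first few pages it links to
    print("links to:", linked.title())
```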
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks (NetworkX, 2022).
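For instance, a handful of article-to-article links can be loaded into a directed graph and ranked with PageRank; the edges below are made up for illustration.

```python
# Build a small directed link graph and rank nodes with PageRank.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Earth", "Planet"), ("Earth", "Moon"),
    ("Moon", "Earth"), ("Planet", "Solar System"),
])
for node, score in sorted(nx.pagerank(G).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```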