Wikipedia:Wikipedia Signpost/2023-02-04/Recent research

Recent research

Wikipedia's "moderate yet systematic" liberal citation bias


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

English Wikipedia's news citations found to have "moderate yet systematic" liberal bias

"Distribution of Wikipedia’s news media citation political polarization scores using Kernel Density Estimates (KDE). Negative: liberal; positive: conservative." (Figure 2 from the paper)

A preprint titled "Polarization and reliability of news sources in Wikipedia"[1] finds

"[...] a moderate yet systematic liberal polarization in the [English Wikipedia's] selection of news media sources. We also show that this effect is not mitigated by controlling for news media factual reliability."

The study is based on a dataset of 30 million citations extracted in 2020, which the second author and others have already examined from different angles in other research publications (cf. our previous coverage: "6.7% of Wikipedia articles cite at least one academic journal article with DOI", "How Wikipedia keeps up with COVID-19 research", "A Map of Science in Wikipedia").

As with research examining other kinds of bias (like gender, language or geography), studying political bias involves the non-trivial problem of defining a "neutral" baseline against which to compare Wikipedia's content. For example, in a series of earlier papers that (among other results) found Wikipedia to be "more slanted towards Democratic views" than Britannica, although its "bias was moving from left to right", Greenstein and Zhu used the United States Congressional Record as a kind of gold standard of unbiased language. (Of course, this opened them up to the question whether the spectrum of opinions present among US federal lawmakers is an appropriate baseline for an international encyclopedia, even if their analysis was focused on articles related to US politics.) A 2017 paper studied both political and gender bias by comparing Wikipedia's coverage of topics to that of "political periodicals geared toward either liberal or conservative ideologies" (e.g. Mother Jones vs. National Review), and women's vs. men's magazines, respectively (see our earlier coverage: "English Wikipedia biased against conservative and female topics, at least when compared to US magazines").

The present study relies on a different source that has since become available:

"To estimate the political polarization of Wikipedia citations, we use the Media Bias Monitor.[supp 1] This system collects demographic data about the Facebook followers of 20,448 distinct news media outlets [...]. These data include political leanings, gender, age, income, ethnicity and national identity. For political leanings, the Facebook Audience API[supp 2] provides five levels: Very Conservative, Conservative, Moderate, Liberal, Very Liberal. To measure the political leaning of an outlet, MBM firstly finds the fraction of readers having different political leanings, and then multiply the fraction for each category with the following values: very liberal (–2), liberal (–1), moderate (0), conservative (1), and very conservative (2). The sum of such scores provides a single polarization score for the outlet, ranging between –2 and 2, where a negative score indicates that a media outlet is read more by a liberal leaning audience, while a positive score indicates a conservative leaning audience. In the original paper, MBM is compared to alternative approaches used to infer the political leanings of news media outlets, finding that this method highly correlates with most alternatives."
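The scoring rule described in this quote can be illustrated with a minimal Python sketch. It assumes follower counts per leaning category are already available. The function name `polarization_score` and the example numbers are hypothetical and are not taken from MBM or the paper; only the five weights come from the quoted description.

```python
# Weights as given in the quoted description of the MBM scale.
LEANING_WEIGHTS = {
    "very liberal": -2,
    "liberal": -1,
    "moderate": 0,
    "conservative": 1,
    "very conservative": 2,
}


def polarization_score(audience_counts):
    """Return the weighted sum of audience fractions, a value in [-2, 2]."""
    total = sum(audience_counts.values())
    if total <= 0:
        raise ValueError("audience_counts must contain at least one follower")
    return sum(
        LEANING_WEIGHTS[leaning] * count / total
        for leaning, count in audience_counts.items()
    )


# Hypothetical outlet with made-up follower counts (not real MBM data).
example_outlet = {
    "very liberal": 200,
    "liberal": 300,
    "moderate": 300,
    "conservative": 150,
    "very conservative": 50,
}
print(round(polarization_score(example_outlet), 2))  # -0.45
```

An outlet followed only by "very liberal" users would score -2, and one followed only by "very conservative" users would score 2, matching the range stated in the quote.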

Matching domain names between MBM and the "Wikipedia Citations" dataset, the study finds that

"The average Wikipedia citation polarization score (red line) is -0.51 (median -0.52) [on the aforementioned MBM scale from -2 (very liberal) to 2 (very conservative)], therefore leaning towards liberal. The bulk of citations also falls between the range -1 and 0."
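As a rough illustration of this matching step, the following Python sketch looks up each citation's domain in a table of outlet scores and then summarizes the matched citations. The helper names, the example domains, and the scores are all hypothetical; the paper's actual matching procedure may differ (for instance in how subdomains are handled).

```python
from statistics import mean, median
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def citation_scores(citation_urls, outlet_scores):
    """Return one score per citation whose domain has a known outlet score."""
    scores = []
    for url in citation_urls:
        domain = domain_of(url)
        if domain in outlet_scores:
            scores.append(outlet_scores[domain])
    return scores


# Hypothetical outlet scores on the -2..2 scale (made up for illustration).
outlet_scores = {
    "liberal-daily.example": -0.8,
    "centrist-times.example": -0.1,
    "conservative-post.example": 0.6,
}
# Hypothetical citations; the last one has no score and is skipped.
citation_urls = [
    "https://www.liberal-daily.example/a",
    "https://liberal-daily.example/b",
    "http://centrist-times.example/c",
    "https://conservative-post.example/d",
    "https://unrated-blog.example/e",
]

scores = citation_scores(citation_urls, outlet_scores)
print(len(scores), round(mean(scores), 3), round(median(scores), 3))
```

Because every citation contributes one score, outlets that are cited more often weigh more heavily in the average, which is what a per-citation statistic like the one quoted above implies.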

"Distribution of Wikipedia citation political polarization scores for the top 10 WikiProjects" (figure from the paper)

Breaking down polarization ratings by ORES article topic areas, "we cannot see differences among macro topics". This "general trend" was also found for the top 10 (sub-)topic areas and the top 10 WikiProjects, although with "minor shifts [...]. For example, the topic sports has a higher conservative-leaning fraction of citations, all the while maintaining a liberal-leaning skew. The WikiProjects Politics and India are more liberal-leaning than the average, instead. Taken together, these results confirm that the overall trend towards liberal political polarization is not specific to some areas of Wikipedia, but seems to be widespread across topics and WikiProjects."

"Distribution of Wikipedia’s news media citation reliability scores" according to Media Bias/Fact Check (figure 1 from the paper)

Motivating their second research question, the authors "speculate that editors may introduce political polarization in their sources in order to prioritise reliable ones" (which might remind one of Stephen Colbert's dictum "Reality has a well-known liberal bias"). To test this hypothesis, they use the reliability ratings of Media Bias/Fact Check (but not that site's bias ratings). They note in passing "that, while there are only 1467 citations rated as 'VERY LOW' [reliability], there remains a sizable fraction of citations to low or mixed reliability outlets" on English Wikipedia, as of 2020. (It might have been interesting to conduct the same analysis with the English Wikipedia's own reliability ratings that the community has compiled for numerous news sources at WP:RSP – where, ironically, "Media Bias/Fact Check" is itself currently rated as "generally unreliable, as it is self published", somewhat in contrast to the present paper and the peer-reviewed publication that it cites in justification of using MBFC.)

However, in a linear regression analysis (which also takes article topic and WikiProjects into account), the authors "cannot see a clear pattern emerge. While high reliability shows a liberal skew, very high reliability shows a conservative skew in turn. Mixed sources tend to be more liberal, while low and very low reliability ones tend to be more conservative." Overall, they conclude that "the case for a possible association between low reliability and conservative news outlets disappear[s]" in the end.


Briefly


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Political representation bias in DBpedia and Wikidata as a challenge for downstream processing"

From the abstract:[2]

"Diversity Searcher is a tool originally developed to help analyse diversity in news media texts [...] We compare two data sources that Diversity Searcher has worked with – [the Wikipedia-based] DBpedia and Wikidata – with respect to their ontological coverage and diversity, and describe implications for the resulting analyses of text corpora. We describe a case study of the relative over- or underrepresentation of Belgian political parties between 1990 and 2020 in the English-language DBpedia, the Dutch-language DBpedia, and Wikidata [...]. In particular, we came across a staggering overrepresentation of the political right in the English-language DBpedia."

From the "Method" section:

"As a null hypothesis, a knowledge source represents a political constellation in an unbiased way if the relative number of politicians from a given party who are represented as an entity in a knowledge source [...] equals the relative number of this party in a relevant real-life context. [... We] consider “having a Wikipedia page” (etc.) as an important contributor to public visibility of a person and their party. [...] The baseline is then – relatively – easy to define: the shares of the vote or the number of seats of parties Y at times T in a given political body. We started by concentrating on the national parliament, the Chamber of People’s Representatives (Kamer van volksvertegenwoordigers, henceforth KVV) and used the number of seats at the beginning of a legislature. We also looked at the regional (Flemish) parliaments (Vlaams parlement, VP) [...]"

From the "Results and interpretation" section:

"These results not only confirm our first informal observation of over-representation of rightwing parties (especially the N-VA) in the English-language DBpedia, with a trend growing over time. (During these years, the N-VA’s share of the popular vote increased, but the DBpedia growth clearly exceeds the baseline growth.) Different biases seem to occur in the Dutch-language DBpedia: although on the whole comparatively similar to the baseline, this ontology seems to over-represent the main centrist party (CD&V). Wikidata, in contrast, gives a rather accurate picture of party shares in the national parliament. The French-language Walloon parties are (understandably, given the language focus) under-represented in the Dutch-language DBpedia. Both the overrepresentation of rightist and centrist parties in media coverage have been identified in earlier international research [...]"
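The null hypothesis from the "Method" section (a party's share of entities in a knowledge source equals its share of seats) can be made concrete with a short Python sketch. The function names and all counts below are hypothetical and are not taken from the paper, which may use a different measure of over- or under-representation.

```python
def shares(counts):
    """Convert raw counts per party into fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {party: count / total for party, count in counts.items()}


def representation_gap(entity_counts, seat_counts):
    """Per party: share of entities in the source minus share of seats.

    Positive values indicate over-representation relative to the baseline,
    negative values under-representation. Under the null hypothesis all
    gaps are zero.
    """
    source = shares(entity_counts)
    baseline = shares(seat_counts)
    return {party: source.get(party, 0.0) - baseline[party] for party in baseline}


# Hypothetical seat counts (baseline) and politician-entity counts
# in a knowledge source; the numbers are invented for illustration.
seats = {"Party A": 30, "Party B": 50, "Party C": 20}
entities = {"Party A": 45, "Party B": 40, "Party C": 15}

for party, gap in representation_gap(entities, seats).items():
    print(party, round(gap, 2))
```

In this made-up example, "Party A" holds 30% of the seats but 45% of the entities, so it would count as over-represented by 15 percentage points.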


"Assuming Good Faith Online"

In this legal essay,[3] US legal scholar Eric Goldman (whom some Wikipedians might recall for his – later retracted – 2005 prediction of Wikipedia's demise due to volunteer burnout) contrasts Wikipedia's "Assume Good Faith" principle with current attempts by Internet regulators to rein in user-generated content websites and Section 230 (see also this issue's "In the media").


Simulation of article disputes finds that "it is more important not to have intolerant editors than to have very tolerant ones"

From the abstract:[4]

"[...] we focus on how the editors' attitudes, namely being broad-minded or stubborn, affect the consensus-building process in a model of Wikipedia. We further investigate how banning editors affects the speed with which conflicts or debates can be resolved. For the analysis, we use an agent-based opinion model developed to simulate different aspects of Wikipedia. We show that, in most cases, banning agents from editing an article slows down the consensus-building process, and increases the system’s relaxation time. We show further, and counterintuitively, that with large groups of 'extremists' who hold other than the central opinion, consensus can be reached faster and the article will be less biased."

From the "Conclusion" section:

"[...] for the consensus [to be achieved] it is more important not to have intolerant editors than to have very tolerant ones.

Our results indicate that consensus is reached extremely slowly if the bias of the article can be changed only by a small amount. To resolve the conflict faster, one must either increase the change of bias in one edit or the ratio of extremists. In general, the latter cannot be controlled deliberately, but the former can be influenced.

In Wikipedia, there is already a method aimed at resolving disputes of that sort. The solution is to move the disputed questions into a new section (or page) where they can be discussed freely. The new trend to move disputed parts of the article into the Criticism or Controversy sections [which is actually discouraged in a widely cited community essay] is a good way to handle this problem. Assigning [sensitive] arguments and opinions to a small section of the article that is much easier to modify makes the full article less disputed. Thus, tolerance towards the main article increases [...]"

See also our review of a related earlier paper involving one of the authors: "More newbies mean more conflict, but extreme tolerance can still achieve eternal peace".

"The Role of Local Content in Wikipedia: A Study on Reader and Editor Engagement"

From the abstract:[5]

"About a quarter of each Wikipedia language edition is dedicated to representing 'local content', i.e. the corresponding cultural context —geographical places, historical events, political figures, among others—. To investigate the relevance of such content for users and communities, we present an analysis of reader and editor engagement in terms of pageviews and edits. The results, consistent across fifteen diverse language editions, show that these articles are more engaging for readers and especially for editors. The highest proportion of edits on cultural context content is generated by anonymous users, and also administrators engage proportionally more than plain registered editors [...]"

(cf. by some of the same authors: "The Wikipedia Diversity Observatory: A Project to Identify and Bridge Content Gaps in Wikipedia")

This paper is part of a 2021 monograph published on occasion of Wikipedia's 20th anniversary ("Wikipedia, veinte años de conocimiento libre"), which comprises various other research papers, most of which are in Spanish with an English abstract.


"Discussing the Past: The Production of Historical Knowledge on Wikipedia"

From the abstract:[6]

"This study explores how historical knowledge is produced on Wikipedia. The project is based on multiple methodologies ranging from qualitative analysis of Wikipedia pages related to history, survey with Wikipedia editors, to quantitative analysis of participatory practices within the Wikipedia community. The main argument is that Wikipedia allows people to discuss the past, express their opinions and emotions about history and its significance in the present and the future through the portal of “talk” [pages] that Wikipedia provides [...]."

This dissertation includes detailed examinations of the history of discussions at Talk:Atomic bombings of Hiroshima and Nagasaki, Talk:Vietnam War and Talk:September 11 attacks.

"Producing Historical Knowledge on Wikipedia"

This is an earlier paper by the dissertation's author. From the "Conclusion" section:[7]

"Wikipedia’s capability of producing historical narratives, its self-critical character through the talk pages, and its open character are significant tools that should not be underestimated. The popularity of Wikipedia and, particularly, the popularity of the historical pages that are visited daily by a lot of people have to be studied and not be neglected as a kind of not “real history.” Wikipedia cannot change radically the historical scholarship but can bring the historian closer to the society."


"Characterizing 'permanently dead' links on Wikipedia"

From the abstract:[8]

"Broken external references on Wikipedia which lack archived copies are marked as 'permanently dead'. But, we find this term to be a misnomer, as many previously dysfunctional links work fine today. For links which do not work, it is rarely the case that no archived copies exist. Instead, we find that the current policy for determining which archived copies for an URL are not erroneous is too conservative, and many URLs are archived for the first time only after they no longer work."


References

  1. ^ Yang, Puyu; Colavizza, Giovanni (2022-11-21), Polarization and reliability of news sources in Wikipedia, arXiv:2210.16065
  2. ^ Karadeniz, Ozgur; Berendt, Bettina; Kiyak, Sercan; Mertens, Stefan; d'Haenens, Leen (2022-12-29), Political representation bias in DBpedia and Wikidata as a challenge for downstream processing, arXiv:2301.00671
  3. ^ Goldman, Eric (2022). "Assuming Good Faith Online". SSRN Electronic Journal. doi:10.2139/ssrn.4277296. ISSN 1556-5068. S2CID 254353500. 30 Catholic U.J.L. & Tech. __ (Forthcoming), Santa Clara Univ. Legal Studies Research Paper No. 4277296
  4. ^ Rudas, Csilla; Török, János (2018). "Modeling the Wikipedia to Understand the Dynamics of Long Disputes and Biased Articles". Historical Social Research. 43 (1): 72–88. doi:10.12759/hsr.43.2018.1.72-88. ISSN 0172-6404.
  5. ^ Ribé, Marc Miquel; Laniado, David; Kaltenbrunner, Andreas (2021-05-17). "The Role of Local Content in Wikipedia: A Study on Reader and Editor Engagement". Área Abierta. 21 (2): 123–151. doi:10.5209/arab.72801. ISSN 1578-8393. S2CID 238044047.
  6. ^ Apostolopoulos, Petros (2022-04-28). "Discussing the Past: The Production of Historical Knowledge on Wikipedia". North Carolina State University. (dissertation)
  7. ^ Apostolopoulos, Petros (2019-04-29). "Producing Historical Knowledge on Wikipedia". Madison Historical Review. 16 (1).
  8. ^ Nyayachavadi, Anish; Zhu, Jingyuan; Madhyastha, Harsha V. (2022-10-25). "Characterizing "permanently dead" links on Wikipedia". Proceedings of the 22nd ACM Internet Measurement Conference. IMC '22. New York, NY, USA: Association for Computing Machinery. pp. 388–394. doi:10.1145/3517745.3561451. ISBN 9781450392594.
Supplementary references and notes:
  1. ^ Filipe N Ribeiro, Lucas Henrique, Fabricio Benevenuto, Abhijnan Chakraborty, Juhi Kulshrestha, Mahmoudreza Babaei, and Krishna P Gummadi. Media bias monitor: Quantifying biases of social media news outlets at large-scale. In Twelfth international AAAI conference on web and social media, 2018.
  2. ^ "Customer File Custom Audiences - Meta Marketing API - Documentation".