Wikipedia:Wikipedia Signpost/2022-11-28/Recent research

'''Recent research: COVID-19 WikiProject praised in new book'''

''A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.''

==="Wikipedia as a trusted method of information assessment during the COVID-19 crisis"===
:''Reviewed by Piotr Konieczny''
This book chapter,<ref>Segault, Antonin (2022). "Wikipedia as a trusted method of information assessment during the COVID-19 crisis". In: ''COVID-19, Communication and Culture''. Routledge. ISBN 9781003276517.</ref> unfortunately not yet in open access, provides a good overview of Wikipedia's history and practices, concluding that Wikipedia's coverage of the COVID-19 pandemic is "precise and robust", and that this generated positive coverage of Wikipedia in mainstream media. The author provides an extensive and detailed overview of how Wikipedia volunteers covered the pandemic, and highlights the efforts of the dedicated WikiProject COVID-19, an offshoot of the larger WikiProject Medicine, as important in creating quality content in this topic area. He also praises the Wikimedia community and the Wikimedia Foundation for their dedication to the fight against fake news and related misinformation.

One of the most interesting observations made by the author – if somewhat tangential to the main topic – is that Wikipedia "is used [by readers] in ways similar to a<!--sic--> news media", which "generates a tension between Wikipedia's original encyclopaedic ambitions and these pressing journalistic tendencies" (an interesting and, in this reviewer's experience, under-researched topic, at least in English – see [https://journals.openedition.org/edc/14098 here] for a review of a recent French book on this topic). The author concludes that this somewhat begrudging acceptance of current developments by the Wikipedia community has significantly contributed to making its coverage relevant to the general audience. At the same time, he notes that despite relying on media sources rather than waiting for scholarly coverage, Wikipedia was still able to maintain high quality in its coverage, which the author attributes to editors' reliance on "legacy media outlets, like ''The New York Times'' and the BBC".



===Wikimedia Foundation builds "Knowledge Integrity Risk Observatory" to enable communities to monitor at-risk Wikipedias===
:''Reviewed by [[User:HaeB|Tilman Bayer]]''
[[File:Taxomony of Knowledge Integrity Risks in Wikipedia (v1).png|thumb|520px|center|"Taxomony of Knowledge Integrity Risks in Wikipedia" (figure 1 in the paper)]]
A paper titled "A preliminary approach to knowledge integrity risk assessment in Wikipedia projects"<ref>Pablo Aragón and Diego Sáez-Trumper. 2021. [https://web.archive.org/web/20220529031821/http://claws.cc.gatech.edu/mis2-papers/paper6-aragon.pdf A preliminary approach to knowledge integrity risk assessment in Wikipedia projects]. In: The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web (MIS2 workshop at KDD 2021), August 15, 2021, Virtual. ACM, New York, NY, USA. Also as updated preprint: {{Cite journal| last1 = Aragón| first1 = Pablo| last2 = Sáez-Trumper| first2 = Diego| title = A preliminary approach to knowledge integrity risk assessment in Wikipedia projects| journal = arXiv:2106.15940 [cs]| date = 2021-06-30| url = http://arxiv.org/abs/2106.15940}}</ref> by two members of the Wikimedia Foundation's research team provides a "taxonomy of knowledge integrity risks in Wikipedia projects [and an] initial set of indicators to be the core of a Wikipedia Knowledge Integrity Risk Observatory." The goal is to "provide Wikipedia communities with an actionable monitoring system." The paper was [https://www.slideshare.net/elaragon/a-preliminary-approach-to-knowledge-integrity-risk-assessment-in-wikipedia-projects presented in August 2021] at an academic workshop on misinformation (part of the ACM's [[Special Interest Group on Knowledge Discovery and Data Mining|KDD]] conference), as well as [https://www.youtube.com/watch?v=0N7X9oPF0QA&t=651s at the Wikimania 2021 community conference] the same month.

The taxonomy distinguishes between internal and external risks, each divided into further sub-areas (see figure). "Internal" refers "to issues specific to the Wikimedia ecosystem while [external risks] involve activity from other environments, both online and offline."

Various quantitative indicators of knowledge integrity risks are proposed for each area. A remark at the end of the paper clarifies that they are all meant to be calculated at the project level - i.e. to provide information about how much an entire Wikipedia language version is at risk (rather than, say, to identify specific articles or editors that may deserve extra scrutiny).

For example, the following indicators are suggested for the "Content verifiability" risk category:
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
Distribution of articles by number of citations, number of scientific citations and number of citation and verifiability article maintenance templates, distribution of sources by reliability.
</blockquote>

The authors emphasize that "the criteria for proposing these indicators are that they should be simple to be easily interpreted by non-technical stakeholders". Some of the proposed metrics are indeed standard in other contexts. But the paper mostly leaves open how they should be interpreted in the context of knowledge integrity risks. For example, the metrics for internal risks in the category "community capacity" include "Number of articles, editors, active editors, editors with elevated user rights (admins, bureaucrats, checkusers, oversighters, rollbackers)". The authors indicate that these are meant to identify a "shortage of patrolling resources." Presumably the idea is to construct risk indicators based on the ratio of these editor numbers to the number of articles or edits (with higher ratios perhaps indicating higher resilience to integrity risks), but the paper doesn't provide explanations.
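
To make that presumption concrete, here is a minimal sketch of what such a project-level ratio indicator might look like; the counts and the choice of ratios are this reviewer's invention, not the paper's:
<syntaxhighlight lang="python">
# Hypothetical sketch of a project-level "community capacity" indicator.
# The counts below are invented; the paper does not specify how (or whether)
# these numbers should be combined into a risk score.

project_stats = {
    "articles": 1_500_000,    # total articles in the language edition
    "active_editors": 4_200,  # editors active in the last month
    "admins": 60,             # editors with elevated user rights
}

def patrolling_capacity(stats: dict) -> dict:
    """Ratios of patrolling resources to content volume (higher = more capacity)."""
    return {
        "active_editors_per_1k_articles": 1000 * stats["active_editors"] / stats["articles"],
        "admins_per_1k_articles": 1000 * stats["admins"] / stats["articles"],
    }

print(patrolling_capacity(project_stats))
# {'active_editors_per_1k_articles': 2.8, 'admins_per_1k_articles': 0.04}
</syntaxhighlight>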

For various other metrics, the possible interpretations are even less clear. For example, "ratio of articles for deletion; ratio of blocked accounts" are listed in the "Community governance" risk category. But do high ratios indicate a higher risk (because the project is more frequently targeted by misinformation) or a lower risk (because the local community is more effective at deleting and blocking misinformation)? Similarly, would a comparatively low "number of scientific citations" on a project indicate that it is ripe for scientific misinformation – or simply that it has fewer and shorter articles about scientific topics?

Throughout the paper, such questions often remain unresolved, raising doubts about how useful these metrics will be in practice. While the authors sometimes cite relevant literature, several of the cited publications do not support or explain a relation between the proposed metric and misinformation risks either. For example, one of the two papers cited for "controversiality" (cf. [[m:Research:Newsletter/2013/June#"The_most_controversial_topics_in_Wikipedia%3A_a_multilingual_and_geographical_analysis"|our review]]) points out that, contrary to what the Foundation's researchers appear to assume, editor controversies can have a positive effect on article quality
("Clearly, there is a positive role of the conflicts: if they can be resolved in a consensus, the resulting product will better reflect the state of the art"). Similarly, [[m:Research:Newsletter/2018/February#Politically_diverse_editors_and_article_quality|other research]] has found that "higher political polarization [among editors] was associated with higher article quality."

[[File:Views and edits entropies in large Wikipedias.png|thumb|"Entropy values (𝑆) of the distributions of the number of edits and views by country of the Wikipedia language editions, identified by the ISO 639-1 code, with over 500K articles. The graph includes a linear regression model fit." (Figure 2 from the paper)]]
An exception is the "Community demographics" risk category, where the authors provide the following justification:
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
"To illustrate the value of the indicators for knowledge integrity risk assessment in Wikipedia, we provide an example on community demographics, in particular, geographical [[Diversity_index|diversity]] [defined as] the entropy value of the distributions of number of edits and views by country of the language editions with over 500K articles. On the one hand, we observe large entropy values for both edits and views in the Arabic, English and Spanish editions, i.e., global communities. On the other hand, other large language editions like the Italian, Indonesian, Polish, Korean or Vietnamese Wikipedia lack that geographical diversity."
</blockquote>
(Assuming that this refers to [[Entropy (information theory)|entropy in the sense of information theory]], for example, these values are minimal (0) when all edits or views are concentrated in a single country, and maximized when every country worldwide contributes the exact same number of edits or views.)
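
In formula form – assuming that the paper uses the standard Shannon entropy over the share <math>p_c</math> of views or edits coming from country <math>c</math>, out of <math>N</math> countries with nonzero activity – the measure would be
<math display="block">S = -\sum_{c=1}^{N} p_c \log p_c, \qquad 0 \le S \le \log N,</math>
with <math>S = 0</math> for complete concentration in a single country and <math>S = \log N</math> for a perfectly even spread.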

Here, the authors "highlight the extraordinarily low entropy of views of the Japanese Wikipedia, which supports one of the main causes attributed to misinformation incidents in this edition" (referring to [[Japanese_Wikipedia#Criticism|concerns about historical revisionism]] in several Japanese Wikipedia articles). However, it remains unclear why a low diversity in views should be more directly associated with such bias problems than a low diversity in edits (where the Japanese Wikipedia appears to be largely on par with the Finnish and Korean Wikipedias, and the Italian, Polish and Catalan Wikipedias would seem similarly at risk). The paper also includes a plot showing a linear regression fit that indicates a relation between the two measures (entropy of views and entropy of edits). But this finding seems somewhat unsurprising if one accepts that reading and editing activity levels may be correlated, and its relevance to knowledge integrity remains unclear.

Lastly, while the paper's introduction highlights "deception techniques such as astroturfing, harmful bots, computational propaganda, sockpuppetry, data voids, etc." as major reasons for a recent rise in misinformation problems on the web, none of these are explicitly reflected in the proposed taxonomy, or captured in the quantitative indicators. (The "Content quality" metrics mention the frequency of bot edits, but in the context of Wikipedia research, these usually refer to openly declared, benign bot accounts. The "geopolitics" risk category gives a nod to "political contexts" where "some well resourced interested parties (e.g., corporations, nations) might be interested in externally-coordinated long-term disinformation campaigns in specific projects", but this evidently does not capture the full range of deception techniques listed above.)

This omission is rather surprising, considering that problems like paid advocacy and conflict of interest editing have been discussed as major concerns among the editing community for a long time (see e.g. the [https://en.wikipedia.org/w/index.php?title=Special:Search&limit=500&offset=0&profile=default&prefix=Wikipedia%3AWikipedia+Signpost%2F20&search=paid+editing&ns0=1 hundreds of articles] published over the years by the Signpost, recently in the form of a regular "Disinformation Report" rubric). They are also among the few content issues where the Wikimedia Foundation has felt compelled to take action in the past, e.g. by [https://diff.wikimedia.org/2014/06/16/change-terms-of-use-requirements-for-disclosure/ changing its terms of use] and [[Wikipedia:Wikipedia Signpost/2013-11-20/News and notes|taking legal action]] against some players.

The paper stresses in its title that the taxonomy is meant to be "preliminary." Indeed, since its publication last year, further work has been done on refining and improving at least the proposed metrics (if [[:File:Taxomony of Knowledge Integrity Risks in Wikipedia (v1).png|not necessarily the taxonomy itself]]), according to the research project's [[m:Research:Wikipedia_Knowledge_Integrity_Risk_Observatory|page on Meta-wiki]] and associated [https://phabricator.wikimedia.org/T293501 Phabricator tasks], resulting in a not yet public prototype of the risk observatory (compare screenshot below). Among other changes, the aforementioned entropy of views by country seems to have been [[:File:Gini-index-of-country-views-count-2022-01-03T16-01-27.329Z.jpg|replaced]] by a "[[Gini index]]" chart. Also, rather than the relative ratios of blocks mentioned in the paper, the prototype [[:File:Globally-locked-editors-count-over-time-2022-01-03T16-00-28.389Z.jpg|shows]] absolute counts of globally locked editors over time, still raising several questions on how to interpret these numbers in terms of knowledge risks.
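
For readers unfamiliar with the latter metric, here is a minimal sketch of a Gini coefficient computed over views-by-country counts; the counts are invented, and since the prototype is not public, the exact formula variant it uses is an assumption:
<syntaxhighlight lang="python">
# Illustrative Gini coefficient over views-by-country counts (invented numbers).
# 0 means views are spread perfectly evenly across countries; values near 1
# mean views are concentrated in a single country.

def gini(counts: list[float]) -> float:
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

views_by_country = [5_000_000, 300_000, 120_000, 80_000, 40_000]
print(round(gini(views_by_country), 3))  # 0.732 - highly concentrated
</syntaxhighlight>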

The project appears to be part of the WMF Research team's "knowledge integrity" focus area, announced in February 2019 in [[m:Research:2030|one of four "white papers that outline our plans and priorities for the next 5 years"]] (see also [[Wikipedia:Wikipedia Signpost/2022-10-31/Recent research#How existing research efforts could help editors fight disinformation|last month's issue]] about several other efforts in this focus area, which likewise haven't yet resulted in actionable tools for editors apart from one since discontinued prototype).

[[File:Wikipedia Knowledge Integrity Risk Observatory.png|thumb|520px|A screenshot of the (non-public) prototype of the risk observatory (January 2022)]]


====The case of Croatian Wikipedia====
While it is not mentioned in the "knowledge integrity risk assessment" paper (or on the research project's page on Meta-wiki), the [[Croatian Wikipedia]] is probably the most prominent example of a Wikimedia project where knowledge integrity was found to be compromised significantly. Thus it might provide an interesting retroactive test case for the usefulness of the observatory. A "disinformation assessment report"<ref>https://commons.wikimedia.org/wiki/File:Croatian_WP_Disinformation_Assessment_-_Final_Report_EN.pdf</ref> commissioned by a different team at the Wikimedia Foundation (published in June 2021, i.e. around the same time as the paper reviewed here) found "a widespread pattern of manipulative behaviour and abuse of power by an ideologically aligned group of Croatian language Wikipedia (Hr.WP) admins and other volunteer editors", who "[[m:Croatian Wikipedia Disinformation Assessment-2021|held]] undue de-facto control over the project at least from 2011 to 2020." It's unclear to this reviewer whether that kind of situation would have been reflected in any of the Wikipedia Knowledge Integrity Risk Observatory's proposed indicators. None of them seem to be suitable for distinguishing such a case - where the project's admins reverted or blocked editors who actually tried to uphold core Wikipedia principles - from the (hopefully) more frequent situation where the project's admins uphold and enforce these principles against actors who try to introduce disinformation.

Interestingly though, while the findings of the Croatian Wikipedia disinformation assessment are largely qualitative, it also developed a quantitative indicator "to measure and quantify disinformation". This is based on examining the Wikipedia articles about individuals whom the UN's [[International Criminal Tribunal for the former Yugoslavia]] (ICTY) convicted of war crimes, counting how many of them mention this conviction in the first three sentences (a rough sketch of such a check follows below the quote). The report's anonymous author found
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
"...that Croatian and Serbian language Wikipedia, in 62.5% and 39.1% of cases, respectively, avoid informing their visitors in the introductory paragraph that the person they’re reading about is a convicted war criminal who comes from the same ethnic group. All other surveyed Wikipedia languages – Serbo-Croatian, Bosnian, English, French, Spanish, and German – do this consistently and keep the information at the very beginning of an article [...]"
</blockquote>
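
As a rough illustration of how such a check could be automated, here is a short sketch (the reviewer's own – the report's assessment was done manually, and simple keyword matching like this would be far cruder than a human reading of the lead):
<syntaxhighlight lang="python">
# Rough sketch of the "conviction mentioned early" indicator: check whether
# the first three sentences of an article's lead mention the conviction.
# The sample text is invented; the report's actual assessment was manual.
import re

def mentions_conviction_early(lead_text: str,
                              keywords=("convicted", "war crimes")) -> bool:
    """True if any keyword appears within the first three sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", lead_text.strip())
    first_three = " ".join(sentences[:3]).lower()
    return any(k in first_three for k in keywords)

sample_lead = (
    "John Doe (born 1950) is a former military commander. "
    "In 2004 he was convicted of war crimes by the ICTY. "
    "He later published a memoir."
)
print(mentions_conviction_early(sample_lead))  # True
</syntaxhighlight>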



===Other recent publications===
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
:<small>''Compiled by ...''</small>

===="Editors, sources and the 'go back' button: Wikipedia's framework for beating misinformation"====
From the abstract and the "Conclusion" section:<ref>{{Cite journal| doi = 10.5210/fm.v27i11.12754| issn = 1396-0466| last = Avieson| first = Bunty| title = Editors, sources and the 'go back' button: Wikipedia's framework for beating misinformation| journal = First Monday| date = 2022-11-07| url = https://firstmonday.org/ojs/index.php/fm/article/view/12754}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
"This paper investigates the editorial framework developed by the Wikipedia community, and identifies three key factors as proving successful in the fight against medical misinformation in a global pandemic — the editors, their sources and the technological affordances of the platform."<br>
"Perhaps most significantly for the flow of misinformation, is that unlike the interconnectivity of other online media platforms, Wikipedia is largely a one-way street. While Facebook, YouTube and Google refer their readers to the site for fact-checking, Wikipedia does not return the favour. Without a commercial agenda, its readers are not directed to other content by an algorithm, nor are they subjected to advertisements or clickbait, hijacking their attention. [...]<br>
The site is winning the battle against COVID-19 misinformation through the combination of an enthusiastic, volunteer army (those nit-picking masses), working within the disciplined schema of rigorous referencing to credible sources, on a platform designed for transparency and efficient editing. This editorial framework, combined with sanctions, expert oversight and more stringent referencing rules, is showing Wikipedia to be a significant platform for health information during the COVID-19 pandemic."
</blockquote>


===="..."====
===="..."====

===References===
# Segault, Antonin (2022). "Wikipedia as a trusted method of information assessment during the COVID-19 crisis". In: ''COVID-19, Communication and Culture''. Routledge. ISBN 9781003276517. (Closed access.)
# Aragón, Pablo; Sáez-Trumper, Diego (2021). "A preliminary approach to knowledge integrity risk assessment in Wikipedia projects". In: ''The Second International MIS2 Workshop: Misinformation and Misbehavior Mining on the Web'' (MIS2 workshop at KDD 2021), August 15, 2021, virtual. ACM, New York, NY, USA. Also available as an updated preprint: arXiv:2106.15940 [cs].
# "Croatian WP Disinformation Assessment – Final Report". https://commons.wikimedia.org/wiki/File:Croatian_WP_Disinformation_Assessment_-_Final_Report_EN.pdf
# Avieson, Bunty (2022-11-07). "Editors, sources and the 'go back' button: Wikipedia's framework for beating misinformation". ''First Monday''. doi:10.5210/fm.v27i11.12754. ISSN 1396-0466.