Wikipedia:Wikipedia Signpost/2024-01-31/Recent research
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
A "lack of bureaucratic openness and rules constraining administrator behavior" enabled nationalist takeover of Croatian Wikipedia
- Reviewed by Bri and Tilman Bayer
A paper titled "Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias"[1] (accepted for publication in the CSCW 2024 proceedings) examines the well-known hijacking of the Croatian Wikipedia by far-right nationalists (from at least 2011 to 2020), and asks why the similarly situated Serbian, Bosnian and Serbo-Croatian Wikipedias managed to escape this fate.
As summarized in a post by the University of Washington's Center for an Informed Public (an interdisciplinary center involving UW's Information School, School of Law, and Department of Human Centered Design & Engineering), on the Croatian Wikipedia
[A] cabal [of nationalist editors] seized complete control of the governance of the encyclopedia, banned and blocked those who disagreed with them, and operated a network of fake accounts to give the appearance of grassroots support for their policies...
— CIP summary
This has already been documented in detail in a report commissioned by the Wikimedia Foundation (see e.g. prior Signpost coverage: "Croatian Wikipedia: capture and release", Disinformation report, 2021-06-27 and "Wikimedia Foundation builds 'Knowledge Integrity Risk Observatory' to enable communities to monitor at-risk Wikipedias", Recent research, 2022-11-28).
However, the present paper's findings go beyond that, focusing on the capture of governance on Croatian Wikipedia in contrast to the other wikis of the same language group, where it did not happen, particularly the Serbian Wikipedia. The findings point to weak policies and norms that allowed capture to happen, especially the lack of policies around blocking, and to the importance of integrity among the community's bureaucrats, who can grant and remove admin permissions.
The researchers used a grounded theory approach, specifically a "qualitative analysis of interview data with a range of participants in Croatian and Serbian Wikipedia and in the broader Wikipedia community" (15 interviews in total). Based on this,
... we arrived at three propositions that, together, help explain why Croatian Wikipedia succumbed to capture while Serbian Wikipedia did not:
1. Perceived Value as a Target. Is the project worth expending the effort to capture?
2. Bureaucratic Openness. How easy is it for contributors outside the core founding team to ascend to local governance positions?
3. Institutional Formalization. To what degree does the project prefer personalistic, informal forms of organization over formal ones?
We found that both Croatian Wikipedia and Serbian Wikipedia were attractive targets for far-right nationalist capture due to their sizable readership and resonance with a national identity. However, we also found that the two projects diverged early on in their trajectories in terms of how open they remained to new contributors ascending to local governance positions and the degree to which they privileged informal relationships over formal rules and processes as organizing principles of the project. Ultimately, Croatian’s relative lack of bureaucratic openness and rules constraining administrator behavior created a window of opportunity for a motivated contingent of editors to seize control of the governance mechanisms of the project.
The authors state in the paper that it is the first academic work they know of "that has considered how distributed influence operations target, become deeply engaged with, and are facilitated by institutional and organizational arrangements within peer production communities like Wikipedia".
Briefly
- See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
- Compiled by Tilman Bayer
"From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?"
From the abstract:[2]
"[...] in most cases estimates of scientific reputation are based on composite or weighted indicators and absolute positions in university rankings. In this study, we adopt a more granular approach to assessment of universities' scientific performance using a multidimensional set of indicators from the Leiden Ranking and testing their individual effects on university [English] Wikipedia page views. We distinguish between international and local attention and find a positive association between research performance and Wikipedia attention which holds for regions and linguistic areas. Additional analysis shows that productivity, scientific impact, and international collaboration have a curvilinear effect on universities' Wikipedia attention. This finding suggests that there may be other factors than scientific reputation driving the general public's interest in universities."
"NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages"
From a Twitter/X thread by one of the authors of this preprint[3]:
"Scraped data such as from Wikipedia is vital for NLP, but how reliable is it in low-resource settings? [...]
We explore 2 methods of building a corpus for 12 underrepresented Indonesian languages: by human translation, and by doing free-form paragraph writing given a theme.
We then compare their quality vs Wikipedia text.
[Compared to] Wikipedia data, both Nusa Translation (NusaT) and Nusa Paragraph (NusaP) are generally more lexically diverse and use fewer loan words. We also realize that apparently some of the Wikipedia pages for low-resource languages are mostly boilerplate. [...]
To conclude:
- We release NusaT and NusaP, high-quality corpus for 12 underrepresented languages
- Underrepresented languages corpus from Wikipedia does not represent the true language distribution [...]"
"Loanword identification based on web resources: A case study on Wikipedia"
From the abstract:[4]
"To alleviate the resource scarcity and improve the robustness in loanword identification, the current study proposes a novel loanword identification method based on Wikipedia. In this paper, we first present how to obtain loanword candidate datasets and comparable corpora from Wikipedia. On the basis of these corpora, we develop a pseudo-data generation model for loanword identification tasks. And then we put forward a loanword identification model [...]"
From the introduction:
"In order to evaluate the performance of our method, we have applied it to different receipt languages (Uyghur, Chinese and English). Experimental results showed that the proposed method achieves the best performance compared with other baseline systems in all domains."
References
- ^ Kharazian, Zarine; Starbird, Kate; Hill, Benjamin Mako (2023-11-06), Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias, arXiv. Accepted for publication in Proceedings of the ACM on Human-Computer Interaction (CSCW 2024)
- ^ Arroyo‐Machado, Wenceslao; Díaz‐Faes, Adrián A.; Herrera‐Viedma, Enrique; Costas, Rodrigo (2023-11-23). "From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?". Journal of the Association for Information Science and Technology. doi:10.1002/asi.24856. ISSN 2330-1635.
- ^ Cahyawijaya, Samuel; Lovenia, Holy; Koto, Fajri; Adhista, Dea; Dave, Emmanuel; Oktavianti, Sarah; Akbar, Salsabil Maulana; Lee, Jhonson; Shadieq, Nuur; Cenggoro, Tjeng Wawan; Linuwih, Hanung Wahyuning; Wilie, Bryan; Muridan, Galih Pradipta; Winata, Genta Indra; Moeljadi, David; Aji, Alham Fikri; Purwarianti, Ayu; Fung, Pascale (2023-09-19), NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages, arXiv, doi:10.48550/arXiv.2309.10661 code and dataset
- ^ Mi, Chenggang (June 2023). "Loanword identification based on web resources: A case study on wikipedia". Computer Speech & Language. 81: 101517. doi:10.1016/j.csl.2023.101517. ISSN 0885-2308.
Discuss this story
While we don't have an explicit anti-regionalism policy, we do have MOS:COMMONALITY, which encourages use of English that will be understood across the English-speaking world. This might be called an anti-insularity 'policy'. All the best: Rich Farmbrough 20:13, 31 January 2024 (UTC).
Comment: This was a hysterical read, considering the fact that Serbian Wikipedia absolutely did not escape a similar fate — IмSтevan talk 16:31, 2 February 2024 (UTC)
Comment: The quote "lack of bureaucratic openness and rules constraining administrator behavior" could more to the point be phrased as 'lack of accountability'. Andrez1 (talk) 12:24, 5 February 2024 (UTC)