Wikipedia:Wikipedia Signpost/2024-01-31/Recent research: Difference between revisions

augment hrwiki paper review (CC Bri)
No edit summary
Line 52:
The authors state in the paper that it is the first academic work they know of "that has considered how distributed [[influence operations]] target, become deeply engaged with, and are facilitated by institutional and organizational arrangements within [[peer production]] communities like Wikipedia".




===Briefly===
* See the [[mw:Wikimedia Research/Showcase|page of the monthly '''Wikimedia Research Showcase''']] for videos and slides of past presentations.


===Other recent publications===
''Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, [[m:Research:Newsletter#How to contribute|are always welcome]].''
:<small>''Compiled by [[User:HaeB|Tilman Bayer]]''</small>




===="From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?"====
From the abstract:<ref>{{Cite journal| doi = 10.1002/asi.24856| issn = 2330-1635| last1 = Arroyo‐Machado| first1 = Wenceslao| last2 = Díaz‐Faes| first2 = Adrián A.| last3 = Herrera‐Viedma| first3 = Enrique| last4 = Costas| first4 = Rodrigo| title = From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?| journal = Journal of the Association for Information Science and Technology| date = 2023-11-23| url = https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24856}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
"[...] in most cases estimates of scientific reputation are based on composite or weighted indicators and absolute positions in university rankings. In this study, we adopt a more granular approach to assessment of universities' scientific performance using a multidimensional set of indicators from the [[CWTS Leiden Ranking|Leiden Ranking]] and testing their individual effects on university [English] Wikipedia page views. We distinguish between international and local attention and find a positive association between research performance and Wikipedia attention which holds for regions and linguistic areas. Additional analysis shows that productivity, scientific impact, and international collaboration have a curvilinear effect on universities' Wikipedia attention. This finding suggests that there may be other factors than scientific reputation driving the general public's interest in universities."</blockquote>


===="NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages"====
From [https://threadreaderapp.com/thread/1718786247139922024.html a Twitter/X thread] by one of the authors of this preprint<ref>{{Cite| publisher = arXiv| doi = 10.48550/arXiv.2309.10661| last1 = Cahyawijaya| first1 = Samuel| last2 = Lovenia| first2 = Holy| last3 = Koto| first3 = Fajri| last4 = Adhista| first4 = Dea| last5 = Dave| first5 = Emmanuel| last6 = Oktavianti| first6 = Sarah| last7 = Akbar| first7 = Salsabil Maulana| last8 = Lee| first8 = Jhonson| last9 = Shadieq| first9 = Nuur| last10 = Cenggoro| first10 = Tjeng Wawan| last11 = Linuwih| first11 = Hanung Wahyuning| last12 = Wilie| first12 = Bryan| last13 = Muridan| first13 = Galih Pradipta| last14 = Winata| first14 = Genta Indra| last15 = Moeljadi| first15 = David| last16 = Aji| first16 = Alham Fikri| last17 = Purwarianti| first17 = Ayu| last18 = Fung| first18 = Pascale| title = NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages| date = 2023-09-19| url = http://arxiv.org/abs/2309.10661}} [https://github.com/IndoNLP/nusa-writes code and dataset]</ref>:
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">
"Scraped data such as from Wikipedia is vital for NLP, but how reliable is it in low-resource settings? [...]<br>
We explore 2 methods of building a corpus for 12 underrepresented Indonesian languages: by human translation, and by doing free-form paragraph writing given a theme.<br>
We then compare their quality vs Wikipedia text.<br>
[Compared to] Wikipedia data, both Nusa Translation (NusaT) and Nusa Paragraph (NusaP) are generally more lexically diverse and use fewer loan words.
We also realize that apparently some of the Wikipedia pages for low-resource languages are mostly boilerplate. [...]<br>
To conclude:<br>
- We release NusaT and NusaP, high-quality corpus for 12 underrepresented languages<br>
- Underrepresented languages corpus from Wikipedia does not represent the true language distribution [...]"</blockquote>



===="Loanword identification based on web resources: A case study on Wikipedia"====
From the abstract:<ref>{{Cite journal| doi = 10.1016/j.csl.2023.101517| issn = 08852308| volume = 81| pages = 101517| last = Mi| first = Chenggang| title = Loanword identification based on web resources: A case study on wikipedia| journal = Computer Speech & Language| date = June 2023| url = https://linkinghub.elsevier.com/retrieve/pii/S0885230823000360}} {{closed access}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"To alleviate the resource scarcity and improve the robustness in [[loanword]] identification, the current study proposes a novel loanword identification method based on Wikipedia. In this paper, we first present how to obtain loanword candidate datasets and comparable corpora from Wikipedia. On the basis of these corpora, we develop a pseudo-data generation model for loanword identification tasks. And then we put forward a loanword identification model [...]"
</blockquote>
From the introduction:
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"In order to evaluate the performance of our method, we have applied it to different receipt languages (Uyghur, Chinese and English). Experimental results showed that the proposed method achieves the best performance compared with other baseline systems in all domains."
</blockquote>




===References===

Revision as of 20:17, 28 January 2024

Recent research

YOUR ARTICLE'S DESCRIPTIVE TITLE HERE


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


A "lack of bureaucratic openness and rules constraining administrator behavior" enabled nationalist takeover of Croatian Wikipedia

Reviewed by Bri and Tilman Bayer
Presentation at Wikimania 2023 about the findings

A paper titled "Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias"[1] (accepted for publication in the CSCW 2024 proceedings) examines the well-known hijacking of the Croatian Wikipedia by far-right nationalists (from at least 2011 to 2020), and asks why the similarly situated Serbian, Bosnian and Serbo-Croatian Wikipedias managed to escape this fate.

As summarized in a post by the University of Washington's Center for an Informed Public (an interdisciplinary center involving UW's Information School, School of Law, and Department of Human Centered Design & Engineering), on the Croatian Wikipedia:

[A] cabal [of nationalist editors] seized complete control of the governance of the encyclopedia, banned and blocked those who disagreed with them, and operated a network of fake accounts to give the appearance of grassroots support for their policies...
— CIP summary

This has already been documented in detail in a report commissioned by the Wikimedia Foundation (see e.g. prior Signpost coverage: "Croatian Wikipedia: capture and release", Disinformation report, 2021-06-27 and "Wikimedia Foundation builds 'Knowledge Integrity Risk Observatory' to enable communities to monitor at-risk Wikipedias", Recent research, 2022-11-28).

However, the present paper's findings go beyond that, focusing on the capture of governance on Croatian Wikipedia as distinguished from other language-group wikis where it did not happen, particularly the Serbian Wikipedia. The findings point to weak policies and norms that allowed capture to happen, especially the lack of policies around blocking, and to the importance of integrity among the community's bureaucrats, who can grant and remove admin permissions.

The researchers used a grounded theory approach, specifically a "qualitative analysis of interview data with a range of participants in Croatian and Serbian Wikipedia and in the broader Wikipedia community" (15 interviews in total). Based on this,

... we arrived at three propositions that, together, help explain why Croatian Wikipedia succumbed to capture while Serbian Wikipedia did not:

1. Perceived Value as a Target. Is the project worth expending the effort to capture?

2. Bureaucratic Openness. How easy is it for contributors outside the core founding team to ascend to local governance positions?

3. Institutional Formalization. To what degree does the project prefer personalistic, informal forms of organization over formal ones?

We found that both Croatian Wikipedia and Serbian Wikipedia were attractive targets for far-right nationalist capture due to their sizable readership and resonance with a national identity. However, we also found that the two projects diverged early on in their trajectories in terms of how open they remained to new contributors ascending to local governance positions and the degree to which they privileged informal relationships over formal rules and processes as organizing principles of the project. Ultimately, Croatian’s relative lack of bureaucratic openness and rules constraining administrator behavior created a window of opportunity for a motivated contingent of editors to seize control of the governance mechanisms of the project.


— CIP summary

The authors state in the paper that it is the first academic work they know of "that has considered how distributed influence operations target, become deeply engaged with, and are facilitated by institutional and organizational arrangements within peer production communities like Wikipedia".


Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer


"From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?"

From the abstract:[2]

"[...] in most cases estimates of scientific reputation are based on composite or weighted indicators and absolute positions in university rankings. In this study, we adopt a more granular approach to assessment of universities' scientific performance using a multidimensional set of indicators from the Leiden Ranking and testing their individual effects on university [English] Wikipedia page views. We distinguish between international and local attention and find a positive association between research performance and Wikipedia attention which holds for regions and linguistic areas. Additional analysis shows that productivity, scientific impact, and international collaboration have a curvilinear effect on universities' Wikipedia attention. This finding suggests that there may be other factors than scientific reputation driving the general public's interest in universities."


"NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages"

From a Twitter/X thread by one of the authors of this preprint[3]:

"Scraped data such as from Wikipedia is vital for NLP, but how reliable is it in low-resource settings? [...]
We explore 2 methods of building a corpus for 12 underrepresented Indonesian languages: by human translation, and by doing free-form paragraph writing given a theme.
We then compare their quality vs Wikipedia text.
[Compared to] Wikipedia data, both Nusa Translation (NusaT) and Nusa Paragraph (NusaP) are generally more lexically diverse and use fewer loan words. We also realize that apparently some of the Wikipedia pages for low-resource languages are mostly boilerplate. [...]
To conclude:
- We release NusaT and NusaP, high-quality corpus for 12 underrepresented languages

- Underrepresented languages corpus from Wikipedia does not represent the true language distribution [...]"


"Loanword identification based on web resources: A case study on Wikipedia"

From the abstract:[4]

"To alleviate the resource scarcity and improve the robustness in loanword identification, the current study proposes a novel loanword identification method based on Wikipedia. In this paper, we first present how to obtain loanword candidate datasets and comparable corpora from Wikipedia. On the basis of these corpora, we develop a pseudo-data generation model for loanword identification tasks. And then we put forward a loanword identification model [...]"

From the introduction:

"In order to evaluate the performance of our method, we have applied it to different receipt languages (Uyghur, Chinese and English). Experimental results showed that the proposed method achieves the best performance compared with other baseline systems in all domains."


References

  1. ^ Kharazian, Zarine; Starbird, Kate; Hill, Benjamin Mako (2023-11-06), Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Serbo-Croatian Wikipedias, arXiv. Accepted for publication in Proceedings of the ACM on Human-Computer Interaction (CSCW 2024)
  2. ^ Arroyo‐Machado, Wenceslao; Díaz‐Faes, Adrián A.; Herrera‐Viedma, Enrique; Costas, Rodrigo (2023-11-23). "From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?". Journal of the Association for Information Science and Technology. doi:10.1002/asi.24856. ISSN 2330-1635.
  3. ^ Cahyawijaya, Samuel; Lovenia, Holy; Koto, Fajri; Adhista, Dea; Dave, Emmanuel; Oktavianti, Sarah; Akbar, Salsabil Maulana; Lee, Jhonson; Shadieq, Nuur; Cenggoro, Tjeng Wawan; Linuwih, Hanung Wahyuning; Wilie, Bryan; Muridan, Galih Pradipta; Winata, Genta Indra; Moeljadi, David; Aji, Alham Fikri; Purwarianti, Ayu; Fung, Pascale (2023-09-19), NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages, arXiv, doi:10.48550/arXiv.2309.10661 code and dataset
  4. ^ Mi, Chenggang (June 2023). "Loanword identification based on web resources: A case study on wikipedia". Computer Speech & Language. 81: 101517. doi:10.1016/j.csl.2023.101517. ISSN 0885-2308. (closed access)
This page is a draft for the next issue of the Signpost.