
Wikipedia:Wikipedia Signpost/2023-01-01/Recent research: Difference between revisions

m Fixed a reference. Please see Category:CS1 errors: unsupported parameter.
No edit summary
Line 12: Line 12:
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2
|{{{1|YOUR ARTICLE'S DESCRIPTIVE TITLE HERE<!-- REPLACE THIS-->}}}
|{{{1|YOUR ARTICLE'S DESCRIPTIVE TITLE HERE<!-- REPLACE THIS-->}}}
|By [[User:HaeB|Tilman Bayer]]
|By ...
}}
}}


Line 27: Line 27:
:''Reviewed by ...''
:''Reviewed by ...''


==="How to disagree well: Investigating the dispute tactics used on Wikipedia"===
=== ... ===
[[File:Graham's Hierarchy of Disagreement-en.svg|right|thumb|upright=1.8|Graham's hierarchy of disagreement]]
:''Reviewed by ....''

This paper,<ref>{{Cite conference| conference = Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing| pages = 3824–3837| last1 = Kock| first1 = Christine De| last2 = Vlachos| first2 = Andreas| title = How to disagree well: Investigating the dispute tactics used on Wikipedia| date = December 2022| url = https://preview.aclanthology.org/emnlp-22-ingestion/2022.emnlp-main.252/}}
[https://github.com/christinedekock11/wikitactics Data]</ref> presented earlier this month at the [[Empirical Methods in Natural Language Processing]] conference, applies a modified version of [[Paul_Graham_(programmer)#Graham's_hierarchy_of_disagreement|Graham's hierarchy of disagreement]] to classify talk page comments on the English Wikipedia. As explained by the authors:
{{tqb|"[English] Wikipedia recommends the hierarchy of disagreement formulated by Graham (2008) as a guide for constructive dispute resolution [in the [[Wikipedia:Dispute resolution]] policy]. Graham’s hierarchy posits that there are seven levels of disagreement, ranging from namecalling (at the bottom) to refuting the central point. [...] Despite its popularity, this hierarchy has not been verified empirically."}}
The authors call these "rebuttal tactics", and distinguish them from a second category of dispute tactics, "attempts to promote understanding and consensus (referred to as coordination tactics)." Coordination tactics are classified with a separate set of "non-disagreement labels", combined from comment types identified in several previous research publications about Wikipedia talk pages (e.g. a paper by Ferschke et al. that was summarized in our March 2012 issue: "[[m:Research:Newsletter/2012/March#Understanding_collaboration-related_dialog_in_Simple_English_Wikipedia|Understanding collaboration-related dialog in Simple English Wikipedia]]"):
* "Bailing out" ("An indication that an editor is giving up on a conversation and will no longer engage.")
* "Contextualisation" (where "an editor 'sets the stage' by describing which aspect of the article they are challenging. This does not directly disagree with anyone")
* "Asking questions"
* "Providing clarification"
* "Suggesting a compromise"
* "Coordinating edits" to the article page ("This can signal that a compromise has been found.")
* "Conceding / recanting"
* "I don’t know" (i.e. "Admitting that one is uncertain. This signals that an editor is receptive to the idea that there are unknowns which may impact their argument.")
* "Other"

The authors provide a [https://github.com/christinedekock11/wikitactics dataset] "of 213 disputes (comprising 3,865 utterances) on Wikipedia Talk pages, manually annotated with the dispute tactics employed in the process of resolving a disagreement between editors", allowing multiple labels for each comment ("up to three rebuttal strategies and two
resolution strategies per utterance").

These discussions are drawn from the authors' own "WikiDisputes" dataset, "which is annotated according to whether the dispute was resolved without the need for a moderator." This allows the researchers to identify relations between specific dispute tactics and the risk of a conversation escalating. For example, they
{{tqb|find that a lower mean rebuttal level in a disagreement is correlated with less constructive dispute resolutions, providing empirical validation of the ordering proposed by Graham (2008) and recommended by Wikipedia to its editors.}}

They also examine the effect of personal attacks, finding, for example, that conversations can still recover after a personal attack:
{{tqb|"We define recovery in terms of having an utterance labeled as rebuttal level 5 or higher and no further personal attacks. By this definition, half of the disputes were found to recover after a personal attack, indicating
that personal attacks do not necessarily result in conversational failure."}}
Furthermore,
{{tqb|Of the escalated disputes with personal attacks, only 44.3% are found to recover, whereas 59.2% of resolved disputes recover post attack. This indicates that although personal attacks also occur in non-escalated disputes, participants are better adept at moving beyond them.

We further find that immediate retaliation (i.e. a personal attack being followed by another personal attack) occurred in 25.7% of cases. In disputes where at least one personal attack had occurred, the probability that the initial offender will re-offend in the same conversation is 53%, while the probability of another user using a personal attack at some point subsequently is 64%.}}

The study proceeds to use machine learning for automatically classifying talk page comments with these multi-labels. A BERT-based model performed best (according to three different performance metrics), but still struggled with some of the labels:
{{tqb|"The label most frequently correctly predicted is ''coordinating edits'' (111 of 137 cases), which is also the most common label in the training set. The next most correctly predicted label, proportionally, is ''contextualisation'' (75%, or 24 of 32 cases), despite not being a commonly used label. This is likely due to the additional positional
information available to the model, since this label is often applied to the first utterance in a conversation. On the other hand, ''refutation'' and ''refuting the central'' point are never correctly predicted (out of 44 cases), with ''counterargument'' often mistakenly predicted instead."}}

Lastly, they apply this to the separate task of predicting whether a conversation will escalate, already examined in their earlier paper that gave rise to the "WikiDisputes" dataset. Namely, they use "multitask training with escalation as the main task and tactics as the auxiliary task, such that the features that are predictive of dispute tactics are incorporated in the escalation predictions." This improves upon their earlier prediction algorithm, "indicating that knowledge of these dispute tactics is useful for tasks beyond classifying the tactics employed."


===Briefly===
===Briefly===
* See the [[mw:Wikimedia Research/Showcase|page of the monthly '''Wikimedia Research Showcase''']] for videos and slides of past presentations.
* See the [[mw:Wikimedia Research/Showcase|page of the monthly '''Wikimedia Research Showcase''']] for videos and slides of past presentations.
* The Wikimedia Foundation's Research team [https://research.wikimedia.org/report.html published its seventh biannual activity report].
* ...



===Other recent publications===
===Other recent publications===

Revision as of 22:11, 30 December 2022

Recent research

YOUR ARTICLE'S DESCRIPTIVE TITLE HERE


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


...

Reviewed by ...

...

Reviewed by ...

"How to disagree well: Investigating the dispute tactics used on Wikipedia"

Graham's hierarchy of disagreement

This paper,[1] presented earlier this month at the Empirical Methods in Natural Language Processing conference, applies a modified version of Graham's hierarchy of disagreement to classify talk page comments on the English Wikipedia. As explained by the authors:

"[English] Wikipedia recommends the hierarchy of disagreement formulated by Graham (2008) as a guide for constructive dispute resolution [in the Wikipedia:Dispute resolution policy]. Graham’s hierarchy posits that there are seven levels of disagreement, ranging from namecalling (at the bottom) to refuting the central point. [...] Despite its popularity, this hierarchy has not been verified empirically."

The authors call these "rebuttal tactics", and distinguish them from a second category of dispute tactics, "attempts to promote understanding and consensus (referred to as coordination tactics)." Coordination tactics are classified with a separate set of "non-disagreement labels", combined from comment types identified in several previous research publications about Wikipedia talk pages (e.g. a paper by Ferschke et al. that was summarized in our March 2012 issue: "Understanding collaboration-related dialog in Simple English Wikipedia"):

  • "Bailing out" ("An indication that an editor is giving up on a conversation and will no longer engage.")
  • "Contextualisation" (where "an editor 'sets the stage' by describing which aspect of the article they are challenging. This does not directly disagree with anyone")
  • "Asking questions"
  • "Providing clarification"
  • "Suggesting a compromise"
  • "Coordinating edits" to the article page ("This can signal that a compromise has been found.")
  • "Conceding / recanting"
  • "I don’t know" (i.e. "Admitting that one is uncertain. This signals that an editor is receptive to the idea that there are unknowns which may impact their argument.")
  • "Other"

The authors provide a dataset "of 213 disputes (comprising 3,865 utterances) on Wikipedia Talk pages, manually annotated with the dispute tactics employed in the process of resolving a disagreement between editors", allowing multiple labels for each comment ("up to three rebuttal strategies and two resolution strategies per utterance").
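The labelling constraint quoted above (at most three rebuttal labels and two coordination labels per utterance) can be made concrete with a small sketch. The class name, field names and plain-string labels are hypothetical and are not taken from the released dataset's actual schema.

```python
from dataclasses import dataclass, field

MAX_REBUTTAL_LABELS = 3      # "up to three rebuttal strategies" per utterance
MAX_COORDINATION_LABELS = 2  # "two resolution strategies" per utterance


@dataclass
class Utterance:
    """One talk page comment with its tactic labels (hypothetical layout)."""
    text: str
    rebuttal_labels: list = field(default_factory=list)
    coordination_labels: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the per-utterance limits described in the quoted passage.
        if len(self.rebuttal_labels) > MAX_REBUTTAL_LABELS:
            raise ValueError("too many rebuttal labels")
        if len(self.coordination_labels) > MAX_COORDINATION_LABELS:
            raise ValueError("too many coordination labels")
```

An utterance with a fourth rebuttal label would be rejected with a `ValueError` under this sketch.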

These discussions are drawn from the authors' own "WikiDisputes" dataset, "which is annotated according to whether the dispute was resolved without the need for a moderator." This allows the researchers to identify relations between specific dispute tactics and the risk of a conversation escalating. For example, they

find that a lower mean rebuttal level in a disagreement is correlated with less constructive dispute resolutions, providing empirical validation of the ordering proposed by Graham (2008) and recommended by Wikipedia to its editors.
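As a rough illustration of the "mean rebuttal level" statistic mentioned in this finding, the following sketch averages integer rebuttal levels over a dispute. How the authors aggregate several labels on one utterance is not stated here, so pooling all labels into one average is an assumption, as is the input layout.

```python
from statistics import mean


def mean_rebuttal_level(dispute):
    """Average rebuttal level over one dispute.

    `dispute` is a list of utterances, each given as a list of integer
    rebuttal levels (possibly empty). Returns None if no level was assigned.
    """
    levels = [level for utterance in dispute for level in utterance]
    return mean(levels) if levels else None
```

Under the reported correlation, a dispute with a lower value from such a function would be expected to resolve less constructively, on average.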

They also examine the effect of personal attacks, finding, for example, that conversations can still recover after a personal attack:

"We define recovery in terms of having an utterance labeled as rebuttal level 5 or higher and no further personal attacks. By this definition, half of the disputes were found to recover after a personal attack, indicating that personal attacks do not necessarily result in conversational failure."

Furthermore,

Of the escalated disputes with personal attacks, only 44.3% are found to recover, whereas 59.2% of resolved disputes recover post attack. This indicates that although personal attacks also occur in non-escalated disputes, participants are better adept at moving beyond them. We further find that immediate retaliation (i.e. a personal attack being followed by another personal attack) occurred in 25.7% of cases. In disputes where at least one personal attack had occurred, the probability that the initial offender will re-offend in the same conversation is 53%, while the probability of another user using a personal attack at some point subsequently is 64%.
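The recovery definition quoted above can be sketched as a small check over one dispute. This follows one possible reading: recovery is measured from the first personal attack, requires that no later utterance is a personal attack, and requires at least one later utterance with a rebuttal level of 5 or higher. The dictionary keys are hypothetical.

```python
RECOVERY_LEVEL = 5  # "rebuttal level 5 or higher", per the quoted definition


def recovers_after_attack(utterances):
    """Return True or False for a dispute, or None if it has no attack.

    Each utterance is a dict with two hypothetical keys:
      "attack": bool, whether it was labelled a personal attack
      "levels": list of int rebuttal levels assigned to it
    """
    attack_positions = [i for i, u in enumerate(utterances) if u["attack"]]
    if not attack_positions:
        return None  # the definition only applies after an attack
    if len(attack_positions) > 1:
        return False  # a further personal attack followed the first one
    later = utterances[attack_positions[0] + 1:]
    return any(level >= RECOVERY_LEVEL for u in later for level in u["levels"])
```

Applying such a function separately to escalated and resolved disputes would give recovery rates comparable in kind to the 44.3% and 59.2% figures, although the authors' exact procedure may differ.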

The study proceeds to use machine learning for automatically classifying talk page comments with these multi-labels. A BERT-based model performed best (according to three different performance metrics), but still struggled with some of the labels:

"The label most frequently correctly predicted is coordinating edits (111 of 137 cases), which is also the most common label in the training set. The next most correctly predicted label, proportionally, is contextualisation (75%, or 24 of 32 cases), despite not being a commonly used label. This is likely due to the additional positional information available to the model, since this label is often applied to the first utterance in a conversation. On the other hand, refutation and refuting the central point are never correctly predicted (out of 44 cases), with counterargument often mistakenly predicted instead."
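The counts in this quote read like per-label recall (correct predictions out of all utterances carrying that label), though the quote does not name the metric. Under that assumption, 111 of 137 is about 81% and 24 of 32 is 75%. A minimal sketch of the computation for multi-label data, with a hypothetical input format:

```python
from collections import Counter


def per_label_recall(gold, predicted):
    """Per-label recall for multi-label predictions.

    `gold` and `predicted` are lists of label sets, one set per utterance.
    Returns a dict mapping each gold label to hits / occurrences.
    """
    total, hits = Counter(), Counter()
    for gold_labels, predicted_labels in zip(gold, predicted):
        for label in gold_labels:
            total[label] += 1
            if label in predicted_labels:
                hits[label] += 1
    return {label: hits[label] / total[label] for label in total}
```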

Lastly, they apply this to the separate task of predicting whether a conversation will escalate, already examined in their earlier paper that gave rise to the "WikiDisputes" dataset. Namely, they use "multitask training with escalation as the main task and tactics as the auxiliary task, such that the features that are predictive of dispute tactics are incorporated in the escalation predictions." This improves upon their earlier prediction algorithm, "indicating that knowledge of these dispute tactics is useful for tasks beyond classifying the tactics employed."
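The general idea of multitask training with a main and an auxiliary task can be illustrated by a combined loss: the escalation loss plus a weighted tactic-classification loss. This is a generic sketch, not the authors' implementation; the binary cross-entropy formulation, the averaging over tactic labels and the `aux_weight` parameter are all assumptions.

```python
import math


def binary_cross_entropy(prob, target):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def multitask_loss(escalation_prob, escalation_target,
                   tactic_probs, tactic_targets, aux_weight=0.5):
    """Main-task loss plus a weighted auxiliary loss (hypothetical weighting)."""
    main = binary_cross_entropy(escalation_prob, escalation_target)
    aux = sum(
        binary_cross_entropy(p, t) for p, t in zip(tactic_probs, tactic_targets)
    ) / len(tactic_probs)
    return main + aux_weight * aux
```

Because both terms are minimised together, a shared model is pushed toward features that help predict tactics as well as escalation, which is the effect the quoted passage describes.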

Briefly


Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by ...

"Analyzing Digital Discourses: Between Convergence and Controversy"

From the abstract:[2]

"This study analyses Wikipedia’s sites for negotiating convergence, conflict and identity, concentrating on two aspects. First, convergence and conflict at the macro-level of intercultural comparison are investigated using the example of the construction of concepts of nationalism, citizenship, identity and tribe in their English and German language versions. Second, the English articles serve as a basis to examine the types of convergence and conflict tendencies at the micro-level of the Talk-section."

From the paper's section on talk pages:

"[...] in our data, criticism of content (81 instances/31% of all 259 conflictual codings) is the most frequent conflictual category [...], followed by general metapragmatic criticism concerning clarity and more general stylistic features [...], metapragmatic criticism related to Wikipedia's principles (each comprising about half of the total of 81 metapragmatic tokens), or a mixture of both [...].

Giving reasons for disagreeing is the mitigating strategy used most frequently in all four Talk1-sections, followed by suggesting, inviting and hedged imperatives to induce further improvement of an article, agreement and additional explanation to clarify an issue [...]."

"..."

From the abstract:

...

"Building a Public Domain Voice Database for Odia"

From the abstract and paper:[3]

"The pilot detailed in this paper is about creating a large freely-licensed public repository of transcribed speech in the Odia language as such a repository was not known to be available. The strategy and methodology behind this process are based on the OpenSpeaks project [which is hosted on the English Wikiversity at https://en.wikiversity.org/wiki/OpenSpeaks ].

"The 'Methodology' section details the process of collecting words [from a dump of Odia Wikipedia], compiling a wordlist [making use of Wikidata lexeme forms to generate additional forms], recording the pronunciation of those words, and uploading the speech data to Wikimedia Commons using Lingua Libre."


References

  1. ^ Kock, Christine De; Vlachos, Andreas (December 2022). How to disagree well: Investigating the dispute tactics used on Wikipedia. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 3824–3837. Data
  2. ^ Kleinke, Sonja; Landmann, Julia (2021). "Cross-Cultural Observations on English and German Wikipedia Entries at the Interface of Convergence and Controversy". In Johansson, Marjut; Tanskanen, Sanna-Kaisa; Chovanec, Jan (eds.). Analyzing Digital Discourses: Between Convergence and Controversy. Cham: Springer International Publishing. pp. 135–162. ISBN 9783030846022. Google Books
  3. ^ Panigrahi, Subhashish (2022-04-25). "Building a Public Domain Voice Database for Odia" (PDF). Companion Proceedings of the Web Conference 2022. WWW '22. New York, NY, USA: Association for Computing Machinery. pp. 1331–1338. doi:10.1145/3487553.3524931. ISBN 9781450391306.
Supplementary references and notes:

This page is a draft for the next issue of the Signpost.