Wikipedia:Wikipedia Signpost/2023-06-19/Recent research

Revision as of 21:08, 18 June 2023

Recent research

Hoaxers prefer currently popular topics


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


New articles about currently popular topics are more likely to be hoaxes

A study titled "The role of online attention in the supply of disinformation in Wikipedia"[1] finds that

"[...] compared to legitimate articles created on the same day, hoaxes [on English Wikipedia] tend to be more associated with traffic spikes preceding their creation. This is consistent with the idea that the supply of false or misleading information on a topic is driven by the attention it receives. [... a finding that the authors hope] could help promote the integrity of knowledge on Wikipedia."

The authors remark that "little is known about Wikipedia hoaxes" in the research literature, with only one previous paper focusing on this topic (Kumar et al., who among other findings had reported in 2016 that "while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive long and are well cited across the Web"). In contrast to that earlier study (which had access to the presumably more extensive new pages patrol data about deleted articles), the authors base their analysis on the list of hoaxes curated by the community. Graphing these confirmed hoaxes by year of creation, they note in passing that "the majority of hoaxes [are] appearing in the period 2005–2007, and a decline starting in 2008. This observed behavior can be in part explained by the fact that the Wikipedia community started patrolling new pages in November of 2007 [...] and is also consistent with the well-known peak of activity of the English Wikipedia community [...]."


Before getting to their main research question about the relation between online attention to a topic (operationalized using Wikipedia pageviews) and disinformation, the authors compare how these confirmed hoax articles (in their initial revision) differ in various content features from a cohort of non-hoax articles started on the same day, concluding that

"[...] hoaxes tend to have more plain text than legitimate articles and fewer links to external web pages outside of Wikipedia. This means that non-hoax articles, in general, contain more references to links residing outside Wikipedia. Such behavior is expected as a hoax’s author would need to put a significant effort into crafting external resources at which the hoax can point."

In other words, successful hoaxers may display an anti-FUTON bias.

To quantify the attention paid to the topic area that an article pertains to (even before the article is created), the authors use its "wiki-link neighbors":

"The presence of a link between two Wikipedia entries is an indication that they are semantically related. Therefore, traffic to these neighbors gives us a rough measure of the level of online attention to a topic before a new [Wikipedia article] is created."

"To understand the nature of the relationship between the creation of hoaxes and the attention their respective topics receive, we first seek to quantify the relative volume change [in attention] before and after this creation day. Here, a topic is defined as all of the (non-hoax) neighbors linked within the contents of an article i.e., its (non-hoax) out-links."

This "volume change" is calculated as the change in median pageviews among the neighboring articles, comparing the 7 days before with the 7 days after the creation of an article, to account for "short spikes in attention and weekly changes in traffic" (a somewhat simplistic way of handling this kind of time-series analysis, compared to some earlier research on Wikipedia traffic). The authors limited themselves to an older pageviews dataset that covers only the years 2007 to 2016, reducing their sample for this part of the analysis to 83 hoaxes. Of those, 75 exhibited a greater attention change than their cohort (of non-hoax articles started on the same day). Despite the relatively small sample size, this finding was judged to be statistically significant (more precisely, the authors find "a bootstrapped 95% confidence interval of (0.1227, 0.1234)" for the difference between hoax and non-hoax articles, far away from zero). In conclusion, this "indicates that the generation of hoaxes in Wikipedia is associated with prior consumption of information, in the form of online attention."
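
For readers who want to experiment with this kind of measure, the calculation described above might be sketched roughly as follows. This is not the authors' code (which is linked from the first reference), only one possible reading of the description. The function names, the sign convention (positive when attention before creation exceeds attention after it), the normalization by the "before" level, and the use of a simple percentile bootstrap are all assumptions made for illustration.

```python
import random
from statistics import mean, median


def relative_volume_change(neighbor_views, creation_idx, window=7):
    """Relative change in attention around an article's creation day.

    neighbor_views: one list of daily pageviews per wiki-link neighbor,
                    all aligned to the same calendar days (hypothetical layout).
    creation_idx:   index of the creation day within those lists.
    Assumption: positive values mean more attention before than after.
    """
    def window_level(start, stop):
        # Median across neighbors for each day, then averaged over the window.
        daily = [median(series[day] for series in neighbor_views)
                 for day in range(start, stop)]
        return mean(daily)

    before = window_level(creation_idx - window, creation_idx)
    after = window_level(creation_idx + 1, creation_idx + 1 + window)
    if before == 0:
        raise ValueError("no traffic before creation; change is undefined")
    return (before - after) / before


def bootstrap_ci(differences, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `differences`
    (e.g. hoax volume change minus the mean change of its same-day cohort)."""
    rng = random.Random(seed)
    n = len(differences)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(differences, k=n))
                   for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

An interval that excludes zero, as reported in the paper, would then suggest that hoaxes and their same-day cohorts differ systematically in how attention to their topics changes around the creation day.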

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer and Andreas Kolbe

"Information literacy in South Korea: similarities and differences between Korean and international students’ research trajectories"

From the abstract:[2]

Work on students’ information literacy and research trajectories is usually based on studies of Western, English-speaking students. South Korea presents an opportunity to investigate an environment where Internet penetration is very high, but local Internet users operate in a different digital ecosystem than in the West, with services such as Google and Wikipedia being less popular. [...] We find that Korean students use Wikipedia but less so than their peers from other countries, despite their recognition that Wikipedia is more reliable and comprehensive than the alternatives. Their preferences are instead affected by their perception of Wikipedia as providing an inferior user experience and less local content than competing, commercial services, which also benefit from better search engine result placement in Naver, the search engine dominating the Korean market.

"A Wiki-Based Dataset of Military Operations with Novel Strategic Technologies (MONSTr)"

From the abstract:[3]

This paper introduces a comprehensive dataset on the universe of United States military operations from 1989 to 2021 from a single source: Wikipedia. Using automated extraction techniques on its two structured knowledge databases–Wikidata and DBpedia–we uncover information about individual operations within nearly every post-1989 military intervention described in existing academic datasets. The data we introduce offers unprecedented coverage and granularity that enables analysis of myriad factors associated with when, where, and how the United States employs military force. We describe the data collection process, demonstrate its contents and validity, and discuss its potential applications to existing theories about force employment and strategy in war.

"European Wikipedia platforms, sharing economy and national differences in participation: a case study"

From the abstract:[4]

The following exploratory study considers which macro-level factors can lead to the sharing economy being more popular in certain countries and less so in others. An example of commons-based peer production in the form of the level of contributing to Wikipedia in 24 European countries is used as a proxy for participation in the sharing economy. Demographic variables (number of native speakers, a proxy for population) and development ones (Human Development Index-level and internet penetration) are found to be less significant than cultural values (particularly self-expression and secular-rational axes of the Inglehart-Welzel model). Three clusters of countries are identified, with the Scandinavian/Baltic/Protestant countries being roughly five times as productive as the Eastern Europe/Balkans/Orthodox ones.

"Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia"

From the abstract:[5]

"[...] how is information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this [question] we develop a temporal community detection approach towards topic detection that takes into account both short term dynamics of attention as well as long term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. We are able to resolve the topics that more strongly reflect unfolding current events vs more established knowledge by the relative importance of collective attention dynamics vs link structures. We also offer important developments by identifying and describing the emergent topics on Wikipedia."

References

  1. ^ Anis Elebiary and Giovanni Luca Ciampaglia: "The role of online attention in the supply of disinformation in Wikipedia". Proc. of Truth and Trust Online 2022. Also as Elebiary, Anis; Ciampaglia, Giovanni Luca (2023-02-16), The role of online attention in the supply of disinformation in Wikipedia, arXiv. Code and figures
  2. ^ Konieczny, Piotr; Danabayev, Kakim; Kennedy, Kara; Varpahovskis, Eriks (2023-06-07). "Information literacy in South Korea: similarities and differences between Korean and international students' research trajectories". Asia Pacific Journal of Education: 1–16. doi:10.1080/02188791.2023.2220936. ISSN 0218-8791.
  3. ^ Gannon, J. Andrés; Chávez, Kerry (2023-05-30). "A Wiki-Based Dataset of Military Operations with Novel Strategic Technologies (MONSTr)". International Interactions: 1–30. doi:10.1080/03050629.2023.2214845. ISSN 0305-0629.
  4. ^ Konieczny, Piotr (2023-04-08). "European Wikipedia platforms, sharing economy and national differences in participation: a case study". Innovation: The European Journal of Social Science Research: 1–30. doi:10.1080/13511610.2023.2195584. ISSN 1351-1610.
  5. ^ Gildersleve, Patrick; Lambiotte, Renaud; Yasseri, Taha (2022-11-14), Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia, arXiv, doi:10.48550/arXiv.2211.07616 ("accepted for publication in Journal of Computational Social Science")