Wikipedia:Wikipedia Signpost/2023-06-19/Recent research: Difference between revisions
hoaxes, list item |
wasn't clear to me from the paper, but looking at the code they did indeed use the initial revision (as one should) |
Revision as of 21:08, 18 June 2023
This is a draft of a potential Signpost article, and should not be interpreted as a finished piece. Its content is subject to review by the editorial team and ultimately by JPxG, the editor in chief. Please do not link to this draft as it is unfinished and the URL will change upon publication. If you would like to contribute and are familiar with the requirements of a Signpost article, feel free to be bold in making improvements!
Hoaxers prefer currently popular topics
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
New articles about currently popular topics are more likely to be hoaxes
A study titled "The role of online attention in the supply of disinformation in Wikipedia"[1] finds that
"[...] compared to legitimate articles created on the same day, hoaxes [on English Wikipedia] tend to be more associated with traffic spikes preceding their creation. This is consistent with the idea that the supply of false or misleading information on a topic is driven by the attention it receives. [... a finding that the authors hope] could help promote the integrity of knowledge on Wikipedia."
The authors remark that "little is known about Wikipedia hoaxes" in the research literature, with only one previous paper focusing on this topic (Kumar et al., who among other findings had reported in 2016 that "while most hoaxes are detected quickly and have little impact on Wikipedia, a small number of hoaxes survive long and are well cited across the Web"). In contrast to that earlier study (which had access to the presumably more extensive new pages patrol data about deleted articles), the authors base their analysis on the list of hoaxes curated by the community. Graphing these confirmed hoax articles by year of creation, they note in passing that "the majority of hoaxes [are] appearing in the period 2005–2007, and a decline starting in 2008. This observed behavior can be in part explained by the fact that the Wikipedia community started patrolling new pages in November of 2007 [...] and is also consistent with the well-known peak of activity of the English Wikipedia community [...]."
Before getting to their main research question about the relation between online attention to a topic (operationalized using Wikipedia pageviews) and disinformation, the authors compare how these confirmed hoax articles (in their initial revision) differ in various content features from a cohort of non-hoax articles started on the same day, concluding that
"[...] hoaxes tend to have more plain text than legitimate articles and fewer links to external web pages outside of Wikipedia. This means that non-hoax articles, in general, contain more references to links residing outside Wikipedia. Such behavior is expected as a hoax’s author would need to put a significant effort into crafting external resources at which the hoax can point."
In other words, successful hoaxers may display an anti-FUTON bias.
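As a rough illustration of the kind of content features being compared, the following Python sketch counts wiki-links and bracketed external links in a revision's wikitext and measures the remaining plain text. It is not the authors' code: the function name `content_features`, the regular expressions, and the choice of features are assumptions made for this example, and the parsing is deliberately crude (bare URLs, templates and references are not handled).

```python
import re

# Assumed, simplified patterns for wikitext links (illustration only).
# [[Target]] or [[Target|label]]
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")
# [https://example.org optional label]
EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")


def content_features(wikitext):
    """Return crude content features for one revision's wikitext."""
    wiki_links = len(WIKI_LINK.findall(wikitext))
    external_links = len(EXTERNAL_LINK.findall(wikitext))

    # Replace links by their visible label to approximate the plain text.
    plain = EXTERNAL_LINK.sub(lambda m: m.group(2) or "", wikitext)
    plain = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), plain)

    return {
        "wiki_links": wiki_links,
        "external_links": external_links,
        "plain_text_chars": len(plain.strip()),
    }


# Toy input, not taken from any real article.
sample = "A [[hoax]] about [[Mars|the planet]]. See [https://example.org source]."
features = content_features(sample)
```

Under the finding quoted above, one would expect initial revisions of hoaxes to show a higher `plain_text_chars` and a lower `external_links` count than their same-day cohort, on average.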
To quantify the attention of a topic area that an article pertains to (even before it is created), the authors use its "wiki-link neighbors":
"The presence of a link between two Wikipedia entries is an indication that they are semantically related. Therefore, traffic to these neighbors gives us a rough measure of the level of online attention to a topic before a new [Wikipedia article] is created."
"To understand the nature of the relationship between the creation of hoaxes and the attention their respective topics receive, we first seek to quantify the relative volume change [in attention] before and after this creation day. Here, a topic is defined as all of the (non-hoax) neighbors linked within the contents of an article i.e., its (non-hoax) out-links."
This "volume change" is calculated as the change in median pageviews among the neighboring articles, using the timespans from 7 days before and 7 days after the creation of an article, to account for "short spikes in attention and weekly changes in traffic" (a somewhat simplistic way of handling this kind of time-series analysis, compared to some earlier research on Wikipedia traffic). The authors limited themselves to an older pageviews dataset that is only available for the time from 2007 to 2016, reducing their sample for this part of the analysis to 83 hoaxes. 75 of those exhibited a greater attention change than their cohort (of non-hoax articles started on the same day). Despite the relatively small sample size, this finding was judged to be statistically significant (more precisely, the authors find "a bootstrapped 95% confidence interval of (0.1227, 0.1234)" for the difference between hoax and non-hoax articles, far away from zero). In conclusion, this "indicates that the generation of hoaxes in Wikipedia is associated with prior consumption of information, in the form of online attention."
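The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`volume_change`, `bootstrap_ci`), the exact normalization (difference of medians divided by the pre-creation median, positive when attention was higher before creation), the percentile bootstrap, and the toy numbers are all hypothetical choices made for the example.

```python
import random
from statistics import mean, median


def volume_change(neighbor_views, window=7):
    """Relative change in median neighbor pageviews around the creation day.

    neighbor_views: one list of daily pageviews per wiki-link neighbor, each
    holding `window` days before creation followed by `window` days after.
    The sign convention and normalization are assumptions for this sketch.
    """
    before = median(median(v[:window]) for v in neighbor_views)
    after = median(median(v[window:]) for v in neighbor_views)
    if before == 0:
        return 0.0  # avoid division by zero for neighbors with no traffic
    return (before - after) / before


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: three neighbors, 7 days before and 7 days after creation.
neighbors = [
    [10] * 7 + [5] * 7,
    [20] * 7 + [10] * 7,
    [30] * 7 + [30] * 7,
]
change = volume_change(neighbors)  # 0.5 under the assumed definition

# Hypothetical per-article differences (hoax minus cohort average).
differences = [0.10, 0.15, 0.12, 0.09, 0.14, 0.11]
ci = bootstrap_ci(differences)
```

If the resulting interval for the hoax-minus-cohort difference excludes zero, as the interval reported in the paper does, the difference is treated as statistically significant at the chosen level.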
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
- Compiled by Tilman Bayer and Andreas Kolbe
"Information literacy in South Korea: similarities and differences between Korean and international students’ research trajectories"
From the abstract:[2]
Work on students’ information literacy and research trajectories is usually based on studies of Western, English-speaking students. South Korea presents an opportunity to investigate an environment where Internet penetration is very high, but local Internet users operate in a different digital ecosystem than in the West, with services such as Google and Wikipedia being less popular. [...] We find that Korean students use Wikipedia but less so than their peers from other countries, despite their recognition that Wikipedia is more reliable and comprehensive than the alternatives. Their preferences are instead affected by their perception of Wikipedia as providing an inferior user experience and less local content than competing, commercial services, which also benefit from better search engine result placement in Naver, the search engine dominating the Korean market.
"A Wiki-Based Dataset of Military Operations with Novel Strategic Technologies (MONSTr)"
From the abstract:[3]
This paper introduces a comprehensive dataset on the universe of United States military operations from 1989 to 2021 from a single source: Wikipedia. Using automated extraction techniques on its two structured knowledge databases–Wikidata and DBpedia–we uncover information about individual operations within nearly every post-1989 military intervention described in existing academic datasets. The data we introduce offers unprecedented coverage and granularity that enables analysis of myriad factors associated with when, where, and how the United States employs military force. We describe the data collection process, demonstrate its contents and validity, and discuss its potential applications to existing theories about force employment and strategy in war.
"European Wikipedia platforms, sharing economy and national differences in participation: a case study"
From the abstract:[4]
The following exploratory study considers which macro-level factors can lead to the sharing economy being more popular in certain countries and less so in others. An example of commons-based peer production in the form of the level of contributing to Wikipedia in 24 European countries is used as a proxy for participation in the sharing economy. Demographic variables (number of native speakers, a proxy for population) and development ones (Human Development Index-level and internet penetration) are found to be less significant than cultural values (particularly self-expression and secular-rational axes of the Inglehart-Welzel model). Three clusters of countries are identified, with the Scandinavian/Baltic/Protestant countries being roughly five times as productive as the Eastern Europe/Balkans/Orthodox ones.
"Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia"
From the abstract:[5]
"[...] how is information on and attention towards current events integrated into the existing topical structures of Wikipedia? To address this [question] we develop a temporal community detection approach towards topic detection that takes into account both short term dynamics of attention as well as long term article network structures. We apply this method to a dataset of one year of current events on Wikipedia to identify clusters distinct from those that would be found solely from page view time series correlations or static network structure. We are able to resolve the topics that more strongly reflect unfolding current events vs more established knowledge by the relative importance of collective attention dynamics vs link structures. We also offer important developments by identifying and describing the emergent topics on Wikipedia."
References
- ^ Anis Elebiary and Giovanni Luca Ciampaglia: "The role of online attention in the supply of disinformation in Wikipedia". Proc. of Truth and Trust Online 2022. Also as Elebiary, Anis; Ciampaglia, Giovanni Luca (2023-02-16), The role of online attention in the supply of disinformation in Wikipedia, arXiv. Code and figures
- ^ Konieczny, Piotr; Danabayev, Kakim; Kennedy, Kara; Varpahovskis, Eriks (2023-06-07). "Information literacy in South Korea: similarities and differences between Korean and international students' research trajectories". Asia Pacific Journal of Education: 1–16. doi:10.1080/02188791.2023.2220936. ISSN 0218-8791.
- ^ Gannon, J. Andrés; Chávez, Kerry (2023-05-30). "A Wiki-Based Dataset of Military Operations with Novel Strategic Technologies (MONSTr)". International Interactions: 1–30. doi:10.1080/03050629.2023.2214845. ISSN 0305-0629.
- ^ Konieczny, Piotr (2023-04-08). "European Wikipedia platforms, sharing economy and national differences in participation: a case study". Innovation: The European Journal of Social Science Research: 1–30. doi:10.1080/13511610.2023.2195584. ISSN 1351-1610.
- ^ Gildersleve, Patrick; Lambiotte, Renaud; Yasseri, Taha (2022-11-14), Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia, arXiv, doi:10.48550/arXiv.2211.07616 ("accepted for publication in Journal of Computational Social Science")
Discuss this story
For clarity's sake, the abstract for Sihame Assbague’s paper (“Wikipedia, or the discreet neutralisation of antiracism”, doi:10.3917/crieu.021.0140) does not appear at the paywalled link on Cairn.info, but I was sent a copy of the paper, from which my translation of the abstract appears above. The original French reads:
— OwenBlacker (he/him; Talk) 13:00, 19 June 2023 (UTC)