Wikipedia:Wikipedia Signpost/2023-07-17/Recent research: Difference between revisions

From Wikipedia, the free encyclopedia
title, blurb, byline
No edit summary

Revision as of 20:47, 16 July 2023

Recent research

Wikipedia and open access; Wikipedia-grounded chatbot "outperforms all baselines" on factual accuracy


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


Wikipedia and open access

Reviewed by Nicolas Jullien

From the abstract:[1]

"we analyze a large dataset of citations from Wikipedia and model the role of open access in Wikipedia's citation patterns. We find that open-access articles are extensively and increasingly more cited in Wikipedia. What is more, they show a 15% higher likelihood of being cited in Wikipedia when compared to closed-access articles, after controlling for confounding factors. This open-access citation effect is particularly strong for articles with low citation counts, including recently published ones. Our results show that open access plays a key role in the dissemination of scientific knowledge, including by providing Wikipedia editors timely access to novel results."

Why does it matter for the Wikipedia community?

This article is a first draft of an analysis of the relationship between the availability of a scientific journal as open access and the fact that it is cited in the English Wikipedia (note: although it speaks of "Wikipedia", the article looks only at the English pages). It is a preprint and has not been peer-reviewed, so its results should be read with caution, especially since I am not sure about the robustness of the model and the results derived from it (see below). It is of course a very important issue, as access to scientific sources is key to the diffusion of scientific knowledge, but also, as the authors mention, because Wikipedia is seen as central to the diffusion of scientific facts (and is sometimes used by scientists to push their ideas).

Review

The results presented in the article (and its abstract) highlight two important issues for Wikipedia that will likely be addressed in a more complete version of the paper:

  • The question of the reliability of the sources used by Wikipedians
=> the regressions seem to indicate that the reputation of the journal does not affect the likelihood of being cited in Wikipedia.
=> Predatory journals are known to be open access more often than classical journals, so this result may indicate that open access lowers the reliability of Wikipedia's sources.

The authors say on p. 4 that they provided "each journal with an SJR score, H-index, and other relevant information." Why did they not use this as a control variable? (this echoes a debate on the role of Wikipedia: is it to disseminate verified knowledge, or to serve as a platform for the dissemination of new theories? The authors seem to lean towards the second view: p. 2: "With the rapid development of the Internet, traditional peer review and journal publication can no longer meet the need for the development of new ideas".)

  • The solidity of the results
The authors said: "STEM fields, especially biology and medicine, comprise the most prominent scientific topics in Wikipedia [17]." "General science, technology, and biomedical research have relatively higher OA rates."
=> So it is unsurprising that, on average, open access articles are cited more in Wikipedia than their share of the entire available research corpus would suggest; this field composition alone could explain why open access articles are cited more.
=> Why not control for discipline in the models?
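The point about discipline can be illustrated with a minimal sketch. The counts below are hypothetical toy numbers, not data from the paper; they are chosen so that open access and closed access articles are cited at the same rate within each discipline, while the pooled comparison still shows a large gap because open access is more common in the heavily cited field.

```python
from collections import defaultdict

# Hypothetical toy counts, NOT taken from the paper:
# (discipline, access type) -> (articles cited in Wikipedia, total articles)
COUNTS = {
    ("biomedicine", "open"): (300, 3000),
    ("biomedicine", "closed"): (100, 1000),
    ("humanities", "open"): (10, 500),
    ("humanities", "closed"): (70, 3500),
}


def citation_rate(cited: int, total: int) -> float:
    """Share of articles that are cited in Wikipedia."""
    return cited / total


def pooled_rates(counts):
    """Citation rate per access type, ignoring discipline."""
    sums = defaultdict(lambda: [0, 0])
    for (_, access), (cited, total) in counts.items():
        sums[access][0] += cited
        sums[access][1] += total
    return {access: citation_rate(c, t) for access, (c, t) in sums.items()}


def stratified_rates(counts):
    """Citation rate per (discipline, access type) cell."""
    return {key: citation_rate(c, t) for key, (c, t) in counts.items()}


pooled = pooled_rates(COUNTS)
by_field = stratified_rates(COUNTS)
print("pooled:", pooled)
print("by discipline:", by_field)
```

In this toy setting the pooled open access rate is more than twice the closed access rate, although the within-discipline rates are identical. A regression that includes discipline as a control would separate the two effects.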

More problematic (and acknowledged by the authors, so probably in the process of being addressed), the authors said, on p.7, that they built their model with the assumption that the age of a research article and the number of citations it has both influence the probability of an article being cited in Wikipedia.

Of course, for this causal effect to hold, the age and the number of citations must be taken into account at the moment the article is cited in Wikipedia (if some of the citations are made after the citation in Wikipedia, one could argue that the causal effect could be in the other direction). For example, many articles are open access after an embargo period, and are therefore considered open access in the analysis, whereas they may have been cited in Wikipedia when they were under embargo. The authors did not check for this, as acknowledged in the last sentence of the article => would the result be as robust if they re-estimated their model using, for example, the first citation in the English Wikipedia, together with the age of the article, its open access status, etc. at that moment?
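As a rough sketch of the check suggested here, the following determines whether an article was already open access when it was first cited in Wikipedia. The function names, the month-level granularity, and the example dates are all assumptions made for illustration; the paper does not describe such a procedure.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """First day of the month that lies `months` after d's month.

    Deliberately coarse: embargo lengths are usually stated in whole months.
    """
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def open_at_first_citation(
    published: date, embargo_months: int, first_wiki_citation: date
) -> bool:
    """True if the embargo had already ended when Wikipedia first cited the article."""
    open_from = add_months(published, embargo_months)
    return first_wiki_citation >= open_from


# Hypothetical article: published January 2020 with a 12-month embargo.
published = date(2020, 1, 15)
print(open_at_first_citation(published, 12, date(2020, 6, 1)))  # cited under embargo
print(open_at_first_citation(published, 12, date(2021, 3, 1)))  # cited after embargo
```

Articles for which this returns False would count as closed access at citation time, even though a later snapshot labels them open access.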

In short

Although this first draft is probably not solid enough to be cited in Wikipedia, it signals important research in progress, and I am sure that the richness of the data and the quality of the team will quickly lead to very interesting insights for the Wikipedia community.

Related earlier coverage

"Controversies over Historical Revisionism in Wikipedia"

Reviewed by Andreas Kolbe

From the abstract:[2]

This study investigates the development of historical revisionism on Wikipedia. The edit history of Wikipedia pages allows us to trace the dynamics of individuals and coordinated groups surrounding controversial topics. This project focuses on Japan, where there has been a recent increase in right-wing discourse and dissemination of different interpretations of historical events.

This brief study, one of the extended abstracts accepted at the Wiki Workshop (10th edition), follows up on reports that some historical pages on the Japanese Wikipedia, particularly those related to World War II and war crimes, have been edited in ways that reflect radical right-wing ideas (see previous Signpost coverage). It sets out to answer three questions:

  1. What types of historical topics are most susceptible to historical revisionism?
  2. What are the common factors for the historical topics that are subject to revisionism?
  3. Are there groups of editors who are seeking to disseminate revisionist narratives?

The study focuses on the level of controversy of historical articles, based on the notion that the introduction of revisionism is likely to lead to edit wars. The authors found that the most controversial historical articles in the Japanese Wikipedia were indeed focused on areas that are of particular interest to revisionists. From the findings:

Articles related to WWII exhibited significantly greater controversy than general historical articles. Among the top 20 most controversial articles, eleven were largely related to Japanese war crimes and right-wing ideology. Over time, the number of contributing editors and the level of controversy increased. Furthermore, editors involved in edit wars were more likely to contribute to a higher number of controversial articles, particularly those related to right-wing ideology. These findings suggest the possible presence of groups of editors seeking to disseminate revisionist narratives.

The paper establishes that articles covering these topic areas in the Japanese Wikipedia are contested and subject to edit wars. However, it does not measure to what extent article content has been compromised. Edit wars could be a sign of mainstream editors pushing back against revisionists, while conversely an absence of edit wars could indicate that a project has been captured (cf. the Croatian Wikipedia). While this little paper is a useful start, further research on the Japanese Wikipedia seems warranted.
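The extended abstract as quoted here does not spell out how controversy is measured. One common proxy for edit wars is the "identity revert": a revision whose content is byte-identical to an earlier revision of the same page. The sketch below counts such reverts on a hypothetical revision list; it is an illustration of that general technique, not the authors' method.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-1 digest of a revision's text, used to compare revisions cheaply."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def identity_reverts(revisions):
    """Find revisions that restore an earlier state of the page.

    `revisions` is a chronological list of (editor, text) pairs.
    Returns a list of (reverting editor, index of the restored revision).
    A revision identical to its direct predecessor (a null edit) is ignored.
    """
    last_seen = {}  # content hash -> index of the latest revision with that content
    reverts = []
    previous = None
    for index, (editor, text) in enumerate(revisions):
        digest = content_hash(text)
        if digest in last_seen and digest != previous:
            reverts.append((editor, last_seen[digest]))
        last_seen[digest] = index
        previous = digest
    return reverts


# Hypothetical history: two editors repeatedly restoring their preferred version.
history = [("A", "v1"), ("B", "v2"), ("A", "v1"), ("B", "v2"), ("C", "v3")]
print(identity_reverts(history))
```

As the review notes, a high revert count shows only that a page is contested; it does not say which side is restoring the mainstream account.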


Wikipedia-based LLM chatbot "outperforms all baselines" regarding factual accuracy

Reviewed by Tilman Bayer

This preprint by four graduate students at Stanford University's computer science department discusses the construction of a Wikipedia-based chatbot:

"We design WikiChat [...] to ground LLMs using Wikipedia to achieve the following objectives. While LLMs tend to hallucinate, our chatbot should be factual. While introducing facts to the conversation, we need to maintain the qualities of LLMs in being relevant, conversational, and engaging."

The paper sets out from the observation that

"LLMs cannot speak accurately about events that occurred after their training, which are often topics of great interest to users, and [...] are highly prone to hallucination when talking about less popular (tail) topics. [...] Through many iterations of experimentation, we have crafted a pipeline based on information retrieval that (1) uses LLMs to suggest interesting and relevant facts that are individually verified against Wikipedia, (2) retrieves additional up-to-date information, and (3) composes coherent and engaging time-aware responses. [...] We focus on evaluating important but previously neglected issues such as conversing about recent and tail topics. We find that WikiChat outperforms all baselines in terms of the factual accuracy of its claims, by up to 12.1%, 28.3% and 32.7% on head, recent and tail topics, while matching GPT-3.5 in terms of providing natural, relevant, non-repetitive and informational responses."
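The three stages named in the quote can be sketched as a small skeleton. Everything here is a hypothetical stand-in: the function names, the toy "verifier" that checks membership in a tiny fact set, and the string-joining "composer" are assumptions for illustration and do not reproduce WikiChat's actual components.

```python
from typing import Callable, List


def grounded_reply(
    user_turn: str,
    propose_claims: Callable[[str], List[str]],
    is_supported: Callable[[str], bool],
    retrieve: Callable[[str], List[str]],
    compose: Callable[[str, List[str]], str],
) -> str:
    """Run the three stages in order and return the final response."""
    # (1) Let a model suggest facts, then keep only those verified one by one.
    verified = [claim for claim in propose_claims(user_turn) if is_supported(claim)]
    # (2) Retrieve additional, up-to-date passages for the same turn.
    passages = retrieve(user_turn)
    # (3) Compose a response from verified claims and retrieved passages only.
    return compose(user_turn, verified + passages)


# Toy stand-ins so the skeleton can be run end to end.
KNOWN_FACTS = {"Paris is the capital of France."}


def toy_propose(turn: str) -> List[str]:
    return ["Paris is the capital of France.", "Paris has 90 million inhabitants."]


def toy_supported(claim: str) -> bool:
    return claim in KNOWN_FACTS


def toy_retrieve(turn: str) -> List[str]:
    return ["France is a country in Western Europe."]


def toy_compose(turn: str, facts: List[str]) -> str:
    return " ".join(facts)


reply = grounded_reply(
    "Tell me about Paris", toy_propose, toy_supported, toy_retrieve, toy_compose
)
print(reply)
```

The unsupported claim about 90 million inhabitants is dropped before composition, which is the behavior the verification stage is meant to provide.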

The researchers argue that "most chatbots are evaluated only on static crowdsourced benchmarks like Wizard of Wikipedia (Dinan et al., 2019) and Wizard of Internet (Komeili et al., 2022). Even when human evaluation is used, evaluation is conducted only on familiar discussion topics. This leads to an overestimation of the capabilities of chatbots." They call such topics "head topics" ("Examples include Albert Einstein or FC Barcelona"), contrasting them with lesser known "tail topics" ("likely to be present in the pre-training data of LLMs at low frequency. Examples include Thomas Percy Hilditch or Hell’s Kitchen Suomi"), and with "recent topics" ("topics that happened in 2023, and therefore are absent from the pre-training corpus of LLMs, even though some background information about them could be present. Examples include Spare (memoir) or 2023 Australian Open"). The latter are obtained from a list of most edited Wikipedia articles in early 2023.

Also, regarding the "core verification problem is whether a claim is backed up by the retrieved paragraphs [, we] found that there is a significant gap between LLMs (even GPT-4) and human performance [...]. Therefore, we conduct human evaluation via crowdsourcing, to classify each claim as supported, refuted, or there is not enough information in the paragraphs." (This observation may also be of interest regarding efforts to use LLMs as tools for Wikipedians to check the integrity of citations on Wikipedia.) In contrast, the evaluation for "conversationality" is conducted "with simulated users using LLMs. LLMs are good at simulating users: they have the general familiarity with world knowledge and know how users behave socially. They are free to occasionally hallucinate, make mistakes, and repeat or even contradict themselves, as human users sometimes do."
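Given per-claim labels of this kind, a factual accuracy figure can be computed as the share of claims labeled as supported. The sketch below assumes that both "refuted" and "not enough information" count against accuracy; that scoring rule and the example labels are assumptions, not details confirmed by the passage quoted here.

```python
from collections import Counter

LABELS = ("supported", "refuted", "not enough information")


def factual_accuracy(labels):
    """Fraction of claims labeled 'supported' (0.0 for an empty list)."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    return counts["supported"] / total if total else 0.0


# Hypothetical crowdsourced labels for four claims.
example = ["supported", "supported", "refuted", "not enough information"]
print(factual_accuracy(example))
```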

In the paper's evaluation, WikiChat impressively outperforms the two comparison baselines in all three topic areas (even the well-known "head" topics). It may be worth noting that the comparison did not include widely used chatbots such as ChatGPT or Bing AI. Instead, the comparison involves Atlas (described by the authors as based on a retrieval-augmented language model that is "state-of-the-art [...] on the KILT benchmark") and GPT-3.5 (while ChatGPT is or has been based on GPT-3.5 too, it involved extensive additional finetuning).


Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer

Prompting ChatGPT to answer according to Wikipedia reduces hallucinations

From the abstract:[3]

"Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data. Inspired by the journalistic device of 'according to sources', we propose according-to prompting: directing LLMs to ground responses against previously observed text. To quantify this grounding, we propose a novel evaluation metric (QUIP-Score) that measures the extent to which model-produced answers are directly found in underlying text corpora. We illustrate with experiments on Wikipedia that these prompts improve grounding under our metrics, with the additional benefit of often improving end-task performance."

The authors tested several variations of such "grounding prompts" (e.g. "As an expert editor for Wikipedia, I am confident in the following answer." or "I found some results for that on Wikipedia. Here’s a direct quote:"). The best performing prompt was "Respond to this question using only information that can be attributed to Wikipedia".
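A minimal sketch of both ideas follows. The prompt builder simply places the best performing instruction before the question; how the paper combines the two is not stated here, so that layout is an assumption. The overlap function is a simplified stand-in for the idea behind QUIP-Score (how much of an answer appears verbatim in a corpus), using character n-grams with a freely chosen n; it is not the paper's exact metric.

```python
GROUNDING_INSTRUCTION = (
    "Respond to this question using only information "
    "that can be attributed to Wikipedia"
)


def according_to_prompt(question: str) -> str:
    """Combine the grounding instruction with a question (layout is an assumption)."""
    return f"{GROUNDING_INSTRUCTION}.\n\nQuestion: {question}"


def char_ngrams(text: str, n: int):
    """All overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def overlap_score(answer: str, corpus_docs, n: int = 5) -> float:
    """Share of the answer's character n-grams that occur verbatim in the corpus."""
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams.update(char_ngrams(doc, n))
    grams = char_ngrams(answer, n)
    if not grams:
        return 0.0  # answer shorter than n characters
    return sum(gram in corpus_grams for gram in grams) / len(grams)


# Hypothetical one-document corpus.
corpus = ["the quick brown fox"]
print(according_to_prompt("Who wrote Hamlet?"))
print(overlap_score("quick brown", corpus))
print(overlap_score("zzzzzzz", corpus))
```

A score near 1 means the answer is largely quoted from the corpus, while a score near 0 means little of it appears there verbatim. A real corpus such as Wikipedia would need a more compact membership structure than a Python set.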

References

  1. ^ Yang, Puyu; Shoaib, Ahad; West, Robert; Colavizza, Giovanni (2023-05-23), Wikipedia and open access, arXiv, doi:10.48550/arXiv.2305.13945
  2. ^ Kim, Taehee; Garcia, David; Aragón, Pablo (2023-05-11). "Controversies over Historical Revisionism in Wikipedia" (PDF). Wiki Workshop (10th edition).
  3. ^ Weller, Orion; Marone, Marc; Weir, Nathaniel; Lawrie, Dawn; Khashabi, Daniel; Van Durme, Benjamin (2023-05-22), "According to ..." Prompting Language Models Improves Quoting from Pre-Training Data, arXiv, doi:10.48550/arXiv.2305.13252