Wikipedia:Wikipedia Signpost/2023-12-24/Recent research
Article display preview: | This is a draft of a potential Signpost article, and should not be interpreted as a finished piece. Its content is subject to review by the editorial team and ultimately by JPxG, the editor in chief. Please do not link to this draft as it is unfinished and the URL will change upon publication. If you would like to contribute and are familiar with the requirements of a Signpost article, feel free to be bold in making improvements!
|
"LLMs Know More, Hallucinate Less" with Wikidata
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
"Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata"
This paper[1] (by five graduate students at Stanford University's computer science department and Monica S. Lam as last author) sets out to show that
While large language models (LLMs) can answer many questions correctly, they can also hallucinate and give wrong answers. Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality.
To do this, the paper "presents WikiSP, a few-shot sequence-to-sequence semantic parser for Wikidata that translates a user query, along with results from an entity linker, directly into SPARQL queries [to retrieve information from Wikidata]." It is obtained by fine-tuning the LLaMA large language model.
For example, the user question "What year did giants win the world series?" is supposed to be converted into the query SELECT DISTINCT ?x WHERE {?y wdt:sports_season_of_league_or_competition wd:Q265538; wdt:winner wd:Q308966; wdt:point_in_time ?x. }
. The paper uses a modified SPARQL syntax that replaces numerical property IDs (here, P3450) with their English-language label (here, "sports season of league or competition"). The authors motivate this choice by observing that "While zero-shot LLMs [e.g. ChatGPT] can generate SPARQL queries for the easiest and most common questions, they do not know all the PIDs and QIDs [property and item IDs in Wikidata], and nor is it possible to include them in a prompt."
To evaluate the performance of "WikiSP", and as a second contribution of the paper, the authors present
[...] WikiWebQuestions, a high-quality question answering benchmark for Wikidata. Ported over from WebQuestions for Freebase, it consists of real-world data with SPARQL annotation. [...]
Despite being the most popular large knowledge base for a long time, existing benchmarks on Wiki- data with labeled SPARQL queries are unfortunately either small or of low quality. On the other hand, benchmarks over the deprecated Freebase still dominate the KBQA research with better-quality data.
Using this new benchmark, "Our experimental results demonstrate the effectiveness of [WikiSP], establishing a strong baseline of 76% and 65% answer accuracy in the dev and test sets of WikiWeb- Questions, respectively." However, the paper's "Limitations" section hints that despite the impressive "12 billion facts" factoid that the paper opens with, Wikidata's coverage may be too limited to answer most user questions in a satisfying manner:
Even though knowledge bases are an important source of facts, a large portion of the knowledge available in digital form (e.g. Wikipedia, news articles, etc.), is not organized into knowledge bases. As such, the results of this paper can be considered complementary to the larger body of fact-checking research based on free text.
To address this weakness, the authors combine this Wikidata-based setup with a standard LLM that provides the answer if the Wikidata query fails to return a result. They state that
By pairing our semantic parser with GPT-3, we combine verifiable results with qualified GPT-3 guesses to provide useful answers to 96% of the questions in dev.
Data and evaluation code from the paper have been released in a GitHub repo, where the authors state that "We are now working on releasing fine-tuned models."
The paper's endeavour bears some similarity to a paper authored by a different team of Stanford graduate students with professor Lam that sought to use Wikipedia (rather than Wikidata) to reduce LLM hallucations, see the review in our July issue: "Wikipedia-based LLM chatbot 'outperforms all baselines' regarding factual accuracy".
Briefly
- See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
"Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata"
From the abstract:[2]
"In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. [...] The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge.
"Large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in [Wikidata]"
From the abstract:[3]
"Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings."
"KGConv, a Conversational Corpus grounded in Wikidata"
From the abstract:[4]
"We present KGConv, a large, conversational corpus of 71k conversations where each question-answer pair is grounded in a Wikidata fact. Conversations contain on average 8.6 questions and for each Wikidata fact, we provide multiple variants (12 on average) of the corresponding question using templates, human annotations, hand-crafted rules and a question rewriting neural model. We provide baselines for the task of Knowledge-Based, Conversational Question Generation. [...]"
"WikiDialog" dataset: "Dialog inpainting" using Wikipedia
From the abstract:[5]
"[...] conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propose a new technique for synthetically generating diverse and high-quality dialog data: dialog inpainting. Our approach takes the text of any document and transforms it into a two-person dialog between the writer and an imagined reader: we treat sentences from the article as utterances spoken by the writer, and then use a dialog inpainter to predict what the imagined reader asked or said in between each of the writer's utterances. By applying this approach to passages from Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets totalling 19 million diverse information-seeking dialogs -- 1,000x larger than the largest existing ConvQA dataset. Furthermore, human raters judge the answer adequacy and conversationality of WikiDialog to be as good or better than existing manually-collected datasets."
References
- ^ Xu, Silei; Liu, Shicheng; Culhane, Theo; Pertseva, Elizaveta; Wu, Meng-Hsi; Semnani, Sina; Lam, Monica (December 2023). "Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata". Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. EMNLP 2023. Singapore: Association for Computational Linguistics. pp. 5778–5791. doi:10.18653/v1/2023.emnlp-main.353.
{{cite conference}}
: Unknown parameter|editors=
ignored (|editor=
suggested) (help) Data and evaluation code - ^ Zhang, Bohui; Reklos, Ioannis; Jain, Nitisha; Peñuela, Albert Meroño; Simperl, Elena (2023-09-15), Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata, arXiv code
- ^ Gammelgaard, Mathias Lykke; Christiansen, Jonathan Gabel; Søgaard, Anders (2023-08-29), Large language models converge toward human-like concept organization, arXiv
- ^ Brabant, Quentin; Lecorve, Gwenole; Rojas-Barahona, Lina M.; Gardent, Claire (2023-08-29), KGConv, a Conversational Corpus grounded in Wikidata, arXiv
- ^ Dai, Zhuyun; Chaganty, Arun Tejasvi; Zhao, Vincent; Amini, Aida; Rashid, Qazi Mamunur; Green, Mike; Guu, Kelvin (2022-05-31), Dialog Inpainting: Turning Documents into Dialogs, arXiv, doi:10.48550/arXiv.2205.09073 Dataset, poster presentation
- Supplementary references and notes:
This page is a draft for the next issue of the Signpost. Below is some helpful code that will help you write and format a Signpost draft. If it's blank, you can fill out a template by copy-pasting this in and pressing 'publish changes': {{subst:Wikipedia:Wikipedia Signpost/Templates/Story-preload}}
Images and Galleries
|
---|
To put an image in your article, use the following template (link): This will create the file on the right. Keep the 300px in most cases. If writing a 'full width' article, change
Placing (link) will instead create an inline image like below [[File:|300px|center|alt=Placeholder alt text]]
To create a gallery, use the following to create |
Quotes
| |||
---|---|---|---|
To insert a framed quote like the one on the right, use this template (link): If writing a 'full width' article, change
To insert a pull quote like
use this template (link):
To insert a long inline quote like
use this template (link): |
Side frames
|
---|
Side frames help put content in sidebar vignettes. For instance, this one (link): gives the frame on the right. This is useful when you want to insert non-standard images, quotes, graphs, and the like.
For example, to insert the {{Graph:Chart}} generated by in a frame, simple put the graph code in to get the framed Graph:Chart on the right. If writing a 'full width' article, change |
Two-column vs full width styles
|
---|
If you keep the 'normal' preloaded draft and work from there, you will be using the two-column style. This is perfectly fine in most cases and you don't need to do anything. However, every time you have a However, you can also fine-tune which style is used at which point in an article. To switch from two-column → full width style midway in an article, insert where you want the switch to happen. To switch from full width → two-column style midway in an article, insert where you want the switch to happen. |
Article series
|
---|
To add a series of 'related articles' your article, use the following code or will create the sidebar on the right. If writing a 'full width' article, change Alternatively, you can use at the end of an article to create For more Signpost coverage on the visual editor see our visual editor series. If you think a topic would make a good series, but you don't see a tag for it, or that all the articles in a series seem 'old', ask for help at the WT:NEWSROOM. Many more tags exist, but they haven't been documented yet. |
Links and such
|
---|
By the way, the template that you're reading right now is {{Editnotices/Group/Wikipedia:Wikipedia Signpost/Next issue}}. |
Discuss this story
The value of Wikidata becoming clearer in a world with AI is an unsurprising development. It'll be similar with Abstract Wikipedia. {{u|Sdkb}} talk 15:52, 24 December 2023 (UTC)[reply]
I personally do not support AI given their stratospheric impact towards politics and copyright. I cannot believe what I'm seeing here. MarioJump83 (talk) 00:43, 3 January 2024 (UTC)[reply]