Wikipedia:Wikipedia Signpost/2024-03-02/Recent research

Recent research

Images on Wikipedia "amplify gender bias"

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.


Images on Wikipedia "amplify gender bias" compared to article text

Reviewed by Bri and Tilman Bayer

A Nature paper titled "Online Images Amplify Gender Bias"[1] studies

"gender associations of 3,495 social categories (such as 'nurse' or 'banker') in more than one million images from Google, Wikipedia and Internet Movie Database (IMDb), and in billions of words from these platforms"

As summarized by Neuroscience News:

This pioneering study indicates that online images not only display a stronger bias towards men but also leave a more lasting psychological impact compared to text, with effects still notable after three days.

This was a two-part research paper in which the authors

  • examined text and images from the Internet for gender bias
  • examined the responses of experimental subjects who were exposed to text and images from the Internet

While the paper's main analyses focus on Google, the authors replicated their findings with text and image data from Wikipedia and IMDb.

Gender bias in text and images

For the first part, images were drawn from Google search results for "3,495 social categories drawn from WordNet, a canonical database of categories in the English language. These categories include occupations—such as doctor, lawyer and carpenter—and generic social roles, such as neighbour, friend and colleague." These images were tagged with gender by workers recruited via Amazon Mechanical Turk. The reliability of the tagging was validated against the self-identified genders in a "canonical set" of celebrity portraits culled from IMDb and Wikipedia.[supp 1]
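
In essence, the resulting bias measure for each category reduces to the share of male-tagged faces among all gender-tagged images in that category, centred so that zero means parity. A minimal sketch of that aggregation (the function name, label values and parity baseline are our assumptions, not the paper's published code):

    from collections import Counter

    def gender_balance(tags):
        """Share of male-tagged faces among gender-tagged images in one
        category, centred so that 0.0 means parity and +0.5 means all
        male. Images the raters could not classify are assumed dropped."""
        counts = Counter(tags)
        classified = counts["male"] + counts["female"]
        if classified == 0:
            return None  # no usable labels for this category
        return counts["male"] / classified - 0.5

    # Hypothetical crowd labels: 70 of 100 "doctor" images tagged male
    print(round(gender_balance(["male"] * 70 + ["female"] * 30), 2))  # 0.2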

This main image dataset from Google was complemented with a dataset of images from English Wikipedia, derived using another existing Wikipedia image dataset,[supp 2] whose text descriptions yielded matches for 1,523 of the 3,495 WordNet-derived social categories.

Text samples were taken from Google News and gender bias analyzed with word embeddings, a computational natural language processing technique. Each news story text was also associated with social categories using automation.
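
A common way to operationalize this, which the paper's approach resembles, is to project each category's embedding onto an axis running between female and male anchor words. The pretrained GloVe model and anchor lists below are illustrative stand-ins, not the study's exact setup:

    import numpy as np
    import gensim.downloader as api  # pip install gensim

    # Pretrained GloVe vectors stand in for an embedding model trained
    # on news text; the anchor-word lists are illustrative.
    model = api.load("glove-wiki-gigaword-100")  # downloads on first use

    female_anchors = ["she", "her", "woman", "female"]
    male_anchors = ["he", "him", "man", "male"]

    def gender_axis():
        """Unit vector pointing from male toward female word usage."""
        diffs = [model[f] - model[m]
                 for f, m in zip(female_anchors, male_anchors)]
        axis = np.mean(diffs, axis=0)
        return axis / np.linalg.norm(axis)

    def gender_score(word, axis):
        """Cosine of a word with the gender axis: positive leans female."""
        v = model[word]
        return float(np.dot(v / np.linalg.norm(v), axis))

    axis = gender_axis()
    for occupation in ["nurse", "carpenter", "banker"]:
        print(occupation, round(gender_score(occupation, axis), 3))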

Impact of image vs. text search on users' gender bias

For the second part, the researchers

... conducted a nationally representative, preregistered experiment that shows that googling for images rather than textual descriptions of occupations amplifies gender bias in participants’ beliefs.

To measure participants' gender bias, an implicit association test (IAT) methodology was used, which supposedly reveals unconscious bias in a timed sorting task. In the researchers' words, "the participant will be fast at sorting in a manner that is consistent with one's latent associations, which is expected to lead to greater cognitive fluency [lower measured sorting times] in one's intuitive reactions." The test measured sorting times when images (and text?) were presented in sets whose individuals could be separated both into male/female and into science/liberal arts (based on their Wikipedia biographies). The labeling of text descriptions was performed by other humans recruited via Amazon Mechanical Turk. Both the test subjects and the labelers were adults from the United States, and the test subjects were screened to be representative of the U.S. population, including a nearly 50/50 male/female split (none self-identified outside those two categories). The experiment focused on a sample of 22 occupations, e.g. immunologist, harpist, hygienist, and intelligence analyst.
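
IATs of this kind are conventionally scored with the D measure of Greenwald et al.: the mean latency difference between stereotype-inconsistent and stereotype-consistent sorting blocks, divided by the pooled standard deviation of all trials. A simplified sketch (whether the paper applies every refinement of the standard algorithm is not detailed here; trial trimming and error penalties are omitted):

    import statistics

    def iat_d_score(congruent_ms, incongruent_ms):
        """Simplified IAT D measure (after Greenwald et al. 2003):
        mean latency difference between the stereotype-inconsistent
        and stereotype-consistent blocks, divided by the pooled
        standard deviation of all trials. Positive D = stereotype-
        consistent bias."""
        sd = statistics.stdev(congruent_ms + incongruent_ms)
        return (statistics.fmean(incongruent_ms)
                - statistics.fmean(congruent_ms)) / sd

    # Hypothetical latencies (ms): sorting is slower when the on-screen
    # pairing cuts against the participant's associations.
    congruent = [610, 650, 590, 700, 640]
    incongruent = [760, 810, 700, 790, 770]
    print(round(iat_d_score(congruent, incongruent), 2))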

Some test subjects were given a task involving occupation-related text prior to the IAT, and some a task involving images: using Google search to retrieve either images of representative individuals in the occupation or a textual description of the occupation. A control group performed an unrelated Google search. Before the IAT was performed, the test subjects were required to indicate on a sliding scale, for each of the occupations, "which gender do you most expect to belong to this category?" The test was performed again a few days later with the same test subjects.

On the second test, subjects exposed to images in the first test had a stronger IAT score for bias than those exposed to text.

The experimental part of the study depends partly on the IAT and partly on self-assessment to detect priming, and there are concerns both about the replicability of priming effects and about the validity and reliability of the IAT; some of these are described at Implicit-association test § Criticism and controversy. The authors appear to recognize this ("We acknowledge important continuing debate about the reliability of the IAT"). In their own study they found that "the distribution of participants' implicit bias scores [arrived at with IAT] was less stable across our preregistered studies than the distribution of participants' explicit bias scores", and discounted the implicit bias scores somewhat.

The conclusion drawn by the researchers, based partly but not entirely on the differing IAT scores of experimental subjects, was that of the paper's title: "images amplify gender bias" – both explicitly, as determined by subjects' assignments of occupations to genders on a sliding scale, and implicitly, as determined by reaction times in the IAT. Combined with the paper's opening observation that "[e]ach year, people spend less time reading and more time viewing images", this forms an "alarming" trend, according to the study's lead author (Douglas Guilbeault of UC Berkeley's Haas School of Business), as quoted by AFP on "the potential consequences this can have on reinforcing stereotypes that are harmful, mostly to women, but also to men".

The researchers also determined, apart from experimental subjects, that the Internet – represented singularly by Google News – exhibits a strong gender bias. It was unclear to this reviewer how much of the reported Internet bias is really "Google selection bias". Based on these findings, the authors go on to speculate that "gender biases in multimodal AI may stem in part from the fact that they are trained on public images from platforms such as Google and Wikipedia, which are rife with gender bias [...]".


Briefly

  • See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
  • Submissions are open until April 22, 2024 for Wiki Workshop 2024, to take place on June 20, 2024. The virtual event will be the eleventh in this annual series (formerly part of The Web Conference), and is organized by the Wikimedia Foundation's research team with other collaborators. The call for contributions asks for 2-page extended abstracts which will be "non-archival, meaning we welcome ongoing, completed, and already published work."

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

Compiled by Tilman Bayer


"Gender stereotypes embedded in natural language [of Wikipedia articles] are stronger in more economically developed and individualistic countries"

From the abstract:[2]

"[...] measuring stereotypes is difficult, particularly in a cross-cultural context. Word embeddings are a recent useful tool in natural language processing permitting to measure the collective gender stereotypes embedded in a society. [...] We considered stereotypes associating men with career and women with family as well as those associating men with math or science and women with arts or liberal arts. Relying on two different sources (Wikipedia and Common Crawl), we found that these gender stereotypes are all significantly more pronounced in the text corpora of more economically developed and more individualistic countries. [...] our analysis sheds light on the “gender equality paradox,” i.e. on the fact that gender imbalances in a large number of domains are paradoxically stronger in more developed/gender equal/individualistic countries."

To determine "the relative contribution of residents from each country to each language [version of Wikipedia]", the author (a researcher at CNRS) used the Wikimedia Foundation's "WiViVi" dataset, which provides the percentage of pageviews per country for a given language Wikipedia. This data is somewhat outdated (last updated in 2018); also, for the goal of measuring contribution (rather than consumption), the separate Geoeditors dataset might have been worth considering, which provides the number of editors per country, albeit with somewhat controversial privacy redactions.
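
The career/family and science/arts associations the abstract describes are typically quantified with WEAT-style effect sizes (Caliskan et al., 2017): how much more strongly one target set (male words) associates with one attribute set (career words) than the other target set does, in pooled-standard-deviation units. A self-contained sketch with toy vectors (the paper's exact word lists and per-language embeddings are not reproduced here):

    import numpy as np

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def weat_effect_size(X, Y, A, B):
        """WEAT effect size (Caliskan et al. 2017). X, Y are vectors for
        the two target sets (e.g. male vs. female words); A, B are the
        attribute sets (e.g. career vs. family words). Positive values
        mean X sits closer to A, and Y closer to B."""
        def s(w):
            return (np.mean([cos(w, a) for a in A])
                    - np.mean([cos(w, b) for b in B]))
        sx, sy = [s(x) for x in X], [s(y) for y in Y]
        return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

    # Toy 2-D vectors, purely illustrative; the study uses full
    # embedding models trained on each corpus.
    male, female = [np.array([1.0, 0.2])], [np.array([0.2, 1.0])]
    career, family = [np.array([0.9, 0.1])], [np.array([0.1, 0.9])]
    print(round(weat_effect_size(male, female, career, family), 2))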


"Poor attention: The wealth and regional gaps in event attention and coverage on Wikipedia"

From the abstract:[3]

"for many people around the world, [Wikipedia] serves as an essential news source for major events such as elections or disasters. Although Wikipedia covers many such events, some events are underrepresented and lack attention, despite their newsworthiness predicted from news value theory. In this paper, we analyze 17 490 event articles in four Wikipedia language editions and examine how the economic status and geographic region of the event location affects the attention [page views] and coverage [edits] it receives. We find that major Wikipedia language editions have a skewed focus, with more attention given to events in the world’s more economically developed countries and less attention to events in less affluent regions. However, other factors, such as the number of deaths in a disaster, are also associated with the attention an event receives."

Relatedly, a 2016 paper titled "Dynamics and biases of online attention: the case of aircraft crashes"[4] had found

that the attention given by Wikipedia editors to pre-Wikipedia aircraft incidents and accidents depends on the region of the airline for both English and Spanish editions. North American airline companies receive more prompt coverage in English Wikipedia. We also observe that the attention given by Wikipedia visitors is influenced by the airline region but only for events with a high number of deaths. Finally we show that the rate and time span of the decay of attention is independent of the number of deaths and a fast decay within about a week seems to be universal.

"Understanding Structured Knowledge Production: A Case Study of Wikidata’s Representation Injustice"

From the paper:[5]

"... through a case study of comparing human [Wikidata] items of two countries, Vietnam and Germany, we propose several reasons that might lead to the existing biases in the knowledge contribution process. [...]
We chose Germany and Vietnam as subjects based on three primary considerations. Firstly, both nations have comparable population sizes. Secondly, the editors who speak the predominant languages of each country maintain their distinct Wiki communities on Wikidata. [...]
The first analysis we did was comparing different components of Wikidata pages between pages in two countries. The components we are comparing are labels, descriptions, claims, and sitelinks. For a single Wikidata page, label is the name that this item is known by, while description is a short sentence or phrase that also serves disambiguate purpose. [...] In the dataset we collected, there are 290,750 people who have citizenship of Germany, and there are only 4,744 people who have citizenship of Vietnam. [...] German pages on average had 13 more labels, 5 more descriptions and 7 more claims compared to Vietnamese pages. While surprisingly, Vietnamese pages had slightly more sitelinks, the difference according to effect size was negligible.
The second analysis focused on the edit history of Wikidata items. [...] we quantified the attention metric into five features: Number of total edits, number of human edits, number of bot edits, and number of distinct bot and human edits. [...] in all the five features the [difference in means between the German and Vietnamese Wikidata human pages] is significant and in terms of bot activity and total activity, the effect size is beyond medium threshold (0.5)."
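
The component counts compared in the first analysis can be reproduced with one call per item to Wikidata's wbgetentities API, which returns an item's labels, descriptions, claims and sitelinks together. A sketch (the example Q-ids are arbitrary; the study iterates over the full German and Vietnamese cohorts):

    import requests

    API = "https://www.wikidata.org/w/api.php"

    def component_counts(qid):
        """Numbers of labels, descriptions, claims and sitelinks on one item."""
        params = {"action": "wbgetentities", "ids": qid, "format": "json"}
        entity = requests.get(API, params=params,
                              headers={"User-Agent": "research-sketch/0.1"}
                              ).json()["entities"][qid]
        return {
            "labels": len(entity["labels"]),
            "descriptions": len(entity["descriptions"]),
            # a property can carry several statements; count them all
            "claims": sum(len(stmts) for stmts in entity["claims"].values()),
            "sitelinks": len(entity["sitelinks"]),
        }

    # Arbitrary example items; the study covers all ~290,750 German-
    # citizenship and ~4,744 Vietnamese-citizenship human items.
    for qid in ("Q42", "Q567"):
        print(qid, component_counts(qid))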


References

  1. ^ Guilbeault, Douglas; Delecourt, Solène; Hull, Tasker; Desikan, Bhargav Srinivasa; Chu, Mark; Nadler, Ethan (February 14, 2024), "Online Images Amplify Gender Bias", Nature (online ahead of print), doi:10.1038/s41586-024-07068-x (open access)
  2. ^ Napp, Clotilde (2023-11-01). "Gender stereotypes embedded in natural language are stronger in more economically developed and individualistic countries". PNAS Nexus. 2 (11). Michele Gelfand (ed.): pgad355. doi:10.1093/pnasnexus/pgad355. ISSN 2752-6542.
  3. ^ Ruprechter, Thorsten; Burghardt, Keith; Helic, Denis (2023-11-08). "Poor attention: The wealth and regional gaps in event attention and coverage on Wikipedia". PLOS ONE. 18 (11). Robin Haunschild (ed.): e0289325. doi:10.1371/journal.pone.0289325. ISSN 1932-6203. Data and code: https://github.com/ruptho/wiki-event-bias https://zenodo.org/record/7701969
  4. ^ García-Gavilanes, Ruth; Tsvetkova, Milena; Yasseri, Taha (2016-10-01). "Dynamics and biases of online attention: the case of aircraft crashes". Royal Society Open Science. 3 (10): 160460. doi:10.1098/rsos.160460. ISSN 2054-5703.
  5. ^ Ma, Jeffrey Jun-jie; Zhang, Charles Chuankai (2023-11-05), "Understanding Structured Knowledge Production: A Case Study of Wikidata's Representation Injustice", arXiv extended abstract. In: CSCW '23 Workshop on Epistemic injustice in online communities, October 2023, Minneapolis, MN. ACM, New York, NY, USA.
Supplementary references and notes:
  1. ^ the "IMDB-WIKI dataset", from: Rothe, Rasmus; Timofte, Radu; Van Gool, Luc (2018-04-01). "Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks". International Journal of Computer Vision. 126 (2): 144–157. doi:10.1007/s11263-016-0940-3. ISSN 1573-1405.
  2. ^ the "Wikipedia-based Image Text Dataset" (cf. our earlier coverage: "Announcing WIT: A Wikipedia-Based Image-Text Dataset")