Research talk:Wikimedia Research Best Practices Around Privacy Whitepaper/Draft: Difference between revisions

Revision as of 04:21, 20 April 2024

Questions for the Wikipedia Communities and Arbitration Committees

In particular, from this group we are seeking input on the following questions:

1. Starting with a review of some of the basics we've sketched so far in the white paper draft, what should the recommendations be for Wikipedians (section 4.2)? What is missing?

  • response_1
  • response_2
  • ...

2. What values do you think should be mentioned in Section 3.1 'Understanding key values of Wikipedians'? What community essays, policies, or guidelines should be referenced in communicating key values of Wikipedians?

Outside of policy I think the principle of "Transparency for the powerful, privacy for the weak" is one that is strongly held by the community and is what generally gets applied when things like NOTCENSORED and BLP conflict. Horse Eye's Back (talk) 16:31, 9 April 2024 (UTC)

3. What do you see as missing, if anything, as recommendations for researchers?

  • We can and should be expecting researchers who wish to engage with us to be aware of relevant ethics standards (including privacy and transparency activities; things like pre-registration, ethics boards, etc.) in their fields of research. Stuartyeates (talk) 06:45, 9 April 2024 (UTC)
    @Stuartyeates This is a good suggestion. What we're questioning is whether such standards are in practice widely available in different geographies and languages. For example, while ethics/IRB boards are pretty much standard bodies in certain geographies, I have personally seen research/academic institutions in different parts of the world that don't have them, or if they have them, their standard of practice varies significantly when compared to some other institutions. So something like "if it's available to you, make sure you utilize or follow them" is something we can comfortably encourage in the paper. I'm not sure we can say more than that. LZia (WMF) (talk) 23:50, 17 April 2024 (UTC)
    @LZia (WMF) maybe: "Researchers are expected to be aware of and follow the ethics guidelines and codes of conduct of both their institution (if they have one) and of the prestige journals in their field. Researchers should be up front about ethics approvals they have obtained or plan to obtain for the planned work and the body granting those approvals."? Stuartyeates (talk) 04:20, 20 April 2024 (UTC)
  • There used to be some sentiment in the research documentation that researchers should try editing first. Try to be a part of the community before parachuting in. Is this still the recommendation? I know this sentiment is strongly supported in other online communities. That is, take the "external" out of external researcher. Zentavious (talk) 19:28, 18 April 2024 (UTC)
  • There is a distinction between research ambiance and active support from editors. For the recommendations for editors, points 4-6 are about Wikipedians engaging with researchers, which takes effort on the editors' part. Extending the idea of removing communication barriers, researchers should make it easy for individual editors to contribute/provide feedback to research. Additionally, we need some shared understanding of what lack of engagement means. What does it mean for a researcher to post on meta-wiki and receive no positive or negative feedback from Wikipedians? Zentavious (talk) 19:28, 18 April 2024 (UTC)
  • ...

4. Where would you like to see future work on this topic? What opportunities should we be highlighting for researchers to examine more deeply?

  • response_1
  • response_2
  • next_response
  • ...

5. Do you have ideas about how we can address one or more of the "TODO"s we have listed throughout the draft?

  • response_1
  • response_2
  • next_response
  • ...

Questions for researchers

In particular, from this group we are seeking input on the following questions:

1. What recommendations are unclear, and maybe therefore unhelpful, for you?

  • Not particularly unclear, just adding more detail: I would consider adding in the sections on PII (2.1, 3.2):
    • Links to the WMF definitions of personal data, as it provides a useful list of common types and how data that isn't typically considered PII can become PII when combined with other data.
    • The enwiki article on personal data actually seems quite thorough and useful as well, and bridges some US and GDPR laws
    • I would consider specifically naming IP addresses as PII, as many researchers do not think of it this way (possibly in section 3.2 where you list other PII).
    • For section 4.1.7, I would be careful not to conflate open source with privacy (the quick summary from enwiki) in this instance (particularly so that use of open source tools doesn't signify that a research participant shouldn't also consider the privacy implications of the research itself outside of the tool), though I agree open source tools are in general preferred for many reasons. In any case, whether or not the researchers use open or closed tools, they should communicate the relevant privacy policy and terms of use of the tool they're using. :) - TAndic (WMF) (talk) 15:47, 17 April 2024 (UTC)
  • response_2
  • ...

2. What questions do you still have about privacy on Wikipedia after reading this?

  • As someone who's not done research with Wikipedia (yet!) and so would/will be relying on this page pretty heavily for guidance, there were a couple of things that weren't clear that might be obvious to a seasoned Wikipedia researcher. First, are there existing privacy norms around the type of data collected? For example, would Wikipedians likely feel differently about collection and analysis of article content vs edits vs talk pages? Second, are there privacy considerations that qualitative researchers, including ethnographers, should consider (for example, an ethnographer using quotes from a talk page or an interviewer reaching out to people to talk)? Finally, is there anything that researchers should be doing to familiarize themselves with norms and values? Is it recommended that they lurk on talk pages? Participate as editors? CAT SarahGilbert (talk) 23:32, 16 April 2024 (UTC)
    • Adding here that I found this article to have some straightforward recommendations for qualitative researchers working with sensitive online data, and it could be incorporated to help with part of this question (though I think there is no perfect resolution for online ethnography and discourse analysis that allows for direct quotes even without attribution): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4376240/ In the Conclusion, two listed strategies I've used in the past that I found especially helpful were:
      • "Devising elaborate strategies for disguise involving altering non-essential details."
      • "Presenting extracts from the same participant under multiple pseudonyms." (quoting the authors) -TAndic (WMF) (talk) 15:47, 17 April 2024 (UTC)
    • Whether or not researchers/academics should create Wikipedia accounts is something that I don't think we really have a defined best practice on. There are upsides, such as being able to interview people, but there are also downsides, such as having to comply with our rules and expectations; a researcher who isn't an editor doesn't have to follow any of the rules which an editor has to follow. We can't/don't sanction people who don't edit Wikipedia. If, for example, a researcher's work inherently includes conduct which the community would construe as off-wiki outing (for example historians or international security studies researchers), it would be inadvisable for that researcher to become a Wikipedia editor, either openly or semi-anonymously. Those whose work doesn't contravene any community standards or expectations should feel free to wade into the pool to do their research, so to speak. My understanding is that, for example, ethnographic research in a semi-anonymous community would generally maintain that semi-anonymity in the final product. Horse Eye's Back (talk) 17:44, 17 April 2024 (UTC)
  • response_2
  • ...

3. Do you have ideas about how we can address one or more of the "TODO"s we have listed throughout the draft?

  • response_1
  • response_2
  • ...

Additional comments and feedback

To all reviewers and commenters, please try to organize your feedback by either adding it to an existing topic on this talk page, or by adding new topics so we can keep discussions around similar topics organized.

We will be monitoring this page until 30 April 2024, but won't be able to respond directly to comments. However, all comments will be reviewed and considered in the ongoing drafting and revising process.

If you are more comfortable leaving comments in a language other than English, please feel welcome to do so. Please note that we may utilize machine translation in reviewing non-English content. Thanks for your feedback!

Existing guidance

First, why are there no hyperlinks to the guidance documents mentioned, or to their Wikipedia articles where they exist? There are footnotes, but the lack of hyperlinks is surprising. See e.g. en:Common Rule. Second, in addition to linking to ethics committee bodies, we should link to the relevant ethics codes directly, and/or quote from them. This WP currently links, for example, to APA's "Advisory Group on Conducting Research on the Internet" ([2]), which is good, but it should also link to APA's "Ethics Code Standard" [3]. Further, I think it would be good to quote the relevant (short) parts of existing codes. Here, from APA: "Psychologists take reasonable steps to avoid harming... research participants..." I'd add two more I found a while ago: the Royal Historical Society's Statement on Good Practice: "taking particular care when research concerns those still living and when the anonymity of individuals is required", and ASA's ethical code: "Sociologists take all reasonable steps to implement protections for the rights and welfare of research participants as well as other persons and groups that might be affected due to the research... In their research, sociologists do not behave in ways that increase risk, or are threatening to the health or life of research participants or others." RHS's is particularly relevant, as the paper that led to this WP being drafted was written by historians. Piotrus (talk) 01:09, 11 April 2024 (UTC)

In agreement with Piotrus here. It is fine to have "Follow existing human subjects research protocols at your institution" but it should also have "Obey the ethical guidelines established by the relevant professional bodies", or similar, with examples. Zero0000 (talk) 01:52, 11 April 2024 (UTC)

Other remarks (by Piotrus)

Overall, I am quite impressed with this WP.

I think we should clearly state somewhere (perhaps in the abstract/nutshell as well as the recommendations and conclusion) that the TL;DR best practice is to not name anyone unless they have permitted it. Instead, researchers should refer to volunteers as User1, Wikipedian-B, etc.

It would be good to spell out somewhere that for volunteers who disclose their real name on Wikipedia, privacy concerns exist as well (as in, they should not be named in a paper unless permission has been given). What to do when a researcher wants to link to a diff by such a user is a question to discuss further.

Regarding "For some, possibly even many, editors, an attack on their username may be perceived as a serious personal attack, one on par with an attack on their real-world name and identity." - if you want an academic citation, I am pretty sure this was discussed in en:Common Knowledge?. Ping User:Pundit, the author, who may be able to quickly provide the relevant chapter/page info.

I definitely agree that veteran nicknames are treated like names, they are a source of one's identity and should be treated with respect. I would not quote nicknames, unless from widely known public discussions or when it is essential to give the name. Pundit (talk) 18:14, 11 April 2024 (UTC)

I am curious about what can go into "Escalation avenues". Maybe consider linking to en:Committee on Publication Ethics here, for example - but many journals (including the one that sparked this WP) are not members of COPE. Writing a letter to the journal, or publisher, can be mentioned, but the reality is that such letters are likely to be ignored. Support from WMF is somewhat theoretical - WMF declined to comment on the said paper (that led to this WP), for example.

While, as I said, this WP is overall a very good start, I do think one key aspect is missing (partially related to the outlined but not yet written section on "Escalation avenues"). If one feels that their privacy has been violated by a piece of research, what can they do - and what support, if any, can they expect from WMF? Something I would like to see is a public WMF system where people could ask WMF for help, publicly, and where WMF would respond in public. For example, I think WMF should publicly comment, when asked, on non-controversial issues such as stating whether a particular research paper followed best practices such as anonymizing volunteer names, asking them for permission to be named, whether a paper passed through a relevant IRB procedure, and assist in writing a letter to the journal expressing concerns if such best practices were not followed, and if the journal declines to publish such a letter, publish it on WMF's pages. Piotrus (talk) 01:31, 11 April 2024 (UTC)

I think it's debatable whether best practices would extend to not naming editors in circumstances when naming them carries legitimate public interest and academic value. Horse Eye's Back (talk) 16:13, 13 April 2024 (UTC)
One person's public interest and academic value is another person's harassment and trolling. The current WP already correctly observes that "Consider how any research narratives around individual editors could be leveraged by malicious actors. Even if the researcher’s intent is good, their analyses - especially any that include narratives meant to explain data that focus on specific editors - have the potential to be leveraged by malicious actors for doxxing. Researchers should make an attempt to have their reporting reviewed by research participants whenever possible as editors may be able to flag dangers that researchers may not identify." as well as "Be aware of anti-privacy in a sheepskin. Agendas may be pushed under “investigative journalistic work” and “doxxing for good”. Researchers should proceed very cautiously in this area. If you’re encountering a potential situation in which you or others are considering the applicability of “doxxing for good” or anything in that name, it is best to escalate the concerns to project administrators." Related to this is "Consider the cost of your action on Wikipedia. (TBD)", although I think in some cases problematic "research" (aka activism/harassment for greater good) does it and has the real intent of hounding some editors and making them retire from Wikipedia. A key aspect of this WP is to say that the Wikipedia community does not endorse outside parties (including researchers) engaging in “doxxing for good”, and hopefully we will develop a mechanism for publicly criticizing such papers. Electronic Frontier Foundation's Takedown Hall of Shame [4] comes to mind. en:Censorship by copyright (can't believe we were missing this article until a few days ago...) is an issue conceptually similar to what some “doxxing for good” "research" tries to achieve (censorship by doxxing/shaming/harassment?) - influence the content of articles, here, by attracting partisan editors and chasing existing ones away. Piotrus (talk) 23:39, 13 April 2024 (UTC)

Thanks for this conversation and feedback so far. I find it helpful to sometimes be able to show folks the two ends of a spectrum and then help them see the full spectrum and the grey zone they should navigate. In the case of this paper, and particularly when it comes to handling PII (names, for example), we have said (and we will say) the things researchers should not do, and the things they should exercise caution about. Can we give clear guidance on when it is okay to share this information? Of course one scenario is if the person has given consent. Are there other scenarios?

One of our wishes for the paper is that at the end, it reads as a welcoming space that is helping researchers do better by providing clear guidance as much as possible. As you're thinking about it, if you have ideas that can help us achieve that, please let us know. --LZia (WMF) (talk) 00:12, 18 April 2024 (UTC)

Conversation Hour to Gather Feedback

Join us for a Conversation Hour on 23 April 2024 at 15:00 UTC to share feedback. This conversation will be guided by some questions to encourage actionable feedback. Join via Google Meet. JKoerner (WMF) (talk) 16:40, 16 April 2024 (UTC)

General Remarks

Section 1.3 talks about existing regulations for human subjects under the Common Rule. While many projects do involve human-generated data, oftentimes IRBs won't consider observational forms of research to be human subjects research. I believe this point is brought up later in section 2.1, but I think that is an important motivator for these sets of guidelines.

+1 to the idea that there are limited efforts to define best practices for community engagement. My experience has been very back of the napkin action research of actively talking to people and following their recommendations, which in turn leads to talking to more people with other recommendations. These efforts have been fruitful, but it definitely isn't uniform between projects and is still at risk of violating norms.

I noticed some different usage of Wikimedians and Wikipedians. This may be a moot point, but in what ways are these the same vs. different?

+1 to learning about a future course for onboarding researchers.

Finally, I was a little confused by this line: "As for researchers, by providing support for more to get involved and proceed with confidence with investments of time and effort into studying the projects, the number of researchers can be increased which can in turn also lead to increased awareness of the projects and their value through the dissemination of such research, in turn resulting in further additional benefits to the projects." Is this pointing out a positive cycle when researchers increase the capacity of editors, it will support more researchers in the community, and so on...? Zentavious (talk) 19:47, 18 April 2024 (UTC)