
Talk:CopyPatrol

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by DanCherek (talk | contribs) at 22:50, 10 April 2024 (→‎New CopyPatrol is live: Reply). It may differ significantly from the current version.

NOTE: This page may not be regularly checked. If you need prompt attention from the maintainers please ping a member of Community Tech.

Can the tool access paywalled full texts?

Curious whether this tool would detect violations like this from 2015 which copied from this source (you'll need to log in)? If not, have you considered whether the tool can be linked up with The Wikipedia Library to access full texts? Smartse (talk) 10:59, 19 December 2023 (UTC)

@Smartse I tried it by copying that old version to Draft:Sandbox. CopyPatrol picked up the edit [1]. The iThenticate report shows that source as a 13% match. Nobody (talk) 13:16, 19 December 2023 (UTC)
@1AmNobody24: Thanks for that - I see that percentage at 9% for link.springer.com, but looking at https://www.ithenticate.com/ I see that they do indeed have the full texts for many paywalled articles. Good to see that we should catch edits like this today, but I wonder how many we missed! Smartse (talk) 12:29, 21 December 2023 (UTC)

Question about marking edits

When I encounter an edit that somebody else has already fixed (by removing content and adding copyvio-revdel tags, or by tagging for G12), should I mark the edit as "Page fixed" or as "No action needed"? I've been marking these sorts of things as "Page fixed", since it was a true copyvio and the page was fixed, but the use of "you" in "If you fixed the problem, tagged the page for revision deletion, or tagged the page for deletion as a copyright violation, mark it as 'Page fixed'" is now giving me a bit of pause. — Red-tailed hawk (nest) 02:54, 21 December 2023 (UTC)

@Red-tailed hawk I also mark those as "Page fixed". Do you think something like "If the problem is fixed, the page tagged for revision deletion, or tagged for deletion as a copyright violation, mark it as 'Page fixed'" could be better? Nobody (talk) 06:27, 21 December 2023 (UTC)
I think the proposed text would work well, yes. — Red-tailed hawk (nest) 16:38, 22 December 2023 (UTC)

User whitelist

Is that list still working? Cause this came in. Nobody (talk) 09:52, 22 December 2023 (UTC)

Sometimes things slip through; I don't know why. Diannaa (talk) 15:35, 26 December 2023 (UTC)

Is copy patrol down?

Only 4 cases going back quite some hours. enL3X1 ¡‹delayed reaction›¡ 21:34, 25 December 2023 (UTC)

I'm not seeing any significant gaps, just a general slowdown. I guess people had something else to do on Christmas Day. Diannaa (talk) 15:34, 26 December 2023 (UTC)

New CopyPatrol is live

I'm thrilled to announce the new version of CopyPatrol is now live at https://copypatrol.wmcloud.org. All existing links should redirect to the right place. Please join me in thanking @JJMC89 for his tremendous help in this effort. He probably deserves most of the credit here, but certainly all of it for the backend that he completely rewrote from scratch. The new backend should be much more resilient, with the sporadic downtime that we occasionally see hopefully being a thing of the past. In addition, the new frontend offers a number of new features:

  • Significant performance improvements
  • Edit summaries, change tags, and diff sizes
  • "Undo" or "revdel" links for users who have the requisite permissions

One notable change you might see is that the iThenticate reports no longer include the crawl date. Unfortunately this is outside our control. The Turnitin product team has been made aware of this feature request, so we hope it will eventually be reinstated.

Please let me or JJMC89 know of any issues you see. At the time of writing, the backfill script is still running, so many older reports are missing. They should all be restored in due time. Additionally, we're still ironing out integration with mw:Extension:PageTriage. We'll mark phab:T333724 as resolved once all of the aforementioned has been completed.

This release also marks the conclusion of a formal agreement with Turnitin. This has been in the works since at least May 2022. Turnitin has been kind enough to give us free credits when we need them, but from a legal standpoint nothing solidified our relationship in the past. Now it is set in stone, and we have the reassurance that CopyPatrol is here to thrive for years to come. They were gracious enough to give us considerably more credits than our current consumption requires, so we will soon be exploring adding more languages to CopyPatrol. On the front of negotiations with Turnitin, I'd like to thank @Ocaasi who started the conversations, and more recently my colleagues @SSpalding (WMF) from Legal, @JVargas (WMF) from Partnerships, my manager @KSiebert (WMF), and our new Lead Community Tech Manager @JWheeler-WMF.

Above all, allow me to thank all of you – our users – who are doing the actual work of helping cleanse the wikis of copyright violations. Your tireless efforts are what drove us to reach this milestone.

Warm regards, MusikAnimal (WMF) (talk) 21:42, 9 April 2024 (UTC)

Wow, I can actually feel everything loading faster (imagine my shock on discovering that marking the status of reports is now near-instant). The new features are great. Could I share a little bit of feedback?
  • The undo button is really useful, but its location next to the diff button has led me to click it unintentionally multiple times (maybe it could be moved down)
Other than that, everything looks good. The leaderboard seems a bit funky, but I imagine that will be fixed with the backfill script. Isochrone (talk) 22:06, 9 April 2024 (UTC)
It's so awesome to see how this technology and this partnership have evolved and matured. Congrats to everyone who has pushed it so much further!! Ocaasi (talk) 00:13, 10 April 2024 (UTC)
The new version has many positive changes, such as the quick loading time and the expected reduction in outages. However, on the downside, I see that there are already 212 cases posted for April 10 and there are still three hours to go, so a projected 240 cases to assess in the 24-hour period. Given that most days we only have two people working the queue, this needs to be cut in half if that's possible. It's unrealistic and unsustainable to expect our tiny crew to keep up with the volume otherwise. (I can typically only clear about 20 cases per hour and can only commit to working on this for 3-4 hours per day.) Diannaa (talk) 21:20, 10 April 2024 (UTC)
Yes, many thanks for the improvements! Very grateful. I agree with Diannaa that we may need some tweaks in terms of what the bot flags as a potential copyright violation as the threshold seems to have been lowered compared to before (one example I mentioned on her talk page was that it now flags cases where someone changes one or two words in a paragraph because it detects a match for the remaining text in the paragraph). Not sure we'll be able to handle the reports otherwise. DanCherek (talk) 22:34, 10 April 2024 (UTC)
@Diannaa @DanCherek Thanks for all of the feedback! Can you link to specific example(s)? "someone changes one or two words in a paragraph because it detects a match for the remaining text in the paragraph" – wouldn't that still usually be a copyright violation, or do you mean the source is a backwards copy (in which case it's not a copyvio at all)?
Assuming the cases are still valid, my opinion is that it's perfectly fine to have a backlog. While it's admirable to aim for completeness, you can only volunteer so much time. If however you're seeing a lot of noise, with backwards copies, or otherwise too many cases that are right on the "borderline", etc., we certainly can work to improve that. MusikAnimal (WMF) (talk) 22:45, 10 April 2024 (UTC)
I'm seeing a lot of cases like [2], where someone copyedits a paragraph and then it matches the rest of the unchanged text to a backwards copy. We still had to deal with backwards copies in the old CopyPatrol, of course, but so far it feels like a lot more after the update. DanCherek (talk) 22:50, 10 April 2024 (UTC)