Wikisource:Bot requests

Bot requests

This page allows users to request that an existing bot accomplish a given task. Note that some tasks may require that an entirely new bot or script be written. This is not the place to ask for help running or writing a bot.

A bot operator performing a task should make a note of it so that other bots don't attempt to do the same. Tasks that are permanently assigned or scheduled for long-term execution are listed on Persistent tasks.

See also

Unassigned requests

Years of works

As I update many {{PD-old-70}} to show why PD in the USA, I sometimes move categories of years of works into the {{header}} with the parameter "year=", but I would like to ask if anyone can make a bot to do this task, such as converting [[Category:1900 works]] to | year = 1900 in the header. Manually changing these is time-consuming.--Jusjih (talk) 02:11, 14 October 2009 (UTC)[reply]

Had a start at this, and you can see the AWB code that I used at User:SDrewthbot/AWB_modules#removing_Category:.5Cd.5Cd.5Cd.5Cd_works.2C_replace_with_year_in_header billinghurst sDrewth 14:47, 21 October 2010 (UTC)
Finished the recurse categories of Category:19th century works, and working through the 18thC now. I do have a query, what do we want to do with works in the top level of that category, ie. no specific year of the work? My initial thought would be that as they are non-specific that adding them as a year is unproductive and they should be left as they are. I have converted those that are within a decade to YYYYs works as that seemed close enough. — billinghurst sDrewth 01:14, 3 November 2010 (UTC)[reply]
To keep track of progress see User:SDrewthbot
To note issues and thoughts that have been raised with me about some of the varied uses of category addition
  • Adding category to subpages. Solutions: where the work is a periodical, keep the year parameter; where the work is chapters, or similar, remove the year; where the work is an edition of an earlier work, keep the year category.
  • Where categories are added to subpages of works where the original work is later incorporated into a collection. Solution: add the category manually rather than use the year parameter, which could be misleading, especially due to its placement against the title rather than the section.
  • Where a work may have been published as a series (sometimes over years) in a journal, so where parts are transcluded together multiple years are required. Solution: add the first year as the year parameter and add subsequent years manually.
  • Where volumes of a work are created over multiple years (and added as subpages). Solution: add the year parameter to each of the volumes, and retain whatever has been used at the top level of the work.
Added here to record findings against request. — billinghurst sDrewth 12:05, 6 November 2010 (UTC)[reply]
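
For anyone picking this request up without AWB, the basic conversion can be sketched in Python. This is a minimal illustration, not the AWB module referenced above: the function name and regular expressions are assumptions, and it deliberately skips pages with more than one year category, since the notes above say those need manual handling.

```python
import re

# Matches a year category such as [[Category:1900 works]].
YEAR_CATEGORY = re.compile(r"\[\[Category:(\d{4}) works\]\]\n?")
# Matches the opening line of the header template ({{header or {{Header).
HEADER_OPEN = re.compile(r"\{\{\s*[Hh]eader\s*\n")
# Matches an existing year parameter anywhere in the page.
YEAR_PARAM = re.compile(r"\|\s*year\s*=")


def move_year_into_header(wikitext: str) -> str:
    """Replace a single [[Category:YYYY works]] with | year = YYYY in the header."""
    years = YEAR_CATEGORY.findall(wikitext)
    # Only act when there is exactly one year category, a header to
    # receive it, and no year parameter already present.
    if len(years) != 1:
        return wikitext
    if not HEADER_OPEN.search(wikitext) or YEAR_PARAM.search(wikitext):
        return wikitext
    without_category = YEAR_CATEGORY.sub("", wikitext)
    # Insert the parameter directly after the header's opening line.
    return HEADER_OPEN.sub(
        lambda m: m.group(0) + f" | year = {years[0]}\n",
        without_category,
        count=1,
    )
```

Subpages and works incorporated into collections would still need the case-by-case judgement described in the list above; the sketch makes no attempt to detect them.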

Tag all pages that use {{use page image}} as "Problematic"

Pages that use {{Use page image}} are Problematic by definition. This list of Page: pages that use that template contains a great many pages that are tagged otherwise. Hesperian 01:02, 9 February 2010 (UTC)[reply]

and I would say that the term problematic itself adds to the confusion. They are not problems, they are unresolved or pending work. That said, I would agree that it is probably the closest along with non-proofread. I would prefer that we found an image processing slave, rather than a bot job. billinghurst sDrewth 12:50, 9 February 2010 (UTC)[reply]
Template:Book cover is a variant of this template, and I have proposed that template for deletion, see Wikisource:Proposed deletions#Template:Book cover. — billinghurst sDrewth 01:26, 3 November 2010 (UTC)[reply]
Done Beyond a specific batch under discussion, they are either marked "Problematic" or the template has been removed from the respective pages. — billinghurst sDrewth 06:17, 3 November 2010 (UTC)[reply]

Move Wikisource index pages

As per Wikisource:Scriptorium#Attempt_at_a_summary, it would be helpful for a bot to move all pages in Category:Wikisource index pages to the portal namespace, except Wikisource:About, Wikisource:Index, Wikisource:Index/Tools and scripts, Wikisource:Index/Community, Help:Contents, and Wikisource:Authors. The pages that are being moved should have their names maintained, so for example Wikisource:Virginia becomes Portal:Virginia. The old pages should be converted to soft redirects. —Spangineerwp (háblame) 23:25, 10 June 2010 (UTC)

Not done AdamBMorgan (talkcontribs) is undertaking these manually as he provides an underlying structure to the Portal: namespace. — billinghurst sDrewth 13:42, 7 December 2010 (UTC)[reply]

Marking index pages as 'Not proofread'

Asking for the bot to process Index:A Study of Mexico.djvu to save and assign 'Not proofread' to the pages. Thanks. - Ineuw (talk) 15:17, 28 June 2010 (UTC)[reply]

Comment. That would be a bot that applies the text layer, not specifically to mark as not proofread. Last that I knew that did that was User:mjbot, though it has generally not been required now that the text layer can be added as the pages are created, which was not the case when the bot was created. — billinghurst sDrewth 06:20, 3 November 2010 (UTC)[reply]

joining lines considering hyphens when it imports a Page from OCR

Please set Ryuchbot for this purpose. --Ryuchbot (talk) 07:28, 20 July 2010 (UTC)[reply]

Please see the information and process at Wikisource:Bots. — billinghurst sDrewth 14:01, 20 July 2010 (UTC)[reply]
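
For reference, the line-joining behaviour requested here can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is hypothetical, it is not Ryuchbot's code, and it treats any end-of-line hyphen followed by a lowercase letter as a broken word.

```python
import re

# A word character followed by a hyphen at the end of a line, with a
# lowercase continuation at the start of the next line.
HYPHEN_BREAK = re.compile(r"(\w)-\n([a-z])")


def join_hyphenated_lines(ocr_text: str) -> str:
    """Rejoin words that OCR split across lines with a trailing hyphen."""
    return HYPHEN_BREAK.sub(r"\1\2", ocr_text)
```

A real compound that happens to break at its hyphen (for example "well-" at the end of one line and "known" on the next) would be joined incorrectly, so output from a rule like this still needs proofreading.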

Move pages of poorly named DJVU file

The pages at Index:Whatsocialclasse00sumnrich.djvu have all been proofread. The DJVU file should be moved to File:What Social Classes Owe to Each Other.djvu on Commons. Once that is done, the index can be moved along with all the corresponding pages. Let me know if you need assistance on Commons (I don't think moving files requires admin rights, but if it does I can do so). —Spangineerwp (háblame) 15:07, 30 August 2010 (UTC)[reply]

I am not aware of any bot tool that we have that moves files. Happy to learn if one exists, and how we can get to use it. — billinghurst sDrewth 06:28, 3 November 2010 (UTC)[reply]
Moving the files isn't what the bot is needed for; just the page moves: I can move the file if someone's bot can then do the page moves. —Spangineer (háblame) 19:39, 3 January 2011 (UTC)[reply]
My bot can move these pages: so I'm now moving the .djvu.
FYI, This tool can move some files from one project to another. JackPotte (talk) 20:05, 3 January 2011 (UTC)[reply]
The 180 pages are now being moved... JackPotte (talk) 19:21, 4 January 2011 (UTC)[reply]
Done JackPotte (talk) 20:00, 4 January 2011 (UTC)[reply]
Wonderful, thank you! —Spangineer (háblame) 22:21, 4 January 2011 (UTC)[reply]
I have been through and fixed up the transclusions. We had blank bodies in the main namespace. Oopsie. — billinghurst sDrewth 23:37, 4 January 2011 (UTC)[reply]
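
As a rough illustration of the page-move step, the list of old and new titles can be generated as below. This is a sketch under assumptions, not JackPotte's script: the function name is hypothetical, and it assumes the usual Page:<file name>/<page number> title pattern. The actual moves would still have to be carried out by a bot framework.

```python
OLD_FILE = "Whatsocialclasse00sumnrich.djvu"
NEW_FILE = "What Social Classes Owe to Each Other.djvu"


def planned_moves(page_count: int) -> list[tuple[str, str]]:
    """Return (old title, new title) pairs for the index and its pages."""
    # The index page itself moves first.
    moves = [(f"Index:{OLD_FILE}", f"Index:{NEW_FILE}")]
    # Each Page: subpage keeps its page number under the new file name.
    for number in range(1, page_count + 1):
        moves.append((f"Page:{OLD_FILE}/{number}", f"Page:{NEW_FILE}/{number}"))
    return moves
```

As the last comment notes, main-namespace transclusions that point at the old index need updating afterwards, otherwise they render with blank bodies.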

Add alternate URLs for US Federal Case Law

I noticed that most of the US case law that we index (e.g. Federal_Reporter/Second_series/Volume_488, etc.) is available at bulk.resource.org (e.g. http://bulk.resource.org/courts.gov/c/F2/488/ ) and Justia (e.g. http://cases.justia.com/us-court-of-appeals/F2/488 ), as well as OpenJurist (who took the initiative to make the index pages). Considering that bulk.resource.org were the first to make it available free on the 'net, we should certainly include them -- and we might as well include Justia, too. JesseW (talk) 08:17, 21 October 2010 (UTC)[reply]

Include them to what end??? To bypass the eventual importation of all case opinions to Wikisource - as is the case with the United States Reports currently in progress? Granted, without Archive.org existing, BenchBot could not automatically whittle down the pile of work as it does now, but it's not like they aren't littered with mistakes & omissions over there either.

Any interlinks you've seen go from red to blue in any of the other reporters currently listed and framed on WS just happen to have the same case names as ones being created for the United States Reports part of the USSC Project. These will all require disambiguation but we're just not at that point yet.

Sorry; I just don't follow the reasoning for wanting to add something that is, in effect, going to be superseded by the Wikiproject(s) anyway, but I'm sure other folks will review this request regardless. George Orwell III (talk) 10:42, 21 October 2010 (UTC)

Er, no -- of course it's better to have our own, well-curated copy -- but the two don't seem opposed. It's good to link to useful sources of information (as these are, particularly the bulk.resource.org ones which seem easiest to read, with less distracting stuff added on the side), and it's good to import it into Wikisource. I guess I "just don't follow the reasoning" for opposing linking to multiple copies of material on the basis that we will (eventually) also have a local copy. JesseW (talk) 02:25, 22 October 2010 (UTC)
Maybe we've talked past each other here. I have absolutely NO objection to linking external sources - quite the opposite actually. My problem is 2-fold. The case reporter that was the furthest along was the United States Reporter even before I wound up here. The person (or people?) who set up most of the case lists & volumes, transcribed a nice chunk of case opinions and added many other useful portions related to US Law that the current project people are still using or improving today didn't make things easy. The one thing that in retrospect was a rather large oversight was to use full URLs -- like the ones below pointing back to OpenJurist...
* [hftp://openjurist.org/37/us/1   37 U.S.  1]  (1838)  [[United States v. Laub]]
* [hftp://openjurist.org/37/us/11  37 U.S. 11]  (1838)  [[Lessee of Swayze v. Burke]]
* [hftp://openjurist.org/37/us/27  37 U.S. 27]  (1838)  [[United States v. Woolsey]]
* [hftp://openjurist.org/37/us/32  37 U.S. 32]  (1838)  [[Bank of the United States v. Daniel]]
* [hftp://openjurist.org/37/us/59  37 U.S. 59]  (1838)  [[Bradstreet v. Thomas]]
* [hftp://openjurist.org/37/us/66  37 U.S. 66]  (1838)  [[McKinney v. Carroll]]
* [hftp://openjurist.org/37/us/72  37 U.S. 72]  (1838)  [[United States v. Coombs]] ....etc etc
... and did the same for nearly every inline citation, opinion, footnote, dissent, law review and similar with no rhyme or reason for others to adhere to moving forward. Not only are the full URLs used but they can jump around to the different legal sites from one sentence to the next and then to a third in the one after that!!!!
A template should have been used early on, so that upkeep could be done by modifying the values in one template if need be rather than thousands of citations across all the cases. Adding to this current state with what you described only introduces another level of entries that will add to the BOT's workload while at the same time introducing a greater chance of botching the conversion(s) in the process.
The other issue is the lack of such templates. The only template that does anything like what is needed on Wikisource right now is {{Federal reporter}} -- and that isn't even set up for all the different external sites last I checked. We need to import, modify, then validate something like Template:Ussc ({{USSC}}), currently on Wikipedia, over here on Wikisource for starters before we even get to throwing a BOT at it. Wouldn't you agree? George Orwell III (talk) 06:44, 22 October 2010 (UTC)
I certainly do. Thank you so much for clarifying! That is indeed a depressingly convoluted mess, and should certainly be straightened out before we set a bot on further confusing things. If/when I have some time, I'll see what I can do to help. JesseW (talk) 19:07, 22 October 2010 (UTC)[reply]
I am glad we are on the same page too! Pitching in is the most helpful thing you can do while this request stays 'Open'. As long as you see your entry on this page, the request is open to anyone willing to address it before the folks over on the USSC Project (& BenchBot) do. While the lower number of regular users compared to Wikipedia has many advantages for Wikisource, the honest truth is that this ties up limited resources like BOTS and these requests can take a long time to get to (if at all). In the meantime, helping out is always appreciated and if you run into a roadblock or have a question, stop by the Law Portal or USSC Project page - drop me a note directly even if need be. George Orwell III (talk) 22:44, 22 October 2010 (UTC)
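
To make the clean-up discussed above more concrete, here is a minimal Python sketch of the first step: pulling the volume, page, year and case name out of list entries like the ones quoted earlier, so that they could later be re-emitted through a single template. The function names are hypothetical, and the template name is left as a parameter because no suitable template exists on Wikisource yet.

```python
import re

# One list entry: "* [<url>  <vol> U.S. <page>]  (<year>)  [[<case name>]]"
CITATION_LINE = re.compile(
    r"^\*\s*\[\S+\s+(\d+)\s+U\.S\.\s+(\d+)\]\s+\((\d{4})\)\s+\[\[(.+?)\]\]"
)


def parse_citation(line: str):
    """Return the fields of one case-list entry, or None if it does not match."""
    match = CITATION_LINE.match(line)
    if match is None:
        return None
    volume, page, year, case = match.groups()
    return {"volume": int(volume), "page": int(page), "year": int(year), "case": case}


def render_with_template(fields: dict, template: str) -> str:
    """Re-emit an entry using a citation template chosen by the caller."""
    return (
        f"* {{{{{template}|{fields['volume']}|{fields['page']}}}}} "
        f"({fields['year']}) [[{fields['case']}]]"
    )
```

Keeping the parsing separate from the rendering means the external site behind each citation is decided in one place (the template), which is the maintainability point made above. Inline citations inside opinions and footnotes vary far more than these list entries and would need their own patterns.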

Template:PageQuality

The following discussion is closed and will soon be archived.


Need to run a bot through to save all pages that utilise the template:PageQuality. This will convert all the pages to use <pagequality>. (non-urgent) — billinghurst sDrewth 13:21, 7 December 2010 (UTC)[reply]

As far as I could see this template isn't called directly in all pages: the function pageQuality() from MediaWiki:Common.js does it. Consequently it would be wiser to ask ThomasV what exactly he had in mind. JackPotte (talk) 19:32, 3 January 2011 (UTC)
{{PageQuality}} is the old style, and obviously a template. ThomasV has converted to the tag <pagequality> for better measures and updates. As pages with the old template are saved, mw:Extension:ProofreadPage automatically converts them. The bot process is to just go through and save each file, it is about getting around to it. — billinghurst sDrewth 12:22, 4 January 2011 (UTC)[reply]
Tens of thousands of edits, for what purpose? I like the idea of doing this for 'Validated' and 'Without text' pages, as presumably these will only get updated to the new tag if a bot does it. But 'Not proofread', 'Proofread' and 'Problematic' pages are by definition awaiting further edits, so there seems no need to do anything but wait patiently. Hesperian 12:34, 4 January 2011 (UTC)[reply]
I didn't do the detail at the time, it was a parked thought bubble. I have already done the old "without text" when I was checking those 18k of files looking for mislabelled text and images. I like your suggested plan to do the validated, and then look and review. — billinghurst sDrewth 12:59, 4 January 2011 (UTC)[reply]
I can provide a list if you want. Hesperian 00:49, 5 January 2011 (UTC)[reply]
ThomasV has been doing it. Billinghurst (talk) 15:53, 30 March 2011 (UTC)[reply]
Done there was some remainder and I have wiped them away. — billinghurst sDrewth 12:38, 6 May 2011 (UTC)[reply]

Template:Dropcap & Template:Dropinitial

Can all instances of {{Dropcap}} be replaced with {{Dropinitial}}? Dropcap has been replaced but is still in use. - AdamBMorgan (talk) 19:45, 2 March 2011 (UTC)
Done and converted it to a redirect. — billinghurst sDrewth 22:52, 7 May 2011 (UTC)

Template:Portals, Template:Indexes & headers

The function of the templates {{Indexes}} and {{Portals}} has been replaced with the | portal = parameter in the headers. Can all uses of these templates therefore be replaced by bot with the parameter? This will involve replacing pipes (|) with slashes (/). - AdamBMorgan (talk) 19:45, 2 March 2011 (UTC)

Underway, code for AWB custom module at User:SDrewthbot/AWB_modules. It is for indexes, though will be readily modifiable for portals. Billinghurst (talk) 13:41, 29 March 2011 (UTC)[reply]
both Done Billinghurst (talk) 14:33, 30 March 2011 (UTC)[reply]

There are a few minor issues with some pages in the Portal: ns which I have left a separate note for ABM. Otherwise this looks complete and the templates can be removed. Billinghurst (talk) 15:55, 30 March 2011 (UTC)[reply]
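
For the record, the replacement described in this request can be sketched in Python as follows. This is an illustrative approximation, not the AWB custom module linked above: the function name and patterns are assumptions, it handles only the first template call on a page, and it assumes the header opens with {{header on its own line.

```python
import re

# A call such as {{Portals|Virginia|United States}} or {{Indexes|...}}.
PORTALS_CALL = re.compile(r"\{\{\s*(?:[Pp]ortals|[Ii]ndexes)\s*\|([^{}]*)\}\}\n?")
# The opening line of the header template.
HEADER_OPEN = re.compile(r"\{\{\s*[Hh]eader\s*\n")
# An existing portal parameter anywhere in the page.
PORTAL_PARAM = re.compile(r"\|\s*portal\s*=")


def portals_to_parameter(wikitext: str) -> str:
    """Move a {{Portals}} or {{Indexes}} call into the header's portal parameter."""
    match = PORTALS_CALL.search(wikitext)
    if match is None or not HEADER_OPEN.search(wikitext):
        return wikitext
    if PORTAL_PARAM.search(wikitext):
        # A portal parameter is already set; leave the page for manual review.
        return wikitext
    # Pipes between template arguments become slashes in the parameter value.
    value = "/".join(part.strip() for part in match.group(1).split("|"))
    without_call = wikitext[: match.start()] + wikitext[match.end():]
    return HEADER_OPEN.sub(
        lambda m: m.group(0) + f" | portal = {value}\n",
        without_call,
        count=1,
    )
```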

Mass deletion of not-proofread pages

The previous djvu file for Index:Horse shoes and horse shoeing.djvu was not complete; I replaced it and manually aligned the text of the proofread pages. The remaining not-proofread pages can now be deleted (as suggested by Billinghurst). Please delete:

Thanks. --Alex brollo (talk) 06:43, 2 February 2011 (UTC)[reply]

I'll take it. Script will be posted to User:TalBot/horseshoe-delete.py soon.--GrafZahl (talk) 09:11, 2 February 2011 (UTC)[reply]
The script is ready. Unless there are objections, I'll start it tomorrow (UTC).--GrafZahl (talk) 09:58, 2 February 2011 (UTC)
Thanks. I'll be more careful next time while choosing the best djvu file. :-( --Alex brollo (talk) 10:08, 2 February 2011 (UTC)[reply]
No prob, and of course, I meant "objections" above.--GrafZahl (talk) 10:16, 2 February 2011 (UTC)[reply]
Thanks :-) --Alex brollo (talk) 11:22, 3 February 2011 (UTC)[reply]

Done You're welcome.--GrafZahl (talk) 12:35, 3 February 2011 (UTC)[reply]

Bot Request: Save DJVU pages

Index:Os Lusíadas (Camões, tr. Burton, 1880), Volume 1.djvu could stand to have a bot run over it to just save each of the 300 pages as "Not Proofread", showing only the OCR text. Isn't there a bot to do this? TheSkullOfRFBurton (talk) 02:40, 31 March 2011 (UTC)[reply]

Not done No readily available script and no discernible value in applying text layer. — billinghurst sDrewth 23:45, 10 April 2011 (UTC)

A bot to remove the ? marks

Do we have a bot which removes the w:replacement character" (U+FFFD, ), emnbedded at the beginning of the text lines of not proofread pages? If we have one, would it be possible to clean up the PSM Pages? Thanks. — Ineuw talk 20:14, 10 April 2011 (UTC)[reply]

We could run a bot through them, however, if it is pages that you are going to edit, and the characters are just of nuisance value, then it is probably more worthwhile creating a regex script, either as part of a general cleanup script or as a separate script. You could then just run (click) it when you edit the pages. Running a bot through lots of non-proofread pages to remove a few characters and leave the other text otherwise unaffected is not the most productive. — billinghurst sDrewth 23:43, 10 April 2011 (UTC)

Please don’t bother with a script because I remove the automatically when editing. My concern was that, (article title pages excepted), I am assembling not-proofread pages as the titles are harvested, and this would have made the articles more decent looking for the casual reader. Am aware and agree that unfinished pages should not be posted on the main namespace, but this puts me in a sort of a Catch-22 situation and this idea was a stop-gap idea. BTW, thanks for the quick reply.— Ineuw talk 00:38, 11 April 2011 (UTC)[reply]
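
In case it is useful to anyone else, the kind of regex clean-up mentioned above can be sketched in Python. The function name is hypothetical and this is not an existing Wikisource script; it only strips replacement characters at the start of a line and leaves any in the middle of the text alone, since those may mark characters that were genuinely lost.

```python
import re

# One or more U+FFFD characters (optionally separated by spaces or tabs)
# at the very start of a line.
LEADING_REPLACEMENT = re.compile(r"^(?:[ \t]*\ufffd)+[ \t]*", re.MULTILINE)


def strip_leading_replacement_chars(text: str) -> str:
    """Remove U+FFFD replacement characters from the beginning of each line."""
    return LEADING_REPLACEMENT.sub("", text)
```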

Assigned requests