
Toolserver/New accounts

Shortcut:
TS/A

Before requesting

  • The toolservers are under the legal responsibility of Wikimedia Deutschland, and their use is restricted according to EU and German laws.
  • On a day-to-day basis, decisions about which users will receive accounts are delegated to DaB.
  • At present, the toolservers do not have direct access to page text, due to Wikimedia's use of external storage clusters. Requests to run projects which depend upon fetching a lot of page text might be postponed.

When requesting

We need to know

  • Who you are. Please provide a link to your user page on your home wiki project; we can follow this and poke about a bit to find out who we're dealing with, e.g. check your involvement in projects, etc.
  • Why you want an account. A short description of what you want to do is needed, and any links to existing sample projects or code, or even working utilities, are very useful.

Accounts will next be approved on: Thursday, 18 October.
Please add your request at least 3 days before this date.

Requests

Hello. I'm a hobby programmer from Germany and I like to make web apps. Because I'm interested in wikis and free software, I think it would be fun and useful to program a little tool for Wikipedia, Wikibooks, Wiktionary and so on which outputs some text. It should also analyze a question. I can't speak English as well as you, but I hope you will allow me to make a tool. --77.128.46.167 12:08, 7 June 2007 (UTC) Sorry, I had no Meta account. Now I'm logged in. --Nummer9 12:10, 7 June 2007 (UTC)[reply]

You can write your request in German if you like. Please say more clearly what you plan to do. --DaB. 19:52, 12 July 2007 (UTC)[reply]

en:User:Hank

I am a programmer from Alaska. My Blog

I would like to create a simple personal learning app. It would find the top 100 articles that the user hasn't read, and allow them to tag the articles as read, interesting, uninteresting, etc. It would make suggestions based on past experience with articles, allowing someone to read only the entries that probably interest them.

Sounds interesting, but are you still active? --DaB. 19:51, 12 July 2007 (UTC)[reply]


I'm interested in writing a tool to find when specific text was added (who added that? in which revision?). When it was added recently there's not much trouble, but if you try to find who added a recently deleted image several months ago... you can go mad ;) I have little experience with the MediaWiki code (they rejected bugzilla:5763; I may do it when I feel free enough). However, I have played a bit with extracting information [1] [2] [3] and with JS [4]. Ah, and I run a bot when I'm in the mood to then revise its changes. Platonides 17:11, 12 June 2006 (UTC)[reply]

That sounds nice, but a lack of simple access to page text might hamper your ability to do this. How would you work around that? robchurch | talk 01:43, 19 June 2006 (UTC)[reply]

Oh, I hadn't realized that Daniel Kinzler == Duesentrieb. It's really a problem, as we're stuck with WikiProxy because of the external storage problem. However, the Toolserver is still the better way to do it. Would a time limit between queries need to be enforced? (It seems robust enough, but it's better to have things secured.) Platonides 22:11, 19 June 2006 (UTC)[reply]

You don't seem to have thought out the implementation with respect to the problems. robchurch | talk 16:32, 23 June 2006 (UTC)[reply]

The algorithm is pretty clear. That the get-revision-text step can't be done by asking the DB but needs to go through WikiProxy is not a big change; it's more a Toolserver problem than a user's one. Moreover, WikiProxy first looks into the text table and only asks through HTTP if the text isn't locally available. I don't understand what you mean. This could be achieved through other methods, like JavaScript, but in no way better. Platonides 16:51, 24 June 2006 (UTC)[reply]

My point is that WikiProxy is fine for requesting a few pages every so often. For applications which depend upon continuously fetching and comparing page text, it's usually better to use a database dump. Incidentally, that it is "a toolserver problem" is fairly germane; yes, it is a problem, and yes, we are trying to work out how to fix it, but we also expect some co-operation from our users. Excessive resource usage leads to processes and queries being killed; it's the same on any multi-user system. robchurch | talk 16:08, 2 July 2006 (UTC)[reply]

A database dump is fine if you want statistics, not if you want data about the live Wikipedia. So you would still need to fetch the last week or so. I'm not sure if the WikiProxy/DB page retrieval already handles that, or if it would need to be added at the application layer. However, these dumps would only be needed for large wikis (define large as you want: by dump size, number of revisions, number of tool queries...). Platonides

It seems the same feature was requested years ago at bugzilla:639. Can it be tried, instead of having theoretical arguments? Platonides 22:27, 15 July 2006 (UTC)[reply]
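The search discussed in this thread can be done with a binary search over the page's revision list, so that only O(log n) revision texts need to be fetched through WikiProxy. A rough sketch, in which fetch_text() stands in for whatever text lookup is available, and which assumes that the text, once added, stays present in later revisions:

 def first_revision_with(rev_ids, text, fetch_text):
     """rev_ids: revision ids in chronological order (oldest first).
     Returns the oldest revision id whose text contains `text`."""
     lo, hi = 0, len(rev_ids) - 1
     if text not in fetch_text(rev_ids[hi]):
         return None  # not present in the newest revision
     while lo < hi:
         mid = (lo + hi) // 2
         if text in fetch_text(rev_ids[mid]):
             hi = mid      # already present at mid, so added at mid or earlier
         else:
             lo = mid + 1  # not yet present at mid, so added later
     return rev_ids[lo]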


Hi!

I hereby request a toolserver account for some things:

  • I want to use my commons mover on toolserver after User:Duesentrieb looked over the code
  • I want to develop, together with Cool Cat, a script which daily gets the images used on the main pages of the 10 major wikis and protects their image pages on Commons to prevent vandalism and reuploads (especially by the Penis vandal)
  • And I want to code a tool which checks all new pages made by IPs on dewiki for copyright violations via CopyScape.
  • And finally I want to code a tool which checks vote eligibility for votes on dewiki, so that no one needs to check manually whether a user's vote is legitimate or not (a rough sketch of such a check follows below).

Best regards, Klever 18:11, 7 July 2006 (UTC)[reply]
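A rough sketch of the vote eligibility check mentioned in the last point above, run against the toolserver's replicated dewiki database. The thresholds (account age, edit count) are placeholders rather than dewiki's actual rules, and the cursor is assumed to come from an existing database connection.

 import datetime
 
 def is_eligible(cursor, username, vote_start, min_days=60, min_edits=200):
     """vote_start: datetime when the vote began."""
     cutoff = vote_start.strftime("%Y%m%d%H%M%S")
     age_limit = (vote_start - datetime.timedelta(days=min_days)
                  ).strftime("%Y%m%d%H%M%S")
     cursor.execute(
         "SELECT user_id, user_registration FROM user WHERE user_name = %s",
         (username,))
     row = cursor.fetchone()
     if row is None:
         return False
     user_id, registration = row
     # very old accounts have a NULL registration date and count as old enough
     if registration is not None and registration > age_limit:
         return False
     # only edits made before the vote started count
     cursor.execute(
         "SELECT COUNT(*) FROM revision "
         "WHERE rev_user = %s AND rev_timestamp < %s",
         (user_id, cutoff))
     return cursor.fetchone()[0] >= min_edits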

Why auto-protect them? To replace an image, the user must be autoconfirmed (or pass a similar account-age check). For the same cost, they could edit many Main Pages directly. I'd vote for an image-watching tool announcing such replacements (on #wikimedia-commons?), which are supposed to be very rare or the whole system wouldn't make sense.
Be careful with the vote eligibility checks, to avoid false warnings on simple comments.
Platonides 20:53, 10 September 2006 (UTC)[reply]
The auto-protect was Cool Cat's idea, and on dewiki, comments are not allowed in the vote section. Greets, HardDisk 19:26, 16 September 2006 (UTC)[reply]
The user has been informed that his request is postponed.
That's not a rejection, but I have to watch the user (and his code) for a while longer. --DaB. 21:20, 20 September 2006 (UTC)[reply]

Hello, I would absolutely love to have an account; I want something special I can do, and I would love to develop tools. I understand the rules and am looking forward to developing tools for the Wikimedia projects. If you have any questions, please contact me at en:User talk:Minun. Cheers Minun 19:44, 30 July 2006 (UTC)[reply]

Please reread When requesting. You're not saying why you want an account, so you have little chance of getting one. Platonides 21:59, 5 August 2006 (UTC)[reply]
I'd like to help out other users and take a special part in the project, and this is one way of doing that. I've been thinking of a couple of ideas too: perhaps a signature generator and a user page generator, and some more advanced tools such as a barnstar generator, and much more. Cheers Minun 18:53, 6 August 2006 (UTC)[reply]
Hello. Please first write down what you would like to do, because I need something to base a decision on :). --DaB. 14:05, 7 August 2006 (UTC)[reply]
I would mainly be taking a special part in the projects; here are some thoughts I've had. These are a few tools for users who make their personal content (user page, talk page, signature, etc.):
  • A user page (and talk page) design generator
  • A signature generator

and I also have this tool in mind

  • A barnstar generator (kinda like what users can use to make their own barnstars)

These are just what I'm thinking of; I will probably be able to make much more. Minun 18:48, 7 August 2006 (UTC)[reply]

Those don't require database access or processing time on another server. 86.134.49.147 15:40, 21 August 2006 (UTC)[reply]

Please note that en:User:Minun has been blocked for one year by ruling of the arbitration committee. TDS 01:58, 24 August 2006 (UTC)[reply]

It's been appealed, so please wait if you're here to reject it for that exact reason; nevertheless, I can still work on it before the block gets lifted.

Now, regarding the anonymous user's question: I've scrapped the idea of the image generator, as that may not require such resources, but I will still keep the idea of the signature generator and the userpage one. I've also thought of a new one: a tool that can sniff out things users can do to improve, one for normal users to find out if they're ready to be an administrator, which can detect whether someone has reverted that user's contributions, whether that person needs to take part in more things (for example discussions, deletion discussions, etc.), and lots more. Minun 15:06, 30 August 2006 (UTC)[reply]

Of course neither I nor the toolserver is under the control of the en: arbitration committee. But the things you did would result in a block on every project I know of. So you have lost a huge part of the communities' trust, and I can't imagine that anyone would use your tools. So I think the request is going to wait until the trust in you is back and/or the block is finished. --DaB. 21:29, 20 September 2006 (UTC)[reply]



I am running the German Wikipedia MP3 podcast and an automatically generated OGG RSS feed for spoken articles at de:Wikipedia:WikiProjekt_Gesprochene_Wikipedia/RSS-Feed. I host all of the services on my own server, which costs me 6 GB of monthly traffic and needs approx. 2 GB of disk space (for the German podcast alone). Altogether I have approx. 50,000 monthly hits and about 150 listeners of the German MP3 podcast. I'd like to expand my services to the international Wikipedias. Besides hosting the necessary scripts, causing about 1 GB of traffic and consuming about 0.5 GB of disk space (for a very short-lived audio conversion cache, OGG->WAV->MP3), I would like to host MP3 versions of the original OGG spoken articles somewhere (not necessarily on the Toolserver). You will find more information in German on my discussion page at de:Benutzer_Diskussion:Jokannes#Toolserveraccount. de:User:Jokannes 19:19, 19 January 2007 (UTC)[reply]
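The conversion step mentioned above (OGG -> WAV -> MP3) could look roughly like the sketch below, using the command-line tools oggdec and lame; the file names are examples and encoder settings are left out.

 import os
 import subprocess
 
 def ogg_to_mp3(ogg_path, mp3_path, wav_path="conversion-cache.wav"):
     # decode the spoken-article OGG to a temporary WAV file
     subprocess.check_call(["oggdec", "-o", wav_path, ogg_path])
     # encode the WAV as MP3
     subprocess.check_call(["lame", wav_path, mp3_path])
     # the WAV is only a short-lived conversion cache, so remove it again
     os.remove(wav_path)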

What about the MP3 patents? Within the United States, royalties are required to create and publish an MP3 file. I personally advocate for greater adoption of Ogg/Vorbis/Theora and other FLOSS formats. Thanks, GChriss 20:03, 10 April 2007 (UTC)[reply]





Hi, I'm One, an administrator from the English Wikipedia.

I wish to run a bot that would fix common spelling mistakes in articles. I know it sounds simple, but think of how many times common misspellings like 'belive' instead of 'believe' or 'attourney' instead of 'attorney' are used in articles. Those add up, and I believe that it would be much quicker and more efficient to fix these common mistakes with a bot than to go to hundreds of different articles and change them manually. 1ne 07:14, 25 July 2007 (UTC)[reply]

I'm afraid such a bot is often proposed but never approved on enwiki. I suggest you read w:Wikipedia:Bot policy, which outlines the problems with automatic spelling correction and specifically prohibits such bots: "There should be no bots that attempt to fix spelling mistakes in an unattended fashion. It is not technically possible to create such a bot that would not make mistakes, as there will always be places where non-standard spellings are in fact intended." If you are interested in fixing spelling mistakes, you might wish to look at using w:Wikipedia:AutoWikiBrowser, which makes the process semi-automatic; you are required to review each change to ensure the context is correct. w:Wikipedia:Typo has some more info. Regards. Adambro 08:11, 26 July 2007 (UTC)[reply]
Do you have another idea? If your project has a rule against such a bot, it is not a good idea to run it on the toolserver :). --DaB. 22:49, 9 August 2007 (UTC)[reply]

Hi, I would like to request shell access to the toolserver. I want to create a tool which provides various interesting feeds from Wikipedia and the other sister projects. There are some useful feeds available for Wikipedia articles, but many more could be useful. For example, a feed with the latest quotes from Wikiquote could be very useful. en:User:Shabda 196.15.16.20 07:02, 27 July 2007 (UTC) A sample of my work: en:User:Shabda[reply]

These kinds of RSS feeds could be a problem, because the toolserver can't (in a legal way) offer article content. So a feed saying "new quotes by Karl May found" would be no problem, but a best-of Kennedy would be a problem. --DaB. 22:22, 9 August 2007 (UTC)[reply]
Aren't all Wikipedia articles GFDL-licensed? So how would creating an RSS feed for them be illegal? Can you point me to the relevant policy, if I am in error on this? Thanks. 123.176.33.248 12:55, 12 August 2007 (UTC)[reply]
Also, I would like to work on this. en:User:Shabda
Hello. There have been no updates here since last time. en:User:Shabda
The problem is not the license, but that the toolserver is owned by the German Verein (Wikimedia Deutschland) and not by the Foundation, and the law in Germany has differed from the law in the USA for some time.
So if your feed contains only links to Wikimedia project articles, all would be fine. But if your feed offers content (like complete quotes), that could be a problem. So please say which type of feed you plan. --DaB. 14:11, 13 September 2007 (UTC)[reply]
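A link-only feed of the kind DaB. describes as unproblematic might look like the sketch below: item titles and URLs only, no article content. The input format and function name are only illustrative.

 from xml.sax.saxutils import escape
 
 def link_only_feed(channel_title, channel_link, items):
     """Build a minimal RSS 2.0 feed from (title, url) pairs, links only."""
     out = ['<?xml version="1.0" encoding="UTF-8"?>',
            '<rss version="2.0"><channel>',
            '<title>%s</title>' % escape(channel_title),
            '<link>%s</link>' % escape(channel_link),
            '<description>Links only, no article content</description>']
     for title, url in items:
         out.append('<item><title>%s</title><link>%s</link></item>'
                    % (escape(title), escape(url)))
     out.append('</channel></rss>')
     return "\n".join(out)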

Hello,

I would like a toolserver account so that I can work on (and share) an alternative to the current use of the querycache, which, due to the LIMIT in the queries that populate it, is constraining the usefulness of some "Special" reports for larger projects like the English Wikipedia. This project would not require page_text fetching.

I proposed a patch earlier this year, which would add an indexed id to the cache, and thus allow returning paged results by index range scans (which are cheap regardless of the size of the cache) rather than the use of limit and offset queries (which become linearly more expensive with a larger cache). At least part of the reason for restricting the result size of the querycache-populating queries seems to have been that a larger cache was more expensive to query.
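To illustrate the difference, here is a rough sketch of the kind of paged query the patch would allow. The querycache columns qc_type, qc_namespace and qc_title exist in MediaWiki; the indexed qc_id column is the hypothetical addition from the patch, and connection handling is left out.

 import MySQLdb  # assumed to be available on the toolserver
 
 def fetch_report_page(conn, report, last_seen_id, page_size=50):
     """Return the next page of cached results after last_seen_id."""
     cur = conn.cursor()
     # Range scan on the (hypothetical) indexed qc_id: the cost does not
     # depend on how deep into the cache we are, unlike LIMIT ... OFFSET ...
     cur.execute(
         "SELECT qc_id, qc_namespace, qc_title "
         "FROM querycache "
         "WHERE qc_type = %s AND qc_id > %s "
         "ORDER BY qc_id "
         "LIMIT %s",
         (report, last_seen_id, page_size))
     return cur.fetchall()
 
 # The equivalent OFFSET-based query has to skip over all earlier rows on
 # every request, which is what becomes linearly more expensive as the
 # cache grows.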

JeLuF (in bug 3419) expressed some concern that the inserts which populate the cache could also be expensive, and that the work should therefore be carried out on the toolserver.

I would like the opportunity to bring this work forward, building a working model of the solution proposed in the patch which would provide comparative metrics as well as an alternate series of reports to the "Special" pages on Wikipedia which are broken due to the current implementation of caching.

Please feel free to email me with any questions or whatever.

Thanks for your consideration, Shunpiker 14:52, 1 September 2007 (UTC)[reply]

As far as I understand, JeLuF said the toolserver should be used when someone needs the data. I'm unsure whether we have the capacity to offer unlimited special pages in real time for projects as big as enwp. Perhaps we/you should start with a refresh every day. Would you agree? --DaB. 15:56, 27 September 2007 (UTC)[reply]

Hi, I'm an admin at the German Wikipedia, and I've been a core PyWikipediaBot developer for several years. I'm running de:Benutzer:Zwobot (and, for interwiki placement, his brothers in > 100 other languages).

I'm the author of PyWikipediaBot's weblinkchecker.py. This is a tool to detect and report broken external links. See weblinkchecker.py for details. The problem is that this script uses a LOT of bandwidth. It will download any HTTP or HTTPS URLs linked to from all articles, about 50 simultaneously, including large files like images and PDFs. I don't know exactly how much traffic it causes, but if I run it on my home DSL line, I get some false positives (socket error) because the line is congested. CPU and RAM usage are minor, and I only need a few MB to store the found dead links.

Also, it would be interesting to improve PyWikipediaBot's MySQL features. At the moment, most scripts directly parse an XML dump, and could be heavily sped up by accessing a MySQL database. This is of course limited by the lack of full text access on the toolserver, but maybe some scripts (e.g. redirect.py) could make use of that limited MySQL access anyway. --Head 23:37, 12 September 2007 (UTC)[reply]

Why is it necessary to download the whole file? Wouldn't it be enough to just read the HEAD information from the server? --DaB. 15:33, 27 September 2007 (UTC)[reply]
Is it not possible to delete links with a 4XX status from the article automatically? --Luxo 18:16, 11 October 2007 (UTC)[reply]
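A HEAD-first check along the lines DaB. suggests might look like the sketch below; the checker would only fall back to a (possibly ranged) GET for servers that refuse HEAD requests. This is illustrative only and not the actual weblinkchecker.py code.

 import httplib
 import urlparse
 
 def check_link(url):
     """Return (status, reason) for url using only a HEAD request."""
     scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
     if scheme == "https":
         conn = httplib.HTTPSConnection(netloc)
     else:
         conn = httplib.HTTPConnection(netloc)
     if not path:
         path = "/"
     if query:
         path = path + "?" + query
     conn.request("HEAD", path)
     response = conn.getresponse()
     conn.close()
     return response.status, response.reason
 
 # A 404 or other 4XX status marks the link as a candidate dead link,
 # without downloading the file itself.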

I'm interested in doing some analysis of the metadata (such as the most-changed pages), similar to http://tools.wikimedia.de/~leon/stats/wikicharts/index.php?lang=de&wiki=enwiki&ns=alle&limit=100&month=09%2F2007&mode=view but with different time frames, etc.

I would also be interested in writing a tool to update the page_counter column in the page table by batch-processing the Apache logs, for the best of both worlds: the system would avoid writing to the master DB server every time a page is viewed, but accurate statistics would still be available (trailing by 24 hours or whatever).

Thanks. --yourabi, 16 September 2007 (UTC)
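A rough sketch of the "aggregate, then write once" idea described above. The log format, file path and write target are assumptions for illustration only (and, as noted in the reply below, no such Apache logs are actually available on the toolserver).

 import re
 from collections import defaultdict
 
 HIT = re.compile(r'GET /wiki/([^ ?#]+) ')
 
 def count_hits(logfile):
     """Aggregate page-view counts per title from one day's access log."""
     counts = defaultdict(int)
     for line in open(logfile):
         match = HIT.search(line)
         if match:
             counts[match.group(1)] += 1
     return counts
 
 def flush_counts(conn, counts):
     """Write one batched increment per page instead of one per page view."""
     cur = conn.cursor()
     for title, n in counts.iteritems():
         cur.execute(
             "UPDATE page SET page_counter = page_counter + %s "
             "WHERE page_namespace = 0 AND page_title = %s",
             (n, title))
     conn.commit()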

We have no Apache data from the Wikimedia projects. And we don't have enough capacity to serve the page_counter data for every access to a Wikimedia project. --DaB. 15:40, 27 September 2007 (UTC)[reply]




Hi, I am a graduate student at UCSC. I am requesting shell access to the toolserver to help create a real-time, complete version of the trust coloring demo we have created for the English Wikipedia. Thanks, -UCSC_wiki_trust

Would you offer the results on the toolserver, or do you just need data for your project? The first could be a problem, the second not. --DaB. 21:51, 7 October 2007 (UTC)[reply]

I'm an amateur programmer. I want to create (first) some scripts for Polish Wikipedia projects, then I'll think about some more useful scripts. Hołek ҉ 13:57, 1 October 2007 (UTC)[reply]

A bit more detail would be nice. --DaB. 21:52, 7 October 2007 (UTC)[reply]
The first script would find articles with certain words in their titles (e.g. "pl:Special:prefixindex/Znani ..."; it's an ongoing case about lists of "famous" people), possibly restricted to selected categories, and listify them. Pretty simple to start with. Maybe some day I'll also repair Interiot's opt-in in his counter, who knows... Hołek ҉ 16:48, 15 October 2007 (UTC)[reply]
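A listing script of that kind might boil down to a single query against the toolserver's replicated database, roughly as sketched below. The plwiki_p database name follows the usual toolserver naming, but the prefix, connection details and namespace are only examples.

 import os
 import MySQLdb
 
 def titles_with_prefix(prefix, limit=500):
     """Return main-namespace titles starting with the given prefix."""
     # connection details (host etc.) depend on the toolserver setup
     conn = MySQLdb.connect(db="plwiki_p",
                            read_default_file=os.path.expanduser("~/.my.cnf"))
     cur = conn.cursor()
     cur.execute(
         "SELECT page_title FROM page "
         "WHERE page_namespace = 0 AND page_title LIKE %s "
         "ORDER BY page_title LIMIT %s",
         (prefix.replace(" ", "_") + "%", limit))
     return [row[0] for row in cur.fetchall()]
 
 # e.g. titles_with_prefix("Znani") could then be turned into a wiki list page.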

I'm a pl.wiki sysop. I want to develop a tool for generating lists of articles based on user-defined criteria, and some additional modules for my pywikipedia bot (en:User:McM.bot, pl:User:McBot), which would need access to the SQL database, for article quality control and statistical purposes such as spellchecking (but only for detecting mistakes, not for fixing them automatically). McMonster 22:45, 2 October 2007 (UTC)[reply]

For spellchecking you'd need the page text, which isn't available on the toolserver (at least at present). Alai 06:31, 4 October 2007 (UTC)[reply]
I can modify the interwiki.py script and do spellchecking on the pages fetched while scanning for interwiki links; it shouldn't be a problem. But I want to focus on the list generator first. McMonster 16:05, 4 October 2007 (UTC)[reply]

Hi, I'm an it.wiki sysop and bureaucrat and I run a bot (it:User:Sunbot) with more than 30k edits. I would like to have an account on the toolserver to run some scripts (for example, I am doing a find/replace task on every article of the Italian Wikipedia, 350k+ articles), and I would finish faster with a toolserver account because I cannot run the script 24/7 on my PC. In addition, I could use it to run scripts such as "standardize_interwiki.py" and other tasks which require a lot of time. Helios 12:42, 8 October 2007 (UTC)[reply]

I'm an admin at the Arabic Wikipedia. I have spellcheck, interwiki and image bots; these bots work on many Wikipedias, and soon on Wikimedia Commons. I hope to run these bots using the Toolserver. Thanks!--OsamaK 14:17, 15 October 2007 (UTC)[reply]

I'm an admin at English Wikipedia and a frequent contributor to English Wikisource, where I also run a bot. I have successfully provided a few new features/fixes to pywikipediabot [5][6] and have outstanding patches.[7][8]

My primary motivation for an account on the toolserver is to set up web-based tools that make it easier to upload text onto Wikisource. The rationale for providing these tools on the toolserver is that many users are not able to use command-line tools, either because those tools do not run on Windows or because the command-line interface is difficult. Some examples of planned tools are:

  • Running unpaper on Commons images at user request so that the user can upload cleaner images
  • OCR'ing images on Commons using software such as Tesseract, and giving the user wikitext to drop into Wikisource (a rough sketch of this step follows at the end of this request)
  • Extracting text from PDF and DJVU files; again, giving the user wikitext to drop into Wikisource
  • A Google Books tool to simplify obtaining and uploading the underlying images and text to Commons and Wikisource respectively. I have written a Greasemonkey user script which does a little of this.

Also, if no other solutions are presented, I would like to set up a bot to move images from English Wikisource to Commons, as the "guideline" is that all images should be uploaded to Commons, and this bot would benefit from running on the toolserver.

Thanks! John Vandenberg 00:38, 16 October 2007 (UTC)[reply]
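The clean-up and OCR step mentioned in the list above could be wired together roughly as sketched below. The tools (unpaper, Tesseract) are the ones named in the request, but the exact flags, the intermediate conversion via ImageMagick, and the web front end around this are assumptions.

 import subprocess
 
 def ocr_page(scan_path, basename):
     """Clean a scanned page with unpaper, then OCR it with tesseract."""
     cleaned_pnm = basename + ".clean.pnm"
     cleaned_tif = basename + ".clean.tif"
     subprocess.check_call(["unpaper", scan_path, cleaned_pnm])
     # older tesseract versions expect an uncompressed TIFF, so convert first
     subprocess.check_call(["convert", cleaned_pnm, cleaned_tif])
     # tesseract writes its output to <basename>.txt
     subprocess.check_call(["tesseract", cleaned_tif, basename])
     return open(basename + ".txt").read()
 
 # The returned text would then be wrapped in the appropriate wikitext for
 # the user to paste into Wikisource.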