Talk:Spam blacklist

This is an archived version of this page, as edited by Guido den Broeder (talk | contribs) at 20:34, 25 April 2008 (→‎usbig.net: and more). It may differ significantly from the current version.

Shortcut:
WM:SPAM
The associated page is used by the MediaWiki Spam Blacklist extension, and lists strings of text that may not be used in URLs on any page in Wikimedia Foundation projects (as well as many external wikis). Any Meta administrator can edit the spam blacklist. There is also a more aggressive way to block spamming through direct use of $wgSpamRegex. Only developers can make changes to $wgSpamRegex, and its use is to be avoided whenever possible.
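For illustration, each entry on the blacklist page is a regular-expression fragment that is matched against URLs being added to a page; lines beginning with # are comments. A hypothetical entry (example.com is a placeholder, not an actual listing) would look like this:

  # blocks links to example.com and its subdomains
  \bexample\.com\b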

For more information on what the spam blacklist is for, and the processes used here, please see Spam blacklist/About.

Please post comments in the appropriate section below: Proposed additions, Proposed removals, or Troubleshooting and problems; read the message boxes at the top of each section for an explanation. Also, please check back some time after submitting, as there may be questions regarding your request. Per-project whitelists are discussed at MediaWiki talk:Spam-whitelist. In addition, please sign your posts with ~~~~ after your comment. For other discussions related to the blacklist that do not concern a problem with a particular link, see Spam blacklist policy discussion.

Completed requests are archived (list, search); additions and removals are logged.

snippet for logging: {{/request|972895#{{subst:anchorencode:section name here}}}}

If you cannot find your remark below, please do a search for the URL in question with this Archive Search tool.

Spam that only affects a single project should go to that project's local blacklist

Proposed additions

This section is for proposing that a website be blacklisted; add new entries at the bottom of the section, using the basic URL so that there is no link (example.com, not http://www.example.com). Provide links demonstrating widespread spamming by multiple users. Completed requests will be marked as done or denied and archived.

Multiple Turkish sites caught by SpamReportBot/cw

Caught by the bot and investigated further by Dirk and Jorunn. I'm consolidating the information here from 3 reports:

This has gone on for two years.

Account

































Domains






















Google Adsense ID: 0125465872104138


Related domains














































--A. B. (talk) 05:18, 20 April 2008 (UTC)Reply

I have closed the SpamReportBot reports; centralised discussion is better for this. --Dirk Beetstra T C (en: U, T) 14:13, 22 April 2008 (UTC)Reply
Thanks. I forgot to do that. --A. B. (talk) 18:01, 22 April 2008 (UTC)Reply

I've added the three bot-generated ones. I have no time to look in sufficient detail at the others at present, so this is still open. I am not a fan of listing "related" domains pre-emptively --Herby talk thyme 10:35, 24 April 2008 (UTC)Reply

Multiple sites by Mikhailov Kusserow

User:



Links:

















They all reside on different IPs:

I am closing the 8 SpamReportBot reports (to keep that list clear). Discussion here please. --Dirk Beetstra T C (en: U, T) 14:18, 22 April 2008 (UTC)Reply

All in all, the links seem legit, though in some cases whole linkfarms were added in one edit. Mikhailov Kusserow has a userpage on many of the wikis I checked (SUL?). I have asked id:Pengguna:Mikhailov_Kusserow (which appears to be one of the bigger accounts) to help us out here. Awaiting discussion. --Dirk Beetstra T C (en: U, T) 15:03, 22 April 2008 (UTC)Reply
I agree that the links look relevant, but there appears to be a distinct bias here, which is not cause for blacklisting unless it continues egregiously, but is cause for concern. Very likely the link placement may be unwanted upon closer inspection, but that is not for us to decide here. – Mike.lifeguard | @en.wb 19:12, 24 April 2008 (UTC)Reply

More Pince spam since July 2007

Previous blacklist entry from July 2007


Previous related discussions


Subsequent new spam from related domains











































Accounts used for this new spam


en, eo, fr



since last blacklisting: en, fr (many other Wikipedias before that; see Talk:Spam blacklist/2007-07#lang.arabe.free.fr previous blacklist request)

--A. B. (talk) 02:39, 23 April 2008 (UTC)Reply




(ar, ru)



en, es, fr

--A. B. (talk) 11:48, 23 April 2008 (UTC)Reply


Done --A. B. (talk) 13:10, 23 April 2008 (UTC)Reply

Six links from cross wiki for discussion













There are several users involved; generally, the users who are only active on one wiki seem to revert vandalism (as do the bots involved). Users active on more than one wiki (not implying that they did something wrong):



But the link sometimes gets reverted even where Asikhi was not active with that link on that wiki. Maybe 'older' spammers (pre-database?).

Please provide some discussion; I am closing the reports and will point the discussions here. --Dirk Beetstra T C (en: U, T) 09:11, 23 April 2008 (UTC)Reply

As far as msapubli.com is concerned, the link being placed "looks" relevant as it is quite long. However (for me) it redirects to the home page, which appears to have no relevance to Wikipedia and contains the word "affiliated", which makes me wonder. I will look far more closely at the others. This batch concerns me --Herby talk thyme 09:26, 23 April 2008 (UTC)Reply
I have real doubts about the validity of these sites to Wikipedia as a whole --Herby talk thyme 10:53, 23 April 2008 (UTC)Reply

e-library.net and related

buy-ebook.com → mirror of e-library.net
e-library.us → mirror of e-library.net
artdhtml.com → mirror of e-library.net

See also WikiProject_Spam case

Cross wiki spamming

Thanks, --Hu12 09:41, 23 April 2008 (UTC)Reply

Added, thanks — VasilievVV 17:03, 23 April 2008 (UTC)Reply
I re-checked all contrib links more carefully. I see only recent spam on en by 212.12.29.1 (on other wikis this apparently Russian user hasn't spammed; he only vandalized them and edited some articles, mostly on sexual topics). Declined — VasilievVV 18:13, 23 April 2008 (UTC)Reply
Cross-wiki additions were removed by me today from about 10 other wikis. Spammers are not going to re-spam pages they have already seeded with their link. This is a long-term problem spread out over a couple of years, beginning in 2006. There is clear evidence of cross-wiki spamming, particularly with 212.12.28.1. On en.wikipedia (where I blacklisted the domains today) this user was active under 212.12.29.1. The project-wide abuse has been demonstrated. --Hu12 00:02, 24 April 2008 (UTC)Reply
Hu12 is right. Spammers don't re-spam a page if their link is still there. We blacklist domains on Meta based on the severity and breadth of spamming, not the age of the spam. I strongly recommend blacklisting across all projects, not just en.wikipedia. I also recommend blacklisting all the affiliated domains Hu12 has identified; that has been our standard practice here. Hu12, can you list them here? Also the other IPs and user accounts you found. Thanks. --A. B. (talk) 01:12, 24 April 2008 (UTC)Reply

e-library.net
buy-ebook.com
e-library.us
artdhtml.com
vlasta-tula.ru
istrodina.com
azubicard.de
architekturbuero-eisele.de
luftbildaufnahmen-vs.de
tula.net
seibold-ketterer.com
card3.de
jewishsinglesvideoconnection.com
tuvproductions.com
obivka.ru
bg2001.ru
lights-and-sounds.de
donau-don.info
promowatch.de
full list--Hu12 05:28, 24 April 2008 (UTC)Reply

Added now; thanks for reporting, and sorry that I had initially declined it — VasilievVV 14:01, 24 April 2008 (UTC)Reply
For another perspective from the blacklist's most active admin, see Herby's comments about this one on my user page (I had asked for his advice):
--A. B. (talk) 13:57, 24 April 2008 (UTC)Reply
Thanks again. ;)--Hu12 14:27, 24 April 2008 (UTC)Reply

IMASEO Services (India) spam

IMASEO Services: linkspamming Wikipedia since 2005 despite requests, then warnings to stop and, finally, multiple account blocks:


Contact data
IMASEO Services
U 8/4 DLF Phase III
Gurgaon
Phone: +91-124-4152776


Spammed domains




  • Google Adsense 4799094371848660
  • SEO client


  • SEO client


  • SEO client


  • SEO client?


  • Google Adsense ID: 0288878065673786
  • SEO client?



Related domain



Accounts









Reference


Note: do not confuse this SemGuru with the unrelated Polish company, SEMGuru.pl; the Australian company, imaseo.net; or the IMASEO Contest. --A. B. (talk) 16:34, 23 April 2008 (UTC)Reply


Done --A. B. (talk) 16:38, 23 April 2008 (UTC)Reply

ledcalculator.net



Adsense pub-2941481051944299
See also MediaWiki talk:Spam-whitelist report (declined)

Cross wiki spamming


The webmaster's other site myresistor.com is already BL'd. Thanks, --Hu12 14:24, 24 April 2008 (UTC)Reply

corfu-kerkyra.eu / zanteisland.com

See WikiProject Spam Item


Cross wiki spamming

Thanks, --Hu12 14:07, 25 April 2008 (UTC)Reply

Proposed additions (Bot reported)

This section is for websites which have been added to multiple wikis as observed by a bot.

Items there will automatically be archived by the bot when they get stale.

Sysops, please change the LinkStatus template to closed when the report is dealt with. More information can be found at User:SpamReportBot/cw/about

These are automated reports; please check the records and the links thoroughly, as they may be good links! For some more info, see Spam blacklist/help#SpamReportBot_reports

If a report contains links to fewer than 5 wikis, then only add it when it is really spam. Otherwise just close it; if the link gets spammed more broadly, the bot will reopen the report.

Please place suggestions on the automated reports in the discussion section.

List
User:SpamReportBot/cw/nakedafrica.net
User:SpamReportBot/cw/hnl-statistika.com
User:SpamReportBot/cw/therasmus-hellofasite.it
User:SpamReportBot/cw/prolococusanese.interfree.it
User:SpamReportBot/cw/rprece.interfree.it

Proposed removals

This section is for proposing that a website be unlisted; please add new entries at the bottom of the section. Remember to provide the specific URL blacklisted, links to the articles it is used in or useful to, and arguments in favour of unlisting. Completed requests will be marked as done or denied and archived. See also /recurring requests for repeatedly proposed (and refused) removals. The addition or removal of a link is not a vote; please do not bold the first words in statements.

members.lycos.co.uk

Hi,
the above-mentioned url was added on April 17 and it doesn't suit me: I need a link to members.lycos.co.uk/sfsk/ (the Manfred Wörner Foundation) in the Macedonia article of the French-speaking Wikipedia (reference). There's also a link in en:Manfred Wörner Foundation. I'm not able to assess the number of links to members.lycos.co.uk through all Wikipediæ, but I find the article relevant and not spam. I will enquire as to whether the article could be found on another url. (:Julien:) 08:33, 19 April 2008 (UTC)Reply

Beetstra added it only for /davidbisbal. No idea why it blocks the whole domain. Commented out — VasilievVV 09:45, 19 April 2008 (UTC)Reply
Actually the members.lycos.co.uk url was added by Nakon (in full, not only for /davidbisbal). (:Julien:) 11:11, 19 April 2008 (UTC)Reply
I've rem'd it out for now. Some of the additions were not well thought out I'm afraid. This needs discussion please - personally I think it probably should be removed for now --Herby talk thyme 11:36, 19 April 2008 (UTC)Reply
I specifically did davidbisbal; the site (davidbisbalbrowser.com, commercial) is a redirect site to members.lycos.co.uk/davidbisbal. I was thinking that the members site is spam-sensitive, but did not do the domain for that reason. There will be cross-wiki spamming of the domain, but in that case the specific urls need to be blacklisted. If it proves in the end that it is out of hand, then the domain can be added, and specific whitelisting can then be used for certain sites (for en, most of it will fail en:WP:RS, en:WP:COI, en:WP:NOT, en:WP:EL &c. &c.)
I am removing the # from the davidbisbal rule on members.lycos.co.uk, that should not be the problem. --Dirk Beetstra T C (en: U, T) 19:10, 19 April 2008 (UTC)Reply
Yes I'd already fixed the site wide listing. I will remove members.lycos.co.uk completely from the blacklist in 24 hours if no one objects. Thanks --Herby talk thyme 07:40, 20 April 2008 (UTC)Reply
Hmmm. 99.9% of links to Lycos member pages are entirely inappropriate, in my experience. JzG 06:55, 21 April 2008 (UTC)Reply
Guy is almost certainly correct but there are currently 240 links on en wp alone. Listing this would cause some chaos. I think it should be raised on en wp to seek consensus at the very least. Removed for now --Herby talk thyme 08:24, 24 April 2008 (UTC)Reply

logisticsclub.com

Dear Wiki and concerned person, the above-mentioned url was blocked but I don't know why. This website belongs to a logistics association in Turkey. The aim of this website is to bring together the logistics sector, university students and the sectors that demand logistics services, to provide information about logistics, transportation and warehousing, and to announce related conferences and seminars. This club has several URLs: logisticsclub.com, logisticsclub.org, lojistikkulubu.com and lojistikkulubu.org; all of these URLs lead to the same website, and two of them are Turkish web addresses. Consequently this site of the Logistics Club is not spam, so why are all the web addresses of the Logistics Club blocked? Now we want to add the topic "Lojistik Nedir?" (in English, "What is Logistics?") - www.logisticsclub.com/modules.php?name=News&file=article&sid=2 - as a reference in the References section of http://tr.wikipedia.org/wiki/Lojistik. The preceding unsigned comment was added by Farukcaliskan (talk • contribs) 11:58, 22 Apr 2008 (UTC)

See request. --Dirk Beetstra T C (en: U, T) 12:05, 22 April 2008 (UTC)Reply

pl.net

It appears pl.net was blacklisted (see User:SpamReportBot/cw/pl.net) for the sake of a single spammed user page (a /~username/ page). This affects a link that is being used as a reference on WP, and I could ask for a whitelist exception, but the block on this domain seems too sweeping for the amount of damage it was causing. I suggest removing the blacklisting. Kellen T 13:39, 23 April 2008 (UTC)Reply

Indeed, looking at the spam report indicates to me that the user was fleshing out external links across wikis with some taken from en:WP, and wasn't spamming at all. Probably most of the domains that got blacklisted as a result of this should be unlisted. Kellen T 13:46, 23 April 2008 (UTC)Reply
Removed, excessive listing for now. I'll check some of the others when I have time --Herby talk thyme 13:49, 23 April 2008 (UTC)Reply
Awesome, thanks Herby! Kellen T 13:52, 23 April 2008 (UTC)Reply

Letras Libres

Letraslibres(dot)com is neither spam nor something that may be used to bypass it; it is just a magazine in Spanish.— The preceding unsigned comment was added by 86.107.121.199 (talk)

It is not the site itself that is defined as spam. It seems to be a blog, and the link was inappropriately added to many Wikipedias; see User:SpamReportBot/cw/letraslibres.com. --Dirk Beetstra T C (en: U, T) 16:35, 23 April 2008 (UTC)Reply
It is not a blog. It is an online journal or, perhaps better, portal that includes both a journal and a series of blogs, with some articles by fairly important figures in Spanish and Latin American literature and literary criticism. Perhaps, rather than blocking its use altogether, those who are inappropriately adding links could be cautioned (or even blocked if necessary). --Jbmurray 20:20, 23 April 2008 (UTC)Reply
Cross wiki blocking or warning is as yet impossible. I am removing the link per your analysis, and will link to this discussion on the report. Thanks. Done --Dirk Beetstra T C (en: U, T) 09:38, 24 April 2008 (UTC)Reply

Indymedia.org

Indymedia.org was added here because of the spam report at User:SpamReportBot/cw/indymedia.org. It appears that this was just one IP adding a few articles to certain pages, all on the same theme/subject matter (a story about a certain few politicians), but this is causing problems on the English Wikipedia, as Indymedia is a very widely linked site there and a reliable source on some articles. The addition actually said to remove it within an hour, but that was about 9 hours ago. Can this be removed? Lawrence Cohen 17:13, 24 April 2008 (UTC)Reply

Removed for now
Over-enthusiastic link placement was the problem (& they kept going). As to time - it was only an hour or so back that Dirk placed it & I was watching! This will need some reviewing I guess --Herby talk thyme 17:29, 24 April 2008 (UTC)Reply
Acknowledged. It has been there for some time to stop an active spammer (please Werdna, bring us global blocking!). The editor is adding to the top of the list (not a sign of good faith), and was poked yesterday in two languages to discuss here; still, today they continued on those two languages. After blacklisting, the editor was also poked on the other two languages where they were most active, but as yet there has been no reaction. I recognised the link was OK, hence the short period.
The IP was busy on 4 wikis (at least); maybe we should block the IP for a significant period of time on all 4 wikis and hope that that helps. --Dirk Beetstra T C (en: U, T) 17:44, 24 April 2008 (UTC)Reply
For what it's worth I've spam4im-warned both the en wp IPs; however, the main one has constructive edits too, so I would be reluctant to block it for now. As you say - global blocking would certainly help in such cases --Herby talk thyme 18:22, 24 April 2008 (UTC)Reply

usbig.net

This is a genuine, well-respected organization. It is not clear to me why it would be on the blacklist, and its appearance here renders editing several articles impossible. Guido den Broeder 20:25, 25 April 2008 (UTC)Reply

Same request for: basicincome.com, freiheitstattvollbeschaeftigung.de, globalincome.org. Guido den Broeder 20:34, 25 April 2008 (UTC)Reply

Troubleshooting and problems

This section is for comments related to problems with the blacklist (such as incorrect syntax or entries not being blocked), or problems saving a page because of a blacklisted link. This is not the section to request that an entry be unlisted (see Proposed removals above).

Discussion

Help needed

Dear all. Eagle 101 and I have been working on bots in the spam IRC channels (see #wikipedia-spam-t for talking; people there will be able to steer you to the other channels, #wikipedia-en-spam and #cvn-sw-spam). The bots are now capable of real-time cross-wiki spam detection (and soon that will also be reported). It would be nice if some of you would join us there and help us clean up etc., as this appears to go faster than we at first expected (and I do get the feeling the en wiki is not a good starting point for finding them!). --Beetstra 21:35, 22 March 2008 (UTC)Reply

Something interesting for y'all to look at. I'm going to work on making each link go to subpages, and have them updated in a way that we can comment on the subpages as well, and bring the ones that need blacklisting to the Meta blacklist. I can't have the bot automatically post here, as we would flood this list out, so we will have to look at them all and then link to them. Hopefully we can get all the reports in one place, the COIBot reports etc. Folks, more or less simple cross-wiki spam is easily detectable. :) —— Eagle101 Need help? 22:55, 22 March 2008 (UTC)Reply
Bah, you probably want to see the subpage at User:SpamReportBot/test ;) —— Eagle101 Need help? 23:00, 22 March 2008 (UTC)Reply

Addition to the COIBot reports

The lower list in the COIBot reports now has, after each link, four numbers in brackets (e.g. "www.example.com (0, 0, 0, 0)"):

  1. first number: how many links this user added in total (the same after each link)
  2. second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes)
  3. third number: how many times this user added this link
  4. fourth number: to how many different Wikipedias this user added this link.
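
A hypothetical illustration (not taken from a real report): a line such as "www.example.com (10, 12, 9, 6)" would mean that the user made 10 link additions in total, the link was added to Wikipedia 12 times overall, this user added it 9 times, and those additions were spread over 6 different Wikipedias. Since 9 of the user's 10 additions are this one link, placed on 6 projects, the pattern points to a strong preference for that link.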

If the third or the fourth number is high with respect to the first or the second, that means that the user has at least a preference for using that link. Be careful with other statistics from these numbers (e.g. good users do add a lot of links). If there are more statistics that would be useful, please notify me, and I will have a look at whether I can get the info out of the database and report it. The bots are running on a new database; Eagle 101 is working on transferring the old data into this database so it becomes more reliable.

For those with access to IRC, this data is available there in real time. --Beetstra 10:40, 26 March 2008 (UTC)Reply

Log weirdness

I guess it may be a caching issue but for me the log appears to end at July 2007? Editing gave me March 2008 but it ain't there now for me? --Herby talk thyme 12:16, 26 March 2008 (UTC)Reply

I've rv'd myself for now but something is going wrong??? --Herby talk thyme 14:21, 26 March 2008 (UTC)Reply
Looks to me like you put the log entry in the right section, I'm re-adding it for ya. Did you purge? ~Kylu (u|t) 16:28, 26 March 2008 (UTC)Reply
Agreed in a sense but just purged the cache & it cuts off at July 2007 for me (I even tried making it #March 2008 and got de nada). Is it just me - it has been "one of those" days :) --Herby talk thyme 17:02, 26 March 2008 (UTC)Reply
I don't see past July 2007 either :\ Mønobi 17:11, 26 March 2008 (UTC)Reply
https://wikitech.leuksman.com/view/Server_admin_log#March_26 - issues with the rendering cluster again (which would keep &action=purge from working) ~Kylu (u|t) 17:40, 26 March 2008 (UTC)Reply
Did the full ff purge & still have the same as Monobi today. I am recording the entries that I cannot log at present but I guess if this is not resolved soon alternatives of some sort may be needed. If anyone else finds (or does not find) the same it would be good to hear. Thanks --Herby talk thyme 08:46, 27 March 2008 (UTC)Reply
Leave me the log entries you want added on my talk, and I'll add them for you if you'd like. I can get around this problem. :) ~Kylu (u|t) 14:12, 27 March 2008 (UTC)Reply
Ok, sorry for archiving this. It looks like we hit some sort of limit. My suggestion is to make a second log page for the time being and start logging from that while the original bug is reported to bugzilla. —— nixeagle 02:54, 30 March 2008 (UTC)Reply

Hopefully sorted for now via Spam blacklist/LogPre2008. Of course this is a wiki so if anyone disagrees....:) Cheers --Herby talk thyme 11:38, 30 March 2008 (UTC)Reply

Crosswiki spam detection

Ok folks, we can more or less detect any cross-wiki spam addition. Wander over to User:SpamReportBot/cw. This is a report of all links added by only a few people across more than 3 wikis. Each section there is its own subpage, which means you can transclude them on this page, link to the specific section, etc. You can also comment on the subpages if you have further notes etc., such as "this is not spam because of X". Depending on what we all think of it, I'll transclude User:SpamReportBot/cw on this page. —— Eagle101 Need help? 00:31, 29 March 2008 (UTC)Reply

I'll also note that it automatically removes old items. Items should stay up for 2-3 days before being removed by the bot (that is, if no more links are added). If good links consistently come up, I'll come up with a whitelist mechanism that we can add links to if we deem the additions OK and don't want to see them reported there. Please suggest improvements on how the bot reports. —— Eagle101 Need help? 01:30, 29 March 2008 (UTC)Reply
I started to blacklist a number of these and then stopped when I noticed the blacklist log is acting seriously weird. --A. B. (talk) 02:21, 30 March 2008 (UTC)Reply
Alright, thanks for your work. I'm going to continue to work on the bot and the algorithm being used, so noting false hits is important. The major source of false hits seems to be known accounts that edit a lot. I'll work on a fix for that tomorrow; I'm hitting the sack tonight. —— nixeagle 03:06, 30 March 2008 (UTC)Reply
Once again, Wikipedia is a better quality project because of hardworking and conscientious editors.--Hu12 13:50, 4 April 2008 (UTC)Reply

XRumer spam

Well, anyone who is involved in fighting cross-wiki spam has at some point seen XRumer ("is the best!") spam. Now he hotlinks a thumbnail for his program, as seen on [1]. Code he's using:

X-Rumer is the BEST! 
 
<img>http://upload.wikimedia.org/wikipedia/en/thumb/6/6b/XRumer_screenshot.gif/200px-XRumer_screenshot.gif</img> 

So I added the following line, \bupload\.wikimedia\.org\/.*XRumer_screenshot\.gif\b, to blacklist all links to possible thumbnail sizes, although I don't know if I did it properly (and the logging system used here confuses me). So, could anyone here review whether I did it properly? es:Drini 19:07, 28 March 2008 (UTC)Reply
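
For illustration, that rule catches any thumbnail size of the image as well as the unscaled file, because the .* spans the intermediate path. The 200px URL below is the one from the spam above; the other two are inferred examples of the standard upload.wikimedia.org layout, not taken from actual spam:

  upload.wikimedia.org/wikipedia/en/thumb/6/6b/XRumer_screenshot.gif/200px-XRumer_screenshot.gif
  upload.wikimedia.org/wikipedia/en/thumb/6/6b/XRumer_screenshot.gif/120px-XRumer_screenshot.gif
  upload.wikimedia.org/wikipedia/en/6/6b/XRumer_screenshot.gif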

That works. I just tried it out. (adding the link that is). —— Eagle101 Need help? 01:25, 29 March 2008 (UTC)Reply
I deleted the pic on enwiki, btw, but am told that it'll be a while before that link is purged. If it's a huge problem, we can request that a shell user delete the file manually, but... ~Kylu (u|t) 22:13, 1 April 2008 (UTC)Reply

SpamReportBot/cw feedback

First item: after a lot of checking, I went through and made comments in each section as to which bot-reported domains needed blacklisting and which looked legit. When I was all done, I saw that none of my edits "stuck" -- it was as if I'd never made them. This must have something to do with the fact that these reports are transcluded. Then I went and blacklisted 13 domains; afterward I saw others had also blacklisted some of the same links, so there was some wasted effort. Conclusion: we very much need a way to mark up these reports so we don't duplicate each others' efforts.

In lieu of marking each report, here's my feedback on some of the domains reported so far:

  • I blacklisted these:
    • tremulous.net.ru
    • logosphera.com
    • vidiac.com
    • yarakweb.com
    • img352.imageshack.us
    • ayvalikda.com
    • sarimsaklida.com
    • worldmapfinder.com
    • cundadan.com
    • bikerosario.com.ar
    • alfpoker.com
    • karvinsko.eu
    • yarak.co.uk
  • Links added to these sites looked legitimate:
    • wikilivres.info
    • unwto.org
    • en.pwa.co.th
    • villatuelda.es
  • Some others still need evaluation

All in all, SpamReportBot/cw looks like a very powerful, useful tool. --A. B. (talk) 03:28, 30 March 2008 (UTC)Reply

OK, I just figured out that if I post my comments in the bot report sections above the line that says "<!-- ENDBOT POST BELOW HERE -->", then they'll show up. I don't know if it's a good idea to do this, however -- will it screw up the bot or the transclusion? --A. B. (talk) 03:36, 30 March 2008 (UTC)Reply
The work of the bot is awesome & deserves both thanks & discussion. There seem to be a few issues that need addressing, such as what to look at, logging etc., & it would be good to see discussion here. I feel that there may be a case for listing all bot-reported sites because the behaviour is "spammy". However I also think, because it is bot generated and there will likely have been no warnings, that entries can & should be removed after some sensible interaction has taken place. I am well aware that others here would not share my views so I will substantially reduce my activity on this page (& Meta).
The bot - while excellent - has generated far more work than I have time for, and so I will just look at dealing with the requests from the people who make requests here & whom I've got to know & trust, if I am around. Given the vast number of admins on Meta this should not cause any problems - however Meta seems to attract many people who want to be admins but are not inclined to do any of the work. If I am around I'll help but my time is short & there is much to do on Commons. Thanks --Herby talk thyme 12:25, 30 March 2008 (UTC)Reply
You do raise a valid point, as far as no warnings. Thankfully we just turned a major corner. We now have the ability to detect most spammy behavior. However now that detection and reversion is easier (SUL), we may want to evaluate what we do in response to those that add links many times.
When I first started helping in this effort, we were shooting in the dark. There were no COIBot reports, IRC feeds, the cross-wiki linksearch tool, or any sort of monitoring of more than one wiki at a time... thus detecting spam across multiple wikis was... pardon my language, damned hard! As such we blacklisted all we could find. This type of spam was and is sneaky, as it bypasses most communities' detection mechanisms. It's only one link to folks on the various wikis, but added together it's across 5 or more!
Now that we have a detection mechanism, one that we can adapt should the behavior of spammers change significantly, we need to ask ourselves: should we blacklist with the same vigor? Should we attempt to assume good faith of those whose additions appear to us to be accidental, or made in good faith? How do we go about warning someone who may never see the warning, or be unable to read the language in which the warning is placed? In addition, we must remain ever wary of en:Joe jobs.
These are questions that need to be answered, and Herbythyme is right on the ball hinting at these here and elsewhere. It's perfectly valid to keep our response the same as it always was, but this may not be the best course of action. I don't know for sure what is. Please discuss your thoughts below my comment, or in its own section. :) —— nixeagle 18:20, 30 March 2008 (UTC)Reply
Someone will have to remove the blacklisted links from the wikis. Can that be done by a bot, and/or can a bot be set up to give information on the affected wikis about where the blacklisted links are, so the local community can remove the links themselves? Removing spam is a tedious task, and sometimes one feels one is intruding on the local communities as much as any spammer. If possible the local communities should evaluate the blacklisted links themselves, remove the ones they don't want, and either strip or whitelist the others. I realize that might not be very realistic. For Commons there is the CommonsTicker and CommonsDelinker. Is it possible to handle the blacklisted links in a similar way? --Jorunn 13:49, 30 March 2008 (UTC)Reply
Possibly it could be done by bot... I can work on writing this if it's wanted. SUL will make things much easier. I usually just click the diff links and click undo on each of the ones I blacklist. In other words, I don't blacklist things I'm not willing to undo the link additions to. —— nixeagle 17:54, 30 March 2008 (UTC)Reply
A.B. - As far as your edits not sticking on the transcluded pages... can you show me an example? I can't fix it unless I can see an example of the problem. :S It will be useful down the road to have the blacklisted or not portion in the page itself, so this should work without any problems... —— nixeagle 18:03, 30 March 2008 (UTC)Reply
Replying to myself again: AB - "<!-- ENDBOT POST BELOW HERE -->", posting above that means the bot will overwrite your comments should there be future link additions from that domain.
Also, A.B. and everyone else interested, I just modified the algorithm to remove 2 out of 4 identified false hits. I'll look at the other two, but I'd like to see this run for a day or so and see what crops up. Please do attempt to comment on the actual sub pages. —— nixeagle 19:31, 30 March 2008 (UTC)Reply

I removed transclusion from this page because it was loading very slowly — VasilievVV 06:22, 13 April 2008 (UTC)Reply

Sure, when and if it gets back to a manageable level, we can place it back on this list. —— nixeagle 19:45, 16 April 2008 (UTC)Reply
I want to add, we also need people on IRC watching our bots. The output of the bots we are running there does show when accounts are actually busy spamming cross-wiki, and much work and damage can be prevented by reacting promptly there. Yesterday I added two before they were reported here (and closed the reports this morning). Also, when you hit them while they are busy, they notice that what they do is a problem; if you add them the next day, they may never know what happened. --Dirk Beetstra T C (en: U, T) 10:33, 17 April 2008 (UTC)Reply

Clearing the backlog!

OK - we have a choice - drown in it or tackle it! It will not be long before that page will not load, never mind anything else.

Assuming drown is not the choice (:)) I think we need to use a larger mesh. These are reports of possible excessive linkage. If we had the time & people we would look in detail at every one with a fine tooth comb - we haven't.

Action plan

  1. I'm going to close all those that have been around for a week or so. The worst that will happen is that they will be re-opened again?
  2. I think we need to take the view that we take a quick look at each; if it doesn't look like a threat to the project we close it and move on. One of the issues here is that, great though the bot is, no human has actually checked each report, so it is far more labour-intensive than manual reports.
  3. Recruit - can anyone who knows anyone who is a "spam fighter" get them to take a look at this stuff? For anyone who has time & some cross-wiki experience it is a worthwhile area to work in. Those with close ties to other language projects could approach local workers too.

Comments welcome but it is a time for doing not talking (I'll spam talk pages on Meta with a link to this). Cheers --Herby talk thyme 07:09, 16 April 2008 (UTC)Reply

What actions can a non-admin take? I'm an admin on nlwiki and I'd like to help out if I can, but I'm not an admin here. --Erwin(85) 07:38, 16 April 2008 (UTC)Reply
Any help would be seriously appreciated, Erwin. This being a wiki, you can do what you like! More helpfully (& my opinion only): are the links really excessive, unwanted, spam? Again for me it means checking the diffs out on some of the wikis (& the site probably too). Has it been removed by the local folk (fr & nl are pretty good at spotting spam)? Maybe try Luxo's tool for cross-wiki contribs (& blocks too).
Then it is your judgement - if you feel it is not excessive linkage or is not a current threat to the project then "close" it (as far as I know that merely means replacing "open" on the status with "closed"?) with any comments.
If you do see it as spammy then add that comment and hopefully someone will get round to blacklisting it and closing it - certainly I will do what I can.
There are some reports where the same IP is placing a number of links - that makes me quite suspicious so if you pick up on anything like that do mention it. Any help will be appreciated - thanks --Herby talk thyme 07:49, 16 April 2008 (UTC)Reply
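As a rough sketch of what "closing" a report means in practice (the parameter form shown here is an assumption, not confirmed template syntax; check the template's documentation or an existing closed report first), the edit to the report would be something like:

  {{LinkStatus|open}}   →   {{LinkStatus|closed}}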

It will probably not break the bot, but the bot will just put it back, so it is of no help. I would suggest just closing those that seem fine-ish; they will come back if it reoccurs. The problem is that I am at the other side of the bots, and hell, there is a lot of work that does not even get onto this page. We need people here, and on IRC! Blacklisting is a solution, but it would be better to hit them with the wikitrout when they are actually doing it. I blacklisted a couple of links while they were busy spamming, and I have seen two immediately coming here to complain. It also gives us less work: when the links are blocked (here) or whitelisted (on the bot), the reports can be closed, and there is less to clean. We just really need more people! --Dirk Beetstra T C (en: U, T) 10:41, 16 April 2008 (UTC)Reply

I will automatically hide those older than 5 days. Later I can (via a database call) display them, should we ever get that far. If there are continued link additions, the bot will re-add the link. Sorry folks for not being around :( —— nixeagle 19:25, 16 April 2008 (UTC)Reply
Just a note, those hidden can be recalled at a later date if folks are interested in looking at it. I do agree, we just need more people! —— nixeagle 19:44, 16 April 2008 (UTC)Reply
Well, you have one more person. If I step on toes, or mess up with the various templates etc, please poke me or I won't learn. – Mike.lifeguard | @en.wb 04:33, 17 April 2008 (UTC)Reply
I've re-transcluded the list, as Nakon did some damage to it. That, along with the removal of the older items, did the trick. However, we need folks to continue to watch this, or it's just going to happen again. If the bot-reported section ever gets larger than about 25-35 items, we have a backlog. (It should generate about 20-30 a day.) —— nixeagle 04:34, 17 April 2008 (UTC)Reply

(same as above :-) ): I want to add, we also need people on IRC watching our bots. The output of the bots we are running there does show when accounts are actually busy spamming cross-wiki, and much work and damage can be prevented by reacting promptly there. Yesterday I added two before they were reported here (and closed the reports this morning). Also, when you hit them while they are busy, they notice that what they do is a problem; if you add them the next day, they may never know what happened. --Dirk Beetstra T C (en: U, T) 10:34, 17 April 2008 (UTC)Reply

Not really related, but to ping people's watchlists maybe, and avoid spamming talk pages. I've suggested new styles for Template:LinkSummary and Template:UserSummary on their respective talk pages. Don't want to make rash changes as these are rather often used. My purpose is to make them readable, which makes them useful. Currently, I find it very difficult to find what I'm looking for in there, so I redid them. – Mike.lifeguard | @en.wb 15:35, 18 April 2008 (UTC)Reply

Looking ahead

"Not dealing with a crisis that can be foreseen is bad management"

The Spam blacklist is now hitting 120K & rising quite fast. The log page started playing up at about 150K. What are our options looking ahead, I wonder. Obviously it would be good to hear from someone with dev knowledge or connections. Thanks --Herby talk thyme 10:46, 20 April 2008 (UTC)Reply

I believe that the extension is capable of taking a blacklist from any page (that is, the location is configurable, and multiple locations are possible). We could perhaps split the blacklist itself into several smaller lists. I'm not sure there's any similarly easy suggestion for the log though. If we split it up into a log for each of several blacklist pages, we wouldn't have a single, central place to look for that information. I suppose a search tool could be written to find the log entries for a particular entry. – Mike.lifeguard | @en.wb 12:24, 20 April 2008 (UTC)Reply
What exactly are the problems with having a large blacklist? --Erwin(85) 12:34, 20 April 2008 (UTC)Reply
Just the sheer size of it at a certain moment; it takes long to load, to search, etc. The above suggestion may make sense: smaller blacklists per month, transcluded into the top level? --Dirk Beetstra T C (en: U, T) 13:16, 20 April 2008 (UTC)Reply
Not a technical person but the log page became very difficult to use at 150K. Equally the page is getting slower to load. As I say - not a techy - but my ideal would probably be "current BL" (6 months say) & before that? --Herby talk thyme 13:37, 20 April 2008 (UTC)Reply
I don't know how smart attempting to transclude them is... The spam blacklist is technically "experimental" (which sounds more scary than it really is) so it may not work properly. I meant we can have several pages, all of which are spam blacklists. You can have as many as you want, and they can technically be any page on the wiki (actually, anywhere on the web that is accessible) provided the page follows the correct format. So we can have one for each year, and just request that it be added to the configuration file every year, which will make the sysadmins ecstatic, I'm sure :P OTOH, if someone gives us the go-ahead for transclusion, then that'd be ok too. – Mike.lifeguard | @en.wb 22:12, 20 April 2008 (UTC)Reply
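For what it's worth, a rough sketch of what such a multi-source setup might look like in LocalSettings.php (the variable name and raw-page URL format here are quoted from memory and may differ; check the extension's documentation before relying on them):

  $wgSpamBlacklistFiles = array(
      "http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1",
      // hypothetical additional per-year page
      "http://meta.wikimedia.org/w/index.php?title=Spam_blacklist/2007&action=raw&sb_ver=1"
  );

Each listed source just has to follow the same one-regex-per-line format as the current blacklist page.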
A much better idea: bugzilla:13805! – Mike.lifeguard | @en.wb 01:43, 21 April 2008 (UTC)Reply

Well, Mike.lifeguard, I don't know where to answer you about what you did by putting all my 'weblog.ro' links from pages like Simone de Beauvoir, Houellebecq etc. on the blacklist, so I'll do it here, where I see your name.

What if all this is not true and you, and all your friends here, made an abuse? What if that site you're talking about is a simple blog, has no advertising and never will have, and all the videos it has there are just cultural, with cultural themes, and no one who wants to get clicks will ever do it by posting cultural things about writers? What if all those are really just writers that I love or believe in, and I want everybody to see those interviews, and that's all? What if you, just you, are a plain, simple, pure, full-time idiot after all, and you've just offended a guy who did nothing wrong, and doesn't even know how to do that? hum?

Never mind, have a nice life with your friends. You must be happy persons. I know you will remain many.

Date/time in bot reports

In the bot reports there's a date and time given for each diff. However, the given time isn't actually the time of the revision. Both the hour and the minutes differ, so it's not simply another timezone. Does anyone know what the given date/time means? --Erwin(85) 18:12, 20 April 2008 (UTC)Reply

It is the time on the machine the bots are running on. For me (I am in Wales, UK) it looks like the box is 5 hours and 2 minutes off. We could correct for that, but I guess it is more an indication of the spam speed than something that is really necessary; the diffs give the correct times. Hope this explains it. --Dirk Beetstra T C (en: U, T) 19:51, 20 April 2008 (UTC)Reply
Thanks. --Erwin(85) 12:29, 21 April 2008 (UTC)Reply